Update docs/vllm_deployment_guide.md

QscQ committed on Jul 6, 2025
Commit 279327a (verified) · 1 parent: fa9d550

Files changed (1)

docs/vllm_deployment_guide.md +12 -22

@@ -41,20 +41,19 @@ git clone https://huggingface.co/MiniMaxAI/MiniMax-M1-80k
 ## 🛠️ Deployment Options
-### Option 1: Deploy Using Docker (Recommended)
+### Option: Deploy Using Docker (Recommended)
 To ensure consistency and stability of the deployment environment, we recommend using Docker for deployment.
 ⚠️ **Version Requirements**:
-- MiniMax-M1 model requires vLLM version 0.8.3 or later for full support
-- If you are using a Docker image with vLLM version lower than the required version, you will need to:
-  1. Update to the latest vLLM code
-  2. Recompile vLLM from source. Follow the compilation instructions in Solution 2 of the Common Issues section
-- Special Note: For vLLM versions between 0.8.3 and 0.9.2, you need to modify the model configuration:
-  1. Open `config.json`
-  2. Change `config['architectures'] = ["MiniMaxM1ForCausalLM"]` to `config['architectures'] = ["MiniMaxText01ForCausalLM"]`
+- MiniMax-M1 model requires vLLM version 0.9.2 or later for full support
+- Special Note: Using vLLM versions below 0.9.2 may result in incompatibility or incorrect precision for the model:
+  - For details, see: [Fix minimax model cache & lm_head precision #19592](https://github.com/vllm-project/vllm/pull/19592)
 1. Get the container image:
+Currently, the official vLLM Docker image for version v0.9.2 has not been released yet.
+As an example, we will demonstrate how to manually build vLLM using version v0.8.3.
 ```bash
 docker pull vllm/vllm-openai:v0.8.3
 ```
@@ -77,21 +76,12 @@ sudo docker run -it \
     --name $NAME \
     $DOCKER_RUN_CMD \
     $IMAGE /bin/bash
-```
-### Option 2: Direct Installation of vLLM
-If your environment meets the following requirements:
-- CUDA 12.1
-- PyTorch 2.1
-You can directly install vLLM
-Installation command:
-```bash
-pip install vllm
+# install vLLM
+cd $CODE_DIR
+git clone https://github.com/vllm-project/vllm.git
+cd vllm
+pip install -e .
 ```
 💡 If you are using other environment configurations, please refer to the [vLLM Installation Guide](https://docs.vllm.ai/en/latest/getting_started/installation.html)
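
The updated guide raises the minimum vLLM version to 0.9.2. The following is a minimal sketch, not part of the commit, of how one might check an installed build against that minimum before serving the model. The helper names (`parse_release`, `meets_minimum`, `installed_vllm_version`) are hypothetical, and the comparison deliberately ignores pre-release and local suffixes such as `rc1` or `+g1a2b3c`, which is a simplification that may not suit every source build.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Minimum version stated in the updated deployment guide.
MIN_VLLM: Tuple[int, ...] = (0, 9, 2)


def parse_release(ver: str) -> Tuple[int, ...]:
    """Return the leading numeric components of a version string.

    Assumption: suffixes such as 'rc1', '.dev0' or '+local' are ignored,
    so '0.9.2rc1' is treated the same as '0.9.2'.
    """
    parts = []
    for piece in ver.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(ver: str, minimum: Tuple[int, ...] = MIN_VLLM) -> bool:
    """True if the numeric release of `ver` is at least `minimum`."""
    return parse_release(ver) >= minimum


def installed_vllm_version() -> Optional[str]:
    """Return the installed vLLM version string, or None if it is absent."""
    try:
        return version("vllm")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vLLM is not installed in this environment")
    elif meets_minimum(found):
        print(f"vLLM {found} meets the documented minimum of 0.9.2")
    else:
        print(f"vLLM {found} is older than 0.9.2; upgrade or build from source")
```

Running the script inside the container after `pip install -e .` reports whether the source build satisfies the requirement. For strict handling of pre-release versions, a dedicated version parser would be a better fit than this tuple comparison.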