HuggingFace Container User committed · Commit 3c4bc8b · 1 Parent(s): b36db0c
fork: 05.11.2025
Browse files
- .gitattributes +2 -0
- README.md +436 -3
- chat_template.jinja +159 -0
- config.json +3 -0
- configuration.json +3 -0
- docs/function_call_guide.md +482 -0
- docs/function_call_guide_cn.md +482 -0
- docs/vllm_deploy_guide.md +88 -0
- docs/vllm_deploy_guide_cn.md +85 -0
- figures/Bench.png +3 -0
- generation_config.json +3 -0
- merges.txt +0 -0
- model.safetensors.index.json +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +3 -0
- vocab.json +3 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.json filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,436 @@
----
-license: apache-2.0
-
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- vLLM
- AWQ
base_model:
- MiniMaxAI/MiniMax-M2
base_model_relation: quantized

---
# MiniMax-M2-AWQ
Base model: [MiniMaxAI/MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2)

### 【Dependencies / Installation】
As of **2025-10-28**, create a fresh Python environment and run:
```bash
pip install -U pip
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```

[vLLM Official Guide](https://docs.vllm.ai/projects/recipes/en/latest/MiniMax/MiniMax-M2.html)


<details>

<summary>testing environment</summary>

```
Package                            Version
---------------------------------- --------------------------------
aiohappyeyeballs                   2.6.1
aiohttp                            3.13.1
aiosignal                          1.4.0
annotated-doc                      0.0.3
annotated-types                    0.7.0
anthropic                          0.71.0
anyio                              4.11.0
apache-tvm-ffi                     0.1.0b15
astor                              0.8.1
attrs                              25.4.0
blake3                             1.0.8
cachetools                         6.2.1
cbor2                              5.7.1
certifi                            2025.10.5
cffi                               2.0.0
charset-normalizer                 3.4.4
click                              8.2.1
cloudpickle                        3.1.1
compressed-tensors                 0.12.2
cuda-bindings                      13.0.3
cuda-pathfinder                    1.3.1
cuda-python                        13.0.3
cupy-cuda12x                       13.6.0
depyf                              0.20.0
dill                               0.4.0
diskcache                          5.6.3
distro                             1.9.0
dnspython                          2.8.0
docstring_parser                   0.17.0
einops                             0.8.1
email-validator                    2.3.0
fastapi                            0.120.1
fastapi-cli                        0.0.14
fastapi-cloud-cli                  0.3.1
fastrlock                          0.8.3
filelock                           3.20.0
flashinfer-python                  0.4.1
frozenlist                         1.8.0
fsspec                             2025.9.0
gguf                               0.17.1
h11                                0.16.0
hf-xet                             1.2.0
httpcore                           1.0.9
httptools                          0.7.1
httpx                              0.28.1
huggingface-hub                    0.36.0
idna                               3.11
importlib_metadata                 8.7.0
iniconfig                          2.3.0
interegular                        0.3.3
Jinja2                             3.1.6
jiter                              0.11.1
jsonschema                         4.25.1
jsonschema-specifications          2025.9.1
lark                               1.2.2
llguidance                         0.7.30
llvmlite                           0.44.0
lm-format-enforcer                 0.11.3
loguru                             0.7.3
markdown-it-py                     4.0.0
MarkupSafe                         3.0.3
mdurl                              0.1.2
mistral_common                     1.8.5
mpmath                             1.3.0
msgpack                            1.1.2
msgspec                            0.19.0
multidict                          6.7.0
networkx                           3.5
ninja                              1.13.0
numba                              0.61.2
numpy                              2.2.6
nvidia-cublas-cu12                 12.8.4.1
nvidia-cuda-cupti-cu12             12.8.90
nvidia-cuda-nvrtc-cu12             12.8.93
nvidia-cuda-runtime-cu12           12.8.90
nvidia-cudnn-cu12                  9.10.2.21
nvidia-cudnn-frontend              1.15.0
nvidia-cufft-cu12                  11.3.3.83
nvidia-cufile-cu12                 1.13.1.3
nvidia-curand-cu12                 10.3.9.90
nvidia-cusolver-cu12               11.7.3.90
nvidia-cusparse-cu12               12.5.8.93
nvidia-cusparselt-cu12             0.7.1
nvidia-cutlass-dsl                 4.3.0.dev0
nvidia-ml-py                       13.580.82
nvidia-nccl-cu12                   2.27.5
nvidia-nvjitlink-cu12              12.8.93
nvidia-nvshmem-cu12                3.3.20
nvidia-nvtx-cu12                   12.8.90
openai                             2.6.1
openai-harmony                     0.0.4
opencv-python-headless             4.12.0.88
opentelemetry-api                  1.38.0
opentelemetry-sdk                  1.38.0
opentelemetry-semantic-conventions 0.59b0
outlines_core                      0.2.11
packaging                          25.0
partial-json-parser                0.2.1.1.post6
pillow                             12.0.0
pip                                25.3
pluggy                             1.6.0
prometheus_client                  0.23.1
prometheus-fastapi-instrumentator  7.1.0
propcache                          0.4.1
protobuf                           6.33.0
psutil                             7.1.2
py-cpuinfo                         9.0.0
pybase64                           1.4.2
pycountry                          24.6.1
pycparser                          2.23
pydantic                           2.12.3
pydantic_core                      2.41.4
pydantic-extra-types               2.10.6
Pygments                           2.19.2
pytest                             8.4.2
python-dotenv                      1.2.1
python-json-logger                 4.0.0
python-multipart                   0.0.20
PyYAML                             6.0.3
pyzmq                              27.1.0
ray                                2.50.1
referencing                        0.37.0
regex                              2025.10.23
requests                           2.32.5
rich                               14.2.0
rich-toolkit                       0.15.1
rignore                            0.7.1
rpds-py                            0.28.0
safetensors                        0.6.2
scipy                              1.16.2
sentencepiece                      0.2.1
sentry-sdk                         3.0.0a7
setproctitle                       1.3.7
setuptools                         79.0.1
shellingham                        1.5.4
six                                1.17.0
sniffio                            1.3.1
soundfile                          0.13.1
soxr                               1.0.0
starlette                          0.48.0
sympy                              1.14.0
tabulate                           0.9.0
tiktoken                           0.12.0
tokenizers                         0.22.1
torch                              2.9.0
torchaudio                         2.9.0
torchvision                        0.24.0
tqdm                               4.67.1
transformers                       4.57.1
triton                             3.5.0
triton_kernels                     1.0.0
typer                              0.20.0
typing_extensions                  4.15.0
typing-inspection                  0.4.2
urllib3                            2.5.0
uv                                 0.9.5
uvicorn                            0.38.0
uvloop                             0.22.1
vllm                               0.11.1rc4.dev38+g69f064062.cu129
watchfiles                         1.1.1
websockets                         15.0.1
xgrammar                           0.1.25
yarl                               1.22.0
zipp                               3.23.0
```

</details>

### 【vLLM Startup Command】
<i>Note: When launching with TP=8, include `--enable-expert-parallel`;
otherwise the expert tensors won't be evenly sharded across GPU devices.</i>

```
CONTEXT_LENGTH=32768
vllm serve \
    tclf90/MiniMax-M2-AWQ \
    --served-model-name MY_MODEL \
    --enable-auto-tool-choice \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --swap-space 16 \
    --max-num-seqs 32 \
    --max-model-len $CONTEXT_LENGTH \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 8 \
    --enable-expert-parallel \
    --trust-remote-code \
    --disable-log-requests \
    --host 0.0.0.0 \
    --port 8000
```
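The command above starts an OpenAI-compatible HTTP server, and `--enable-auto-tool-choice` with `--tool-call-parser minimax_m2` lets it return OpenAI-style tool calls. A minimal request-body sketch, assuming the local deployment above (served model name `MY_MODEL`, port 8000; the `get_weather` tool is a made-up illustration, not part of the model card):

```python
import json


def build_chat_payload(prompt, model="MY_MODEL", tools=None, max_tokens=512):
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    payload = {
        "model": model,  # must match --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if tools:
        # with --enable-auto-tool-choice, the server decides when to emit a tool call
        payload["tools"] = tools
        payload["tool_choice"] = "auto"
    return payload


# Hypothetical tool schema, for illustration only
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_chat_payload("What's the weather in Shanghai?", tools=[weather_tool])
print(json.dumps(body, indent=2))

# POST the JSON above to the running server, e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#        -H "Content-Type: application/json" -d @body.json
```

With `--tool-call-parser minimax_m2`, any tool invocation comes back in the response's standard `tool_calls` field rather than as raw text.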

### 【Logs】
```
2025-11-03
1. Due to my oversight, some files were missing from the initial upload; they have now been added and fully validated on a 2xA100 setup.

Upload model-00018-of-00041.safetensors
Upload model-00019-of-00041.safetensors
Upload model-00021-of-00041.safetensors
Upload model-00023-of-00041.safetensors
Upload model-00025-of-00041.safetensors
Upload model-00027-of-00041.safetensors
Upload model-00030-of-00041.safetensors
Upload model-00035-of-00041.safetensors


2025-10-28
1. Initial commit
```

### 【Model Files】
| File Size | Last Updated |
|-----------|--------------|
| `113GiB`  | `2025-10-28` |

### 【Model Download】
```python
from huggingface_hub import snapshot_download
snapshot_download('QuantTrio/MiniMax-M2-AWQ', cache_dir="your_local_path")
```

### 【Overview】
<div align="center">
<svg width="60%" height="auto" viewBox="0 0 144 48" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M26.6782 7.96523C26.6782 7.02436 25.913 6.26087 24.9739 6.26087C24.0348 6.26087 23.2695 7.0261 23.2695 7.96523V36.2139C23.2695 38.4 21.4904 40.1791 19.3043 40.1791C17.1183 40.1791 15.3391 38.4 15.3391 36.2139V18.0904C15.3391 17.1496 14.5739 16.3861 13.6348 16.3861C12.6956 16.3861 11.9304 17.1513 11.9304 18.0904V25.7722C11.9304 27.9583 10.1513 29.7374 7.96518 29.7374C5.7791 29.7374 4 27.9583 4 25.7722V22.9878C4 22.3635 4.50609 21.8574 5.13043 21.8574C5.75478 21.8574 6.26087 22.3635 6.26087 22.9878V25.7722C6.26087 26.713 7.02605 27.4765 7.96518 27.4765C8.90431 27.4765 9.66954 26.7113 9.66954 25.7722V18.0904C9.66954 15.9044 11.4487 14.1252 13.6348 14.1252C15.8209 14.1252 17.6 15.9044 17.6 18.0904V36.2139C17.6 37.1548 18.3652 37.9183 19.3043 37.9183C20.2435 37.9183 21.0087 37.153 21.0087 36.2139V25.1322V7.96523C21.0087 5.77914 22.7878 4 24.9739 4C27.16 4 28.9391 5.77914 28.9391 7.96523V31.3565C28.9391 31.9809 28.433 32.487 27.8087 32.487C27.1843 32.487 26.6782 31.9809 26.6782 31.3565V7.96523ZM47.6539 14.1252C45.4678 14.1252 43.6887 15.9044 43.6887 18.0904V33.2296C43.6887 34.1704 42.9235 34.9339 41.9843 34.9339C41.0452 34.9339 40.28 34.1687 40.28 33.2296V7.96523C40.28 5.77914 38.5008 4 36.3148 4C34.1287 4 32.3496 5.77914 32.3496 7.96523V40.0348C32.3496 40.9756 31.5843 41.7391 30.6452 41.7391C29.7061 41.7391 28.9409 40.9739 28.9409 40.0348V36.0643C28.9409 35.44 28.4348 34.9339 27.8104 34.9339C27.1861 34.9339 26.68 35.44 26.68 36.0643V40.0348C26.68 42.2209 28.4591 44 30.6452 44C32.8313 44 34.6104 42.2209 34.6104 40.0348V7.96523C34.6104 7.02436 35.3756 6.26087 36.3148 6.26087C37.2539 6.26087 38.0191 7.0261 38.0191 7.96523V33.2296C38.0191 35.4156 39.7982 37.1948 41.9843 37.1948C44.1704 37.1948 45.9496 35.4156 45.9496 33.2296V18.0904C45.9496 17.1496 46.7148 16.3861 47.6539 16.3861C48.593 16.3861 49.3582 17.1513 49.3582 18.0904V31.3565C49.3582 31.9809 49.8643 32.487 50.4887 32.487C51.113 32.487 51.6191 31.9809 51.6191 31.3565V18.0904C51.6191 15.9044 49.84 14.1252 
47.6539 14.1252Z" fill="url(#paint0_linear_17_483)"/>
<path d="M68.7671 16.5615H71.2541C71.3254 16.5615 71.3845 16.5859 71.435 16.6363C71.4836 16.6868 71.5097 16.7459 71.5097 16.8172V31.1824C71.5097 31.2537 71.4854 31.3128 71.435 31.3633C71.3845 31.4137 71.3254 31.4381 71.2541 31.4381H68.7671C68.6958 31.4381 68.6367 31.4137 68.5862 31.3633C68.5358 31.3146 68.5115 31.2537 68.5115 31.1824V21.812C68.5115 21.7563 68.4976 21.7268 68.4697 21.7268C68.4419 21.7268 68.4123 21.7476 68.3845 21.7911L66.1323 25.318C66.061 25.4311 65.9619 25.4885 65.8349 25.4885H64.581C64.4541 25.4885 64.3549 25.4328 64.2836 25.318L62.0315 21.7911C62.0036 21.7494 61.9741 21.7302 61.9462 21.7372C61.9184 21.7441 61.9045 21.7772 61.9045 21.8328V31.1824C61.9045 31.2537 61.8802 31.3128 61.8297 31.3633C61.7793 31.4137 61.7202 31.4381 61.6489 31.4381H59.1619C59.0906 31.4381 59.0315 31.4137 58.981 31.3633C58.9306 31.3146 58.9062 31.2537 58.9062 31.1824V16.8172C58.9062 16.7459 58.9306 16.6868 58.981 16.6363C59.0315 16.5859 59.0906 16.5615 59.1619 16.5615H61.6489C61.7758 16.5615 61.8749 16.6189 61.9462 16.732L65.1341 21.6833C65.1758 21.7685 65.2193 21.7685 65.261 21.6833L68.4697 16.732C68.541 16.6189 68.6402 16.5615 68.7671 16.5615Z" fill="currentColor"/>
<path d="M74.1764 31.3633C74.1259 31.3146 74.1016 31.2537 74.1016 31.1824V16.8172C74.1016 16.7459 74.1259 16.6868 74.1764 16.6363C74.2268 16.5859 74.2859 16.5615 74.3572 16.5615H76.8442C76.9155 16.5615 76.9746 16.5859 77.0251 16.6363C77.0737 16.6868 77.0998 16.7459 77.0998 16.8172V31.1824C77.0998 31.2537 77.0755 31.3128 77.0251 31.3633C76.9746 31.4137 76.9155 31.4381 76.8442 31.4381H74.3572C74.2859 31.4381 74.2268 31.4137 74.1764 31.3633Z" fill="currentColor"/>
<path d="M88.3066 16.6361C88.3553 16.5874 88.4162 16.5613 88.4875 16.5613H90.9744C91.0457 16.5613 91.1049 16.5857 91.1553 16.6361C91.204 16.6865 91.2301 16.7457 91.2301 16.817V31.1822C91.2301 31.2535 91.2057 31.3126 91.1553 31.363C91.1049 31.4135 91.0457 31.4378 90.9744 31.4378H88.5727C88.4301 31.4378 88.331 31.3822 88.2753 31.2674L82.771 22.1717C82.7431 22.13 82.7136 22.1109 82.6858 22.1178C82.6579 22.1248 82.644 22.1578 82.644 22.2135L82.6858 31.1805C82.6858 31.2518 82.6614 31.3109 82.611 31.3613C82.5606 31.4117 82.5014 31.4361 82.4301 31.4361H79.9431C79.8718 31.4361 79.8127 31.4117 79.7623 31.3613C79.7118 31.3126 79.6875 31.2518 79.6875 31.1805V16.8152C79.6875 16.7439 79.7118 16.6848 79.7623 16.6344C79.8127 16.5839 79.8718 16.5596 79.9431 16.5596H82.3449C82.4858 16.5596 82.5849 16.617 82.6423 16.73L88.124 25.7822C88.1518 25.8239 88.1797 25.8431 88.2092 25.8361C88.2371 25.8292 88.251 25.7978 88.251 25.7404L88.2301 16.8152C88.2301 16.7439 88.2545 16.6848 88.3049 16.6344L88.3066 16.6361Z" fill="currentColor"/>
<path d="M93.8951 31.3633C93.8446 31.3146 93.8203 31.2537 93.8203 31.1824V16.8172C93.8203 16.7459 93.8446 16.6868 93.8951 16.6363C93.9455 16.5859 94.0047 16.5615 94.076 16.5615H96.5629C96.6342 16.5615 96.6934 16.5859 96.7438 16.6363C96.7925 16.6868 96.8186 16.7459 96.8186 16.8172V31.1824C96.8186 31.2537 96.7942 31.3128 96.7438 31.3633C96.6934 31.4137 96.6342 31.4381 96.5629 31.4381H94.076C94.0047 31.4381 93.9455 31.4137 93.8951 31.3633Z" fill="currentColor"/>
<path d="M109.267 16.5615H111.754C111.825 16.5615 111.885 16.5859 111.935 16.6363C111.984 16.6868 112.01 16.7459 112.01 16.8172V31.1824C112.01 31.2537 111.985 31.3128 111.935 31.3633C111.885 31.4137 111.825 31.4381 111.754 31.4381H109.267C109.196 31.4381 109.137 31.4137 109.086 31.3633C109.036 31.3146 109.011 31.2537 109.011 31.1824V21.812C109.011 21.7563 108.998 21.7268 108.97 21.7268C108.942 21.7268 108.912 21.7476 108.885 21.7911L106.632 25.318C106.561 25.4311 106.462 25.4885 106.335 25.4885H105.081C104.954 25.4885 104.855 25.4328 104.784 25.318L102.531 21.7911C102.504 21.7494 102.474 21.7302 102.446 21.7372C102.418 21.7441 102.405 21.7772 102.405 21.8328V31.1824C102.405 31.2537 102.38 31.3128 102.33 31.3633C102.279 31.4137 102.22 31.4381 102.149 31.4381H99.6619C99.5906 31.4381 99.5315 31.4137 99.481 31.3633C99.4306 31.3146 99.4062 31.2537 99.4062 31.1824V16.8172C99.4062 16.7459 99.4306 16.6868 99.481 16.6363C99.5315 16.5859 99.5906 16.5615 99.6619 16.5615H102.149C102.276 16.5615 102.375 16.6189 102.446 16.732L105.634 21.6833C105.676 21.7685 105.719 21.7685 105.761 21.6833L108.97 16.732C109.041 16.6189 109.14 16.5615 109.267 16.5615Z" fill="currentColor"/>
<path d="M123.782 31.2241L123.144 29.1424C123.116 29.0867 123.079 29.0572 123.038 29.0572H117.81C117.768 29.0572 117.732 29.085 117.704 29.1424L117.088 31.2241C117.046 31.3668 116.954 31.4363 116.812 31.4363H114.112C114.027 31.4363 113.963 31.412 113.921 31.3615C113.879 31.3128 113.871 31.2381 113.9 31.1389L118.49 16.7737C118.532 16.6328 118.624 16.5615 118.766 16.5615H122.102C122.243 16.5615 122.335 16.6328 122.379 16.7737L126.968 31.1389C126.982 31.1668 126.989 31.2033 126.989 31.245C126.989 31.372 126.911 31.4363 126.756 31.4363H124.057C123.916 31.4363 123.824 31.365 123.78 31.2241H123.782ZM118.554 26.7407H122.295C122.38 26.7407 122.408 26.6989 122.38 26.6137L120.467 20.3024C120.453 20.2467 120.432 20.2207 120.403 20.2276C120.375 20.2346 120.352 20.2589 120.339 20.3024L118.469 26.6137C118.455 26.6989 118.483 26.7407 118.554 26.7407Z" fill="currentColor"/>
<path d="M128.222 31.353C128.18 31.2974 128.187 31.2261 128.243 31.1409L132.365 24.0643C132.393 24.0226 132.393 23.9791 132.365 23.9374L128.243 16.8609L128.201 16.7339C128.201 16.6209 128.28 16.5635 128.434 16.5635H131.133C131.274 16.5635 131.38 16.6209 131.452 16.7339L134.213 21.6C134.255 21.6852 134.299 21.6852 134.34 21.6L137.102 16.7339C137.173 16.6209 137.28 16.5635 137.42 16.5635H140.099C140.198 16.5635 140.269 16.5913 140.311 16.6487C140.353 16.7061 140.346 16.7756 140.29 16.8609L136.168 23.9374C136.154 23.9791 136.154 24.0226 136.168 24.0643L140.29 31.1409L140.332 31.2678C140.332 31.3809 140.253 31.4383 140.099 31.4383H137.42C137.278 31.4383 137.172 31.3826 137.102 31.2678L134.34 26.4226C134.299 26.3374 134.255 26.3374 134.213 26.4226L131.429 31.2678C131.358 31.3809 131.252 31.4383 131.111 31.4383H128.433C128.333 31.4383 128.262 31.4104 128.22 31.353H128.222Z" fill="currentColor"/>
<defs>
<linearGradient id="paint0_linear_17_483" x1="3.99826" y1="24" x2="51.6208" y2="24" gradientUnits="userSpaceOnUse">
<stop stop-color="#E21680"/>
<stop offset="1" stop-color="#FF633A"/>
</linearGradient>
</defs>
</svg>

</div>
<hr>

<div align="center" style="line-height: 1;">
<a href="https://www.minimax.io" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://img.shields.io/badge/_Homepage-MiniMax-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgNDkwLjE2IDQxMS43Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2ZmZjt9PC9zdHlsZT48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMjMzLjQ1LDQwLjgxYTE3LjU1LDE3LjU1LDAsMSwwLTM1LjEsMFYzMzEuNTZhNDAuODIsNDAuODIsMCwwLDEtODEuNjMsMFYxNDVhMTcuNTUsMTcuNTUsMCwxLDAtMzUuMDksMHY3OS4wNmE0MC44Miw0MC44MiwwLDAsMS04MS42MywwVjE5NS40MmExMS42MywxMS42MywwLDAsMSwyMy4yNiwwdjI4LjY2YTE3LjU1LDE3LjU1LDAsMCwwLDM1LjEsMFYxNDVBNDAuODIsNDAuODIsMCwwLDEsMTQwLDE0NVYzMzEuNTZhMTcuNTUsMTcuNTUsMCwwLDAsMzUuMSwwVjIxNy41aDBWNDAuODFhNDAuODEsNDAuODEsMCwxLDEsODEuNjIsMFYyODEuNTZhMTEuNjMsMTEuNjMsMCwxLDEtMjMuMjYsMFptMjE1LjksNjMuNEE0MC44Niw0MC44NiwwLDAsMCw0MDguNTMsMTQ1VjMwMC44NWExNy41NSwxNy41NSwwLDAsMS0zNS4wOSwwdi0yNjBhNDAuODIsNDAuODIsMCwwLDAtODEuNjMsMFYzNzAuODlhMTcuNTUsMTcuNTUsMCwwLDEtMzUuMSwwVjMzMGExMS42MywxMS42MywwLDEsMC0yMy4yNiwwdjQwLjg2YTQwLjgxLDQwLjgxLDAsMCwwLDgxLjYyLDBWNDAuODFhMTcuNTUsMTcuNTUsMCwwLDEsMzUuMSwwdjI2MGE0MC44Miw0MC44MiwwLDAsMCw4MS42MywwVjE0NWExNy41NSwxNy41NSwwLDEsMSwzNS4xLDBWMjgxLjU2YTExLjYzLDExLjYzLDAsMCwwLDIzLjI2LDBWMTQ1QTQwLjg1LDQwLjg1LDAsMCwwLDQ0OS4zNSwxMDQuMjFaIi8+PC9zdmc+&logoWidth=20" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://agent.minimax.io/" target="_blank" style="margin: 2px;">
<img alt="Agent" src="https://img.shields.io/badge/_MiniMax_Agent-FF4040?style=flat-square&labelColor=2C3E50&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2aWV3Qm94PSIwIDAgNDkwLjE2IDQxMS43Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2ZmZjt9PC9zdHlsZT48L2RlZnM+PHBhdGggY2xhc3M9ImNscy0xIiBkPSJNMjMzLjQ1LDQwLjgxYTE3LjU1LDE3LjU1LDAsMSwwLTM1LjEsMFYzMzEuNTZhNDAuODIsNDAuODIsMCwwLDEtODEuNjMsMFYxNDVhMTcuNTUsMTcuNTUsMCwxLDAtMzUuMDksMHY3OS4wNmE0MC44Miw0MC44MiwwLDAsMS04MS42MywwVjE5NS40MmExMS42MywxMS42MywwLDAsMSwyMy4yNiwwdjI4LjY2YTE3LjU1LDE3LjU1LDAsMCwwLDM1LjEsMFYxNDVBNDAuODIsNDAuODIsMCwwLDEsMTQwLDE0NVYzMzEuNTZhMTcuNTUsMTcuNTUsMCwwLDAsMzUuMSwwVjIxNy41aDBWNDAuODFhNDAuODEsNDAuODEsMCwxLDEsODEuNjIsMFYyODEuNTZhMTEuNjMsMTEuNjMsMCwxLDEtMjMuMjYsMFptMjE1LjksNjMuNEE0MC44Niw0MC44NiwwLDAsMCw0MDguNTMsMTQ1VjMwMC44NWExNy41NSwxNy41NSwwLDAsMS0zNS4wOSwwdi0yNjBhNDAuODIsNDAuODIsMCwwLDAtODEuNjMsMFYzNzAuODlhMTcuNTUsMTcuNTUsMCwwLDEtMzUuMSwwVjMzMGExMS42MywxMS42MywwLDEsMC0yMy4yNiwwdjQwLjg2YTQwLjgxLDQwLjgxLDAsMCwwLDgxLjYyLDBWNDAuODFhMTcuNTUsMTcuNTUsMCwwLDEsMzUuMSwwdjI2MGE0MC44Miw0MC44MiwwLDAsMCw4MS42MywwVjE0NWExNy41NSwxNy41NSwwLDEsMSwzNS4xLDBWMjgxLjU2YTExLjYzLDExLjYzLDAsMCwwLDIzLjI2LDBWMTQ1QTQwLjg1LDQwLjg1LDAsMCwwLDQ0OS4zNSwxMDQuMjFaIi8+PC9zdmc+&logoWidth=20" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://www.minimax.io/platform" style="margin: 2px;">
<img alt="API" src="https://img.shields.io/badge/⚡_API-Platform-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-MCP" style="margin: 2px;">
<img alt="MCP" src="https://img.shields.io/badge/🚀_MCP-MiniMax_MCP-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/MiniMaxAI" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/🤗_Hugging_Face-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-M2" target="_blank" style="margin: 2px;">
<img alt="GitHub" src="https://img.shields.io/badge/🐙_GitHub-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://www.modelscope.cn/organization/MiniMax" target="_blank" style="margin: 2px;">
<img alt="ModelScope" src="https://img.shields.io/badge/🤖️_ModelScope-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/⚖️_License-MIT-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/MiniMax-AI/MiniMax-AI.github.io/blob/main/images/wechat-qrcode.jpeg" target="_blank" style="margin: 2px;">
<img alt="WeChat" src="https://img.shields.io/badge/💬_WeChat-MiniMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

# Meet MiniMax-M2

Today, we release and open source MiniMax-M2, a **Mini** model built for **Max** coding & agentic workflows.

**MiniMax-M2** redefines efficiency for agents. It's a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever.

<p align="center">
<img width="100%" src="figures/Bench.png">
</p>

---

## Highlights

**Superior Intelligence**. According to benchmarks from Artificial Analysis, MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use. **Its composite score ranks #1 among open-source models globally**.

**Advanced Coding**. Engineered for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, coding-run-fix loops, and test-validated repairs. Strong performance on Terminal-Bench and (Multi-)SWE-Bench–style tasks demonstrates practical effectiveness in terminals, IDEs, and CI across languages.

**Agent Performance**. MiniMax-M2 plans and executes complex, long-horizon toolchains across shell, browser, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, keeps evidence traceable, and gracefully recovers from flaky steps.

**Efficient Design**. With 10 billion activated parameters (230 billion in total), MiniMax-M2 delivers lower latency, lower cost, and higher throughput for interactive agents and batched sampling, perfectly aligned with the shift toward highly deployable models that still shine on coding and agentic tasks.

---
| 331 |
+
|
| 332 |
+
## Coding & Agentic Benchmarks
|
| 333 |
+
|
| 334 |
+
These comprehensive evaluations test real-world end-to-end coding and agentic tool use: editing real repos, executing commands, browsing the web, and delivering functional solutions. Performance on this suite correlates with day-to-day developer experience in terminals, IDEs, and CI.
|
| 335 |
+
|
| 336 |
+
| **Benchmark** | **MiniMax-M2** | **Claude Sonnet 4** | **Claude Sonnet 4.5** | **Gemini 2.5 Pro** | **GPT-5 (thinking)** | **GLM-4.6** | **Kimi K2 0905** | **DeepSeek-V3.2** |
|
| 337 |
+
|-----------|------------|-----------------|-------------------|-----------------|------------------|---------|---------------|----------------|
|
| 338 |
+
| **SWE-bench Verified** | 69.4 | 72.7 * | 77.2 * | 63.8 * | 74.9 * | 68 * | 69.2 * | 67.8 * |
|
| 339 |
+
| **Multi-SWE-Bench** | 36.2 | 35.7 * | 44.3 | / | / | 30 | 33.5 | 30.6 |
|
| 340 |
+
| **SWE-bench Multilingual** | 56.5 | 56.9 * | 68 | / | / | 53.8 | 55.9 * | 57.9 * |
|
| 341 |
+
| **Terminal-Bench** | 46.3 | 36.4 * | 50 * | 25.3 * | 43.8 * | 40.5 * | 44.5 * | 37.7 * |
|
| 342 |
+
| **ArtifactsBench** | 66.8 | 57.3* | 61.5 | 57.7* | 73* | 59.8 | 54.2 | 55.8 |
|
| 343 |
+
| **BrowseComp** | 44 | 12.2 | 19.6 | 9.9 | 54.9* | 45.1* | 14.1 | 40.1* |
|
| 344 |
+
| **BrowseComp-zh** | 48.5 | 29.1 | 40.8 | 32.2 | 65 | 49.5 | 28.8 | 47.9* |
|
| 345 |
+
| **GAIA (text only)** | 75.7 | 68.3 | 71.2 | 60.2 | 76.4 | 71.9 | 60.2 | 63.5 |
|
| 346 |
+
| **xbench-DeepSearch** | 72 | 64.6 | 66 | 56 | 77.8 | 70 | 61 | 71 |
|
| 347 |
+
| **HLE (w/ tools)** | 31.8 | 20.3 | 24.5 | 28.4 * | 35.2 * | 30.4 * | 26.9 * | 27.2 * |
|
| 348 |
+
| **τ²-Bench** | 77.2 | 65.5* | 84.7* | 59.2 | 80.1* | 75.9* | 70.3 | 66.7 |
|
| 349 |
+
| **FinSearchComp-global** | 65.5 | 42 | 60.8 | 42.6* | 63.9* | 29.2 | 29.5* | 26.2 |
|
| 350 |
+
| **AgentCompany** | 36 | 37 | 41 | 39.3* | / | 35 | 30 | 34 |
|
| 351 |
+
|
| 352 |
+
>Notes: Data points marked with an asterisk (*) are taken directly from the model's official tech report or blog. All other metrics were obtained using the evaluation methods described below.
|
| 353 |
+
>- SWE-bench Verified: We use the same scaffold as [R2E-Gym](https://arxiv.org/pdf/2504.07164) (Jain et al. 2025) on top of OpenHands to test with agents on SWE tasks. All scores are validated on our internal infrastructure with 128k context length, 100 max steps, and no test-time scaling. All git-related content is removed to ensure agent sees only the code at the issue point.
|
| 354 |
+
>- Multi-SWE-Bench & SWE-bench Multilingual: All scores are averaged across 8 runs using the [claude-code](https://github.com/anthropics/claude-code) CLI (300 max steps) as the evaluation scaffold.
|
| 355 |
+
>- Terminal-Bench: All scores are evaluated with the official claude-code from the original [Terminal-Bench](https://www.tbench.ai/) repository(commit `94bf692`), averaged over 8 runs to report the mean pass rate.
|
| 356 |
+
>- ArtifactsBench: All Scores are computed by averaging three runs with the official implementation of [ArtifactsBench](https://github.com/Tencent-Hunyuan/ArtifactsBenchmark), using the stable Gemini-2.5-Pro as the judge model.
|
| 357 |
+
>- BrowseComp & BrowseComp-zh & GAIA (text only) & xbench-DeepSearch: All scores reported use the same agent framework as [WebExplorer](https://arxiv.org/pdf/2509.06501) (Liu et al. 2025), with minor tools description adjustment. We use the 103-sample text-only GAIA validation subset following [WebExplorer](https://arxiv.org/pdf/2509.06501) (Liu et al. 2025).
|
| 358 |
+
>- HLE (w/ tools): All reported scores are obtained using search tools and a Python tool. The search tools employ the same agent framework as [WebExplorer](https://arxiv.org/pdf/2509.06501) (Liu et al. 2025), and the Python tool runs in a Jupyter environment. We use the text-only HLE subset.
|
| 359 |
+
>- τ²-Bench: All scores reported use "extended thinking with tool use", and employ GPT-4.1 as the user simulator.
|
| 360 |
+
>- FinSearchComp-global: Official results are reported for GPT-5-Thinking, Gemini 2.5 Pro, and Kimi-K2. Other models are evaluated with the open-source [FinSearchComp](https://arxiv.org/pdf/2509.13160) (Hu et al. 2025) framework, with both search and Python tools launched simultaneously for consistency.
|
| 361 |
+
>- AgentCompany: All scores reported use the OpenHands 0.42 agent framework.
|
| 362 |
+
|
| 363 |
+
---
|
| 364 |
+
|
| 365 |
+
## Intelligence Benchmarks
|
| 366 |
+
|
| 367 |
+
We align with **Artificial Analysis**, which aggregates challenging benchmarks using a consistent methodology to reflect a model’s broader **intelligence profile** across math, science, instruction following, coding, and agentic tool use.
|
| 368 |
+
|
| 369 |
+
| **Metric (AA)** | **MiniMax-M2** | **Claude Sonnet 4** | **Claude Sonnet 4.5** | **Gemini 2.5 Pro** | **GPT-5 (thinking)** | **GLM-4.6** | **Kimi K2 0905** | **DeepSeek-V3.2** |
|
| 370 |
+
|-----------------|----------------|---------------------|------------------------|---------------------|----------------------|-------------|------------------|-------------------|
|
| 371 |
+
| AIME25 | 78 | 74 | 88 | 88 | 94 | 86 | 57 | 88 |
|
| 372 |
+
| MMLU-Pro | 82 | 84 | 88 | 86 | 87 | 83 | 82 | 85 |
|
| 373 |
+
| GPQA-Diamond | 78 | 78 | 83 | 84 | 85 | 78 | 77 | 80 |
|
| 374 |
+
| HLE (w/o tools) | 12.5 | 9.6 | 17.3 | 21.1 | 26.5 | 13.3 | 6.3 | 13.8 |
|
| 375 |
+
| LiveCodeBench (LCB) | 83 | 66 | 71 | 80 | 85 | 70 | 61 | 79 |
|
| 376 |
+
| SciCode | 36 | 40 | 45 | 43 | 43 | 38 | 31 | 38 |
|
| 377 |
+
| IFBench | 72 | 55 | 57 | 49 | 73 | 43 | 42 | 54 |
|
| 378 |
+
| AA-LCR | 61 | 65 | 66 | 66 | 76 | 54 | 52 | 69 |
|
| 379 |
+
| τ²-Bench-Telecom | 87 | 65 | 78 | 54 | 85 | 71 | 73 | 34 |
|
| 380 |
+
| Terminal-Bench-Hard | 24 | 30 | 33 | 25 | 31 | 23 | 23 | 29 |
|
| 381 |
+
| **AA Intelligence** | 61 | 57 | 63 | 60 | 69 | 56 | 50 | 57 |
|
| 382 |
+
|
| 383 |
+
>AA: All MiniMax-M2 scores are aligned with the Artificial Analysis Intelligence Benchmarking Methodology (https://artificialanalysis.ai/methodology/intelligence-benchmarking). Scores for all other models are taken from https://artificialanalysis.ai/.
|
| 384 |
+
|
| 385 |
+
---
|
| 386 |
+
|
| 387 |
+
## Why activation size matters
|
| 388 |
+
|
| 389 |
+
By keeping activations around **10B**, MiniMax-M2 streamlines the plan → act → verify loop of agentic workflows, improving responsiveness and reducing compute overhead:
|
| 390 |
+
|
| 391 |
+
- **Faster feedback cycles** in compile-run-test and browse-retrieve-cite chains.
|
| 392 |
+
|
| 393 |
+
- **More concurrent runs** on the same budget for regression suites and multi-seed explorations.
|
| 394 |
+
|
| 395 |
+
- **Simpler capacity planning** with smaller per-request memory and steadier tail latency.
|
| 396 |
+
|
| 397 |
+
In short: **10B activations = responsive agent loops + better unit economics**.
|
| 398 |
+
|
| 399 |
+
## At a glance
|
| 400 |
+
|
| 401 |
+
If you need frontier-style coding and agents without frontier-scale costs, **MiniMax-M2** hits the sweet spot: fast inference speeds, robust tool-use capabilities, and a deployment-friendly footprint.
|
| 402 |
+
|
| 403 |
+
We look forward to your feedback and to collaborating with developers and researchers to bring the future of intelligent collaboration one step closer.
|
| 404 |
+
|
| 405 |
+
## How to Use
|
| 406 |
+
|
| 407 |
+
- Our product **MiniMax Agent**, built on MiniMax-M2, is now **publicly available and free** for a limited time: https://agent.minimaxi.io/
|
| 408 |
+
|
| 409 |
+
- The MiniMax-M2 API is now live on the **MiniMax Open Platform** and is **free** for a limited time: https://platform.minimax.io/docs/guides/text-generation
|
| 410 |
+
|
| 411 |
+
- The MiniMax-M2 model weights are now **open-source**, allowing for local deployment and use: https://huggingface.co/MiniMaxAI/MiniMax-M2.
|
| 412 |
+
|
| 413 |
+
## Local Deployment Guide
|
| 414 |
+
|
| 415 |
+
Download the model from the HuggingFace repository: https://huggingface.co/MiniMaxAI/MiniMax-M2. We recommend the following inference frameworks (listed alphabetically) to serve the model:
|
| 416 |
+
|
| 417 |
+
### SGLang
|
| 418 |
+
|
| 419 |
+
We recommend using [SGLang](https://docs.sglang.ai/) to serve MiniMax-M2. SGLang provides solid day-0 support for the MiniMax-M2 model. Please refer to our [SGLang Deployment Guide](https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/docs/sglang_deploy_guide.md) for more details; many thanks to the SGLang team for the collaboration.
|
| 420 |
+
|
| 421 |
+
### vLLM
|
| 422 |
+
|
| 423 |
+
We recommend using [vLLM](https://docs.vllm.ai/en/stable/) to serve MiniMax-M2. vLLM provides efficient day-0 support for the MiniMax-M2 model; see https://docs.vllm.ai/projects/recipes/en/latest/MiniMaxAI/MiniMax-M2.html for the latest deployment guide. We also provide our own [vLLM Deployment Guide](https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/docs/vllm_deploy_guide.md).
|
| 424 |
+
|
| 425 |
+
### Inference Parameters
|
| 426 |
+
We recommend using the following parameters for best performance: `temperature=1.0`, `top_p=0.95`, `top_k=40`.
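For OpenAI-compatible servers such as vLLM or SGLang, these parameters can be attached to each request as sketched below. The endpoint and model name are assumptions for a local deployment, and `top_k` is passed through `extra_body` because the OpenAI SDK has no first-class `top_k` argument.

```python
# Recommended sampling parameters for MiniMax-M2, packaged for reuse.
# Sketch only: endpoint and model name below are illustrative assumptions.
sampling_kwargs = {
    "temperature": 1.0,
    "top_p": 0.95,
    # The OpenAI SDK does not expose top_k directly; vLLM/SGLang
    # OpenAI-compatible servers accept it via extra_body.
    "extra_body": {"top_k": 40},
}

# Usage (assuming a running server):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
# client.chat.completions.create(model="MiniMaxAI/MiniMax-M2",
#                                messages=[...], **sampling_kwargs)
```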
|
| 427 |
+
|
| 428 |
+
**IMPORTANT:** MiniMax-M2 is an interleaved thinking model, so you must retain the thinking content from the assistant's turns in the historical messages. In its output content, the model wraps its thinking in the `<think>...</think>` format; when passing history back, keep it in its original form and do not remove the `<think>...</think>` part, otherwise the model's performance will be negatively affected.
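A minimal sketch of what this means in practice (the message contents below are invented for illustration): replay the assistant turn exactly as the model produced it, `<think>` block included.

```python
# Sketch: round-tripping an interleaved-thinking conversation.
# The assistant content is an invented example; the point is that the
# <think>...</think> block is passed back verbatim, never stripped.
assistant_turn = (
    "<think>\nThe user asks for 2 + 2; simple arithmetic.\n</think>\n\n"
    "2 + 2 = 4."
)

history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": assistant_turn},  # keep <think> intact
    {"role": "user", "content": "Now multiply that by 3."},
]

# Do NOT do this -- stripping the thinking content degrades performance:
# {"role": "assistant", "content": assistant_turn.split("</think>")[-1].strip()}
assert "<think>" in history[1]["content"]
```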
|
| 429 |
+
|
| 430 |
+
## Tool Calling Guide
|
| 431 |
+
|
| 432 |
+
Please refer to our [Tool Calling Guide](https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/docs/tool_calling_guide.md).
|
| 433 |
+
|
| 434 |
+
# Contact Us
|
| 435 |
+
|
| 436 |
+
Contact us at [model@minimax.io](mailto:model@minimax.io).
|
chat_template.jinja
ADDED
|
@@ -0,0 +1,159 @@
| 1 |
+
{# ---------- special token variables ---------- #}
|
| 2 |
+
{%- set toolcall_begin_token = '<minimax:tool_call>' -%}
|
| 3 |
+
{%- set toolcall_end_token = '</minimax:tool_call>' -%}
|
| 4 |
+
{#- Tool Rendering Functions ============================================== -#}
|
| 5 |
+
{%- macro render_tool_namespace(namespace_name, tool_list) -%}
|
| 6 |
+
{%- for tool in tool_list -%}
|
| 7 |
+
<tool>{{ tool.function | tojson(ensure_ascii=False) }}</tool>
|
| 8 |
+
{% endfor -%}
|
| 9 |
+
{%- endmacro -%}
|
| 10 |
+
{%- macro visible_text(content) -%}
|
| 11 |
+
{%- if content is string -%}
|
| 12 |
+
{{ content }}
|
| 13 |
+
{%- elif content is iterable and content is not mapping -%}
|
| 14 |
+
{%- for item in content -%}
|
| 15 |
+
{%- if item is mapping and item.type == 'text' -%}
|
| 16 |
+
{{- item.text }}
|
| 17 |
+
{%- elif item is string -%}
|
| 18 |
+
{{- item }}
|
| 19 |
+
{%- endif -%}
|
| 20 |
+
{%- endfor -%}
|
| 21 |
+
{%- else -%}
|
| 22 |
+
{{- content }}
|
| 23 |
+
{%- endif -%}
|
| 24 |
+
{%- endmacro -%}
|
| 25 |
+
{#- System Message Construction ============================================ -#}
|
| 26 |
+
{%- macro build_system_message(system_message) -%}
|
| 27 |
+
{%- if system_message and system_message.content -%}
|
| 28 |
+
{{- visible_text(system_message.content) }}
|
| 29 |
+
{%- else -%}
|
| 30 |
+
{%- if model_identity is not defined -%}
|
| 31 |
+
{%- set model_identity = "You are a helpful assistant." -%}
|
| 32 |
+
{%- endif -%}
|
| 33 |
+
{{- model_identity }}
|
| 34 |
+
{%- endif -%}
|
| 35 |
+
|
| 36 |
+
{#- Handle current_date -#}
|
| 37 |
+
{%- if system_message and system_message.current_date -%}
|
| 38 |
+
{{- '\n' ~ 'Current date: ' + system_message.current_date }}
|
| 39 |
+
{%- endif -%}
|
| 40 |
+
{#- Handle current_location -#}
|
| 41 |
+
{%- if system_message and system_message.current_location -%}
|
| 42 |
+
{{- '\n' ~ 'Current location: ' + system_message.current_location }}
|
| 43 |
+
{%- endif -%}
|
| 44 |
+
{%- endmacro -%}
|
| 45 |
+
{#- Main Template Logic ================================================= -#}
|
| 46 |
+
{#- Extract system message (only first message if it's system) -#}
|
| 47 |
+
{%- set system_message = none -%}
|
| 48 |
+
{%- set conversation_messages = messages -%}
|
| 49 |
+
{%- if messages and messages[0].role == "system" -%}
|
| 50 |
+
{%- set system_message = messages[0] -%}
|
| 51 |
+
{%- set conversation_messages = messages[1:] -%}
|
| 52 |
+
{%- endif -%}
|
| 53 |
+
{#- Get the last user message turn, for interleaved thinking -#}
|
| 54 |
+
{%- set ns = namespace(last_user_index=-1) %}
|
| 55 |
+
{% for m in conversation_messages %}
|
| 56 |
+
{%- if m.role == 'user' %}
|
| 57 |
+
{% set ns.last_user_index = loop.index0 -%}
|
| 58 |
+
{%- endif %}
|
| 59 |
+
{%- endfor %}
|
| 60 |
+
{#- Render system message -#}
|
| 61 |
+
{{- ']~!b[' ~ ']~b]system' ~ '\n' }}
|
| 62 |
+
{{- build_system_message(system_message) }}
|
| 63 |
+
{#- Render tools if available -#}
|
| 64 |
+
{%- if tools -%}
|
| 65 |
+
{{- '\n\n' ~ '# Tools' ~ '\n' ~ 'You may call one or more tools to assist with the user query.\nHere are the tools available in JSONSchema format:' ~ '\n' }}
|
| 66 |
+
{{- '\n' ~ '<tools>' ~ '\n' }}
|
| 67 |
+
{{- render_tool_namespace("functions", tools) }}
|
| 68 |
+
{{- '</tools>' ~ '\n\n' }}
|
| 69 |
+
{{- 'When making tool calls, use XML format to invoke tools and pass parameters:' ~ '\n' }}
|
| 70 |
+
{{- '\n' ~ toolcall_begin_token }}
|
| 71 |
+
<invoke name="tool-name-1">
|
| 72 |
+
<parameter name="param-key-1">param-value-1</parameter>
|
| 73 |
+
<parameter name="param-key-2">param-value-2</parameter>
|
| 74 |
+
...
|
| 75 |
+
</invoke>
|
| 76 |
+
{{- '\n' ~ toolcall_end_token }}
|
| 77 |
+
{%- endif -%}
|
| 78 |
+
{{- '[e~[\n' }}
|
| 79 |
+
|
| 80 |
+
{#- Render messages -#}
|
| 81 |
+
{%- set last_tool_call = namespace(name=none) -%}
|
| 82 |
+
{%- for message in conversation_messages -%}
|
| 83 |
+
{%- if message.role == 'assistant' -%}
|
| 84 |
+
{#- Only render reasoning_content if no user message follows -#}
|
| 85 |
+
{{- ']~b]ai' ~ '\n' }}
|
| 86 |
+
|
| 87 |
+
{%- set reasoning_content = '' %}
|
| 88 |
+
{%- set content = visible_text(message.content) %}
|
| 89 |
+
{%- if message.reasoning_content is string %}
|
| 90 |
+
{%- set reasoning_content = message.reasoning_content %}
|
| 91 |
+
{%- else %}
|
| 92 |
+
{%- if '</think>' in content %}
|
| 93 |
+
{%- set reasoning_content = content.split('</think>')[0].strip('\n').split('<think>')[-1].strip('\n') %}
|
| 94 |
+
{%- set content = content.split('</think>')[-1].strip('\n') %}
|
| 95 |
+
{%- endif %}
|
| 96 |
+
{%- endif %}
|
| 97 |
+
{%- if reasoning_content and loop.index0 > ns.last_user_index -%}
|
| 98 |
+
{{- '<think>' ~ '\n' ~ reasoning_content ~ '\n' ~ '</think>' ~ '\n\n' }}
|
| 99 |
+
{%- endif -%}
|
| 100 |
+
{%- if content -%}
|
| 101 |
+
{{- content }}
|
| 102 |
+
{%- endif -%}
|
| 103 |
+
{%- if message.tool_calls -%}
|
| 104 |
+
{{- '\n' ~ toolcall_begin_token ~ '\n' }}
|
| 105 |
+
|
| 106 |
+
{%- for tool_call in message.tool_calls -%}
|
| 107 |
+
{%- if tool_call.function %}
|
| 108 |
+
{%- set tool_call = tool_call.function %}
|
| 109 |
+
{%- endif %}
|
| 110 |
+
{{- '<invoke name="' + tool_call.name + '">' }}
|
| 111 |
+
{% set _args = tool_call.arguments %}
|
| 112 |
+
{%- for k, v in _args.items() %}
|
| 113 |
+
{{- '<parameter name="' + k + '">' }}
|
| 114 |
+
{{- v | tojson(ensure_ascii=False) if v is not string else v }}
|
| 115 |
+
{{- '</parameter>' }}
|
| 116 |
+
{% endfor %}
|
| 117 |
+
{{- '</invoke>' ~ '\n' }}
|
| 118 |
+
{%- endfor -%}
|
| 119 |
+
|
| 120 |
+
{{- toolcall_end_token}}
|
| 121 |
+
{%- set last_tool_call.name = message.tool_calls[-1].name -%}
|
| 122 |
+
{%- else -%}
|
| 123 |
+
{%- set last_tool_call.name = none -%}
|
| 124 |
+
{%- endif -%}
|
| 125 |
+
{{- '[e~[' ~ '\n' }}
|
| 126 |
+
|
| 127 |
+
{%- elif message.role == 'tool' -%}
|
| 128 |
+
{%- if last_tool_call.name is none -%}
|
| 129 |
+
{{- raise_exception("Message has tool role, but there was no previous assistant message with a tool call!") }}
|
| 130 |
+
{%- endif -%}
|
| 131 |
+
{%- if loop.first or (conversation_messages[loop.index0 - 1].role != 'tool') -%}
|
| 132 |
+
{{- ']~b]tool' }}
|
| 133 |
+
{%- endif -%}
|
| 134 |
+
{%- if message.content is string -%}
|
| 135 |
+
{{- '\n<response>' }}
|
| 136 |
+
{{- message.content }}
|
| 137 |
+
{{- '</response>' }}
|
| 138 |
+
{%- else -%}
|
| 139 |
+
{%- for tr in message.content -%}
|
| 140 |
+
{{- '\n<response>' }}
|
| 141 |
+
{{- tr.output if tr.output is defined else (tr.text if tr.type == 'text' and tr.text is defined else tr) }}
|
| 142 |
+
{{- '\n</response>' }}
|
| 143 |
+
{%- endfor -%}
|
| 144 |
+
{%- endif -%}
|
| 145 |
+
{%- if loop.last or (conversation_messages[loop.index0 + 1].role != 'tool') -%}
|
| 146 |
+
{{- '[e~[\n' -}}
|
| 147 |
+
{%- endif -%}
|
| 148 |
+
|
| 149 |
+
{%- elif message.role == 'user' -%}
|
| 150 |
+
{{- ']~b]user' ~ '\n' }}
|
| 151 |
+
{{- visible_text(message.content) }}
|
| 152 |
+
{{- '[e~[' ~ '\n' }}
|
| 153 |
+
{%- endif -%}
|
| 154 |
+
{%- endfor -%}
|
| 155 |
+
|
| 156 |
+
{#- Generation prompt -#}
|
| 157 |
+
{%- if add_generation_prompt -%}
|
| 158 |
+
{{- ']~b]ai' ~ '\n' ~ '<think>' ~ '\n' }}
|
| 159 |
+
{%- endif -%}
|
config.json
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:08f4c0913799dbe6ef8999dd694633c5b6a11b66a3714dc3aa9f8ae1a3d8f5eb
|
| 3 |
+
size 1832
|
configuration.json
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:f888421726665e8a84b738eed42a64875aed79de8be7daade851ac8bf4c0cef9
|
| 3 |
+
size 73
|
docs/function_call_guide.md
ADDED
|
@@ -0,0 +1,482 @@
|
| 1 |
+
# MiniMax-M2 Function Call Guide
|
| 2 |
+
|
| 3 |
+
## Introduction
|
| 4 |
+
|
| 5 |
+
The MiniMax-M2 model supports function calling capabilities, enabling the model to identify when external functions need to be called and output function call parameters in a structured format. This document provides detailed instructions on how to use the function calling features of MiniMax-M2.
|
| 6 |
+
|
| 7 |
+
## Basic Example
|
| 8 |
+
|
| 9 |
+
The following Python script implements a weather query function call example based on the OpenAI SDK:
|
| 10 |
+
|
| 11 |
+
```python
|
| 12 |
+
from openai import OpenAI
|
| 13 |
+
import json
|
| 14 |
+
|
| 15 |
+
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
|
| 16 |
+
|
| 17 |
+
def get_weather(location: str, unit: str):
|
| 18 |
+
return f"Getting the weather for {location} in {unit}..."
|
| 19 |
+
|
| 20 |
+
tool_functions = {"get_weather": get_weather}
|
| 21 |
+
|
| 22 |
+
tools = [{
|
| 23 |
+
"type": "function",
|
| 24 |
+
"function": {
|
| 25 |
+
"name": "get_weather",
|
| 26 |
+
"description": "Get the current weather in a given location",
|
| 27 |
+
"parameters": {
|
| 28 |
+
"type": "object",
|
| 29 |
+
"properties": {
|
| 30 |
+
"location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
|
| 31 |
+
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
|
| 32 |
+
},
|
| 33 |
+
"required": ["location", "unit"]
|
| 34 |
+
}
|
| 35 |
+
}
|
| 36 |
+
}]
|
| 37 |
+
|
| 38 |
+
response = client.chat.completions.create(
|
| 39 |
+
model=client.models.list().data[0].id,
|
| 40 |
+
messages=[{"role": "user", "content": "What's the weather like in San Francisco? use celsius."}],
|
| 41 |
+
tools=tools,
|
| 42 |
+
tool_choice="auto"
|
| 43 |
+
)
|
| 44 |
+
|
| 45 |
+
print(response)
|
| 46 |
+
|
| 47 |
+
tool_call = response.choices[0].message.tool_calls[0].function
|
| 48 |
+
print(f"Function called: {tool_call.name}")
|
| 49 |
+
print(f"Arguments: {tool_call.arguments}")
|
| 50 |
+
print(f"Result: {get_weather(**json.loads(tool_call.arguments))}")
|
| 51 |
+
```
|
| 52 |
+
|
| 53 |
+
**Output Example:**
|
| 54 |
+
```
|
| 55 |
+
Function called: get_weather
|
| 56 |
+
Arguments: {"location": "San Francisco, CA", "unit": "celsius"}
|
| 57 |
+
Result: Getting the weather for San Francisco, CA in celsius...
|
| 58 |
+
```
|
| 59 |
+
|
| 60 |
+
## Manually Parsing Model Output
|
| 61 |
+
|
| 62 |
+
If you cannot use the built-in parsers of inference engines that support MiniMax-M2, or you need to use another inference framework (such as Transformers or TGI), you can manually parse the model's raw output as follows. This approach requires you to parse the XML tag format of the model output yourself.
|
| 63 |
+
|
| 64 |
+
### Example Using Transformers
|
| 65 |
+
|
| 66 |
+
Here is a complete example using the transformers library:
|
| 67 |
+
|
| 68 |
+
```python
|
| 69 |
+
from transformers import AutoTokenizer
|
| 70 |
+
|
| 71 |
+
def get_default_tools():
|
| 72 |
+
return [
|
| 73 |
+
{
|
| 74 |
+
"name": "get_current_weather",
|
| 75 |
+
"description": "Get the latest weather for a location",
|
| 76 |
+
"parameters": {
|
| 77 |
+
"type": "object",
|
| 78 |
+
"properties": {
|
| 79 |
+
"location": {
|
| 80 |
+
"type": "string",
|
| 81 |
+
"description": "A certain city, such as Beijing, Shanghai"
|
| 82 |
+
}
|
| 83 |
+
},
|
| 84 |
+
                "required": ["location"]
            }
        }
|
| 88 |
+
]
|
| 89 |
+
|
| 90 |
+
# Load model and tokenizer
|
| 91 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
| 92 |
+
prompt = "What's the weather like in Shanghai today?"
|
| 93 |
+
messages = [
|
| 94 |
+
{"role": "system", "content": "You are a helpful assistant."},
|
| 95 |
+
{"role": "user", "content": prompt},
|
| 96 |
+
]
|
| 97 |
+
|
| 98 |
+
# Enable function calling tools
|
| 99 |
+
tools = get_default_tools()
|
| 100 |
+
|
| 101 |
+
# Apply chat template and include tool definitions
|
| 102 |
+
text = tokenizer.apply_chat_template(
|
| 103 |
+
messages,
|
| 104 |
+
tokenize=False,
|
| 105 |
+
add_generation_prompt=True,
|
| 106 |
+
tools=tools
|
| 107 |
+
)
|
| 108 |
+
|
| 109 |
+
# Send request (using any inference service)
|
| 110 |
+
import requests
|
| 111 |
+
payload = {
|
| 112 |
+
"model": "MiniMaxAI/MiniMax-M2",
|
| 113 |
+
"prompt": text,
|
| 114 |
+
"max_tokens": 4096
|
| 115 |
+
}
|
| 116 |
+
response = requests.post(
|
| 117 |
+
"http://localhost:8000/v1/completions",
|
| 118 |
+
headers={"Content-Type": "application/json"},
|
| 119 |
+
json=payload,
|
| 120 |
+
stream=False,
|
| 121 |
+
)
|
| 122 |
+
|
| 123 |
+
# Model output needs manual parsing
|
| 124 |
+
raw_output = response.json()["choices"][0]["text"]
|
| 125 |
+
print("Raw output:", raw_output)
|
| 126 |
+
|
| 127 |
+
# Use the parsing function below to process the output
|
| 128 |
+
function_calls = parse_tool_calls(raw_output, tools)
|
| 129 |
+
```
|
| 130 |
+
|
| 131 |
+
## 🛠️ Function Call Definition
|
| 132 |
+
|
| 133 |
+
### Function Structure
|
| 134 |
+
|
| 135 |
+
Function calls need to define the `tools` field in the request body. Each function consists of the following parts:
|
| 136 |
+
|
| 137 |
+
```json
|
| 138 |
+
{
|
| 139 |
+
"tools": [
|
| 140 |
+
{
|
| 141 |
+
"name": "search_web",
|
| 142 |
+
"description": "Search function.",
|
| 143 |
+
"parameters": {
|
| 144 |
+
"properties": {
|
| 145 |
+
"query_list": {
|
| 146 |
+
"description": "Keywords for search, list should contain 1 element.",
|
| 147 |
+
"items": { "type": "string" },
|
| 148 |
+
"type": "array"
|
| 149 |
+
},
|
| 150 |
+
"query_tag": {
|
| 151 |
+
"description": "Category of query",
|
| 152 |
+
"items": { "type": "string" },
|
| 153 |
+
"type": "array"
|
| 154 |
+
}
|
| 155 |
+
},
|
| 156 |
+
"required": [ "query_list", "query_tag" ],
|
| 157 |
+
"type": "object"
|
| 158 |
+
}
|
| 159 |
+
}
|
| 160 |
+
]
|
| 161 |
+
}
|
| 162 |
+
```
|
| 163 |
+
|
| 164 |
+
**Field Descriptions:**
|
| 165 |
+
- `name`: Function name
|
| 166 |
+
- `description`: Function description
|
| 167 |
+
- `parameters`: Function parameter definition
|
| 168 |
+
- `properties`: Parameter property definition, where key is the parameter name and value contains detailed parameter description
|
| 169 |
+
- `required`: List of required parameters
|
| 170 |
+
- `type`: Parameter type (usually "object")
|
| 171 |
+
|
| 172 |
+
### Internal Processing Format
|
| 173 |
+
|
| 174 |
+
When processing within the MiniMax-M2 model, function definitions are converted to a special format and concatenated to the input text. Here is a complete example:
|
| 175 |
+
|
| 176 |
+
```
|
| 177 |
+
]~!b[]~b]system
|
| 178 |
+
You are a helpful assistant.
|
| 179 |
+
|
| 180 |
+
# Tools
|
| 181 |
+
You may call one or more tools to assist with the user query.
|
| 182 |
+
Here are the tools available in JSONSchema format:
|
| 183 |
+
|
| 184 |
+
<tools>
|
| 185 |
+
<tool>{"name": "search_web", "description": "Search function.", "parameters": {"type": "object", "properties": {"query_list": {"type": "array", "items": {"type": "string"}, "description": "Keywords for search, list should contain 1 element."}, "query_tag": {"type": "array", "items": {"type": "string"}, "description": "Category of query"}}, "required": ["query_list", "query_tag"]}}</tool>
|
| 186 |
+
</tools>
|
| 187 |
+
|
| 188 |
+
When making tool calls, use XML format to invoke tools and pass parameters:
|
| 189 |
+
|
| 190 |
+
<minimax:tool_call>
|
| 191 |
+
<invoke name="tool-name-1">
|
| 192 |
+
<parameter name="param-key-1">param-value-1</parameter>
|
| 193 |
+
<parameter name="param-key-2">param-value-2</parameter>
|
| 194 |
+
...
|
| 195 |
+
</invoke>
|
| 196 |
+
</minimax:tool_call>
[e~[
|
| 197 |
+
]~b]user
|
| 198 |
+
When were the latest announcements from OpenAI and Gemini?[e~[
|
| 199 |
+
]~b]ai
|
| 200 |
+
<think>
|
| 201 |
+
```
|
| 202 |
+
|
| 203 |
+
**Format Description:**
|
| 204 |
+
|
| 205 |
+
- `]~!b[]~b]system`: System message start marker
|
| 206 |
+
- `[e~[`: Message end marker
|
| 207 |
+
- `]~b]user`: User message start marker
|
| 208 |
+
- `]~b]ai`: Assistant message start marker
|
| 209 |
+
- `]~b]tool`: Tool result message start marker
|
| 210 |
+
- `<tools>...</tools>`: Tool definition area, each tool is wrapped with `<tool>` tag, content is JSON Schema
|
| 211 |
+
- `<minimax:tool_call>...</minimax:tool_call>`: Tool call area
|
| 212 |
+
- `<think>`: Thinking process marker during generation (optional)
|
| 213 |
+
|
| 214 |
+
### Model Output Format
|
| 215 |
+
|
| 216 |
+
MiniMax-M2 uses structured XML tag format:
|
| 217 |
+
|
| 218 |
+
```xml
|
| 219 |
+
<minimax:tool_call>
|
| 220 |
+
<invoke name="search_web">
|
| 221 |
+
<parameter name="query_tag">["technology", "events"]</parameter>
|
| 222 |
+
<parameter name="query_list">["\"OpenAI\" \"latest\" \"release\""]</parameter>
|
| 223 |
+
</invoke>
|
| 224 |
+
<invoke name="search_web">
|
| 225 |
+
<parameter name="query_tag">["technology", "events"]</parameter>
|
| 226 |
+
<parameter name="query_list">["\"Gemini\" \"latest\" \"release\""]</parameter>
|
| 227 |
+
</invoke>
|
| 228 |
+
</minimax:tool_call>
|
| 229 |
+
```
|
| 230 |
+
|
| 231 |
+
Each function call is wrapped in an `<invoke name="function_name">` tag, and each parameter in a `<parameter name="parameter_name">` tag.
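After executing the parsed calls (the next section shows one way to parse them), the results are normally returned to the model as `tool` role messages, which the chat template wraps in `<response>` tags. A minimal sketch, assuming parsed calls shaped like `{"name": ..., "arguments": ...}`; the helper name and message shape are illustrative, not an official API:

```python
import json

def build_tool_messages(parsed_calls, results):
    """Sketch: pair each parsed call with its execution result as a
    {"role": "tool"} message for the next model turn. Names and shapes
    here are illustrative assumptions."""
    return [
        {"role": "tool",
         "content": json.dumps({"name": call["name"], "result": result},
                               ensure_ascii=False)}
        for call, result in zip(parsed_calls, results)
    ]

# Example usage with one parsed search_web call:
calls = [{"name": "search_web",
          "arguments": {"query_list": ["OpenAI latest release"]}}]
messages = build_tool_messages(calls, ["OpenAI announced ..."])
```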
|
| 232 |
+
|
| 233 |
+
## Manually Parsing Function Call Results
|
| 234 |
+
|
| 235 |
+
### Parsing Function Calls
|
| 236 |
+
|
| 237 |
+
MiniMax-M2 uses structured XML tags, which require a different parsing approach. The core function is as follows:
|
| 238 |
+
|
| 239 |
+
```python
|
| 240 |
+
import re
|
| 241 |
+
import json
|
| 242 |
+
from typing import Any, Optional, List, Dict
|
| 243 |
+
|
| 244 |
+
|
| 245 |
+
def extract_name(name_str: str) -> str:
|
| 246 |
+
"""Extract name from quoted string"""
|
| 247 |
+
name_str = name_str.strip()
|
| 248 |
+
if name_str.startswith('"') and name_str.endswith('"'):
|
| 249 |
+
return name_str[1:-1]
|
| 250 |
+
elif name_str.startswith("'") and name_str.endswith("'"):
|
| 251 |
+
return name_str[1:-1]
|
| 252 |
+
return name_str
|
| 253 |
+
|
| 254 |
+
|
| 255 |
+
def convert_param_value(value: str, param_type: str) -> Any:
|
| 256 |
+
"""Convert parameter value based on parameter type"""
|
| 257 |
+
if value.lower() == "null":
|
| 258 |
+
return None
|
| 259 |
+
|
| 260 |
+
param_type = param_type.lower()
|
| 261 |
+
|
| 262 |
+
if param_type in ["string", "str", "text"]:
|
| 263 |
+
return value
|
| 264 |
+
elif param_type in ["integer", "int"]:
|
| 265 |
+
try:
|
| 266 |
+
return int(value)
|
| 267 |
+
except (ValueError, TypeError):
|
| 268 |
+
return value
|
| 269 |
+
elif param_type in ["number", "float"]:
|
| 270 |
+
try:
|
| 271 |
+
val = float(value)
|
| 272 |
+
return val if val != int(val) else int(val)
|
| 273 |
+
except (ValueError, TypeError):
|
| 274 |
+
return value
|
| 275 |
+
elif param_type in ["boolean", "bool"]:
|
| 276 |
+
return value.lower() in ["true", "1"]
|
| 277 |
+
elif param_type in ["object", "array"]:
|
| 278 |
+
try:
|
| 279 |
+
return json.loads(value)
|
| 280 |
+
except json.JSONDecodeError:
|
| 281 |
+
return value
|
| 282 |
+
else:
|
| 283 |
+
# Try JSON parsing, return string if failed
|
| 284 |
+
try:
|
| 285 |
+
return json.loads(value)
|
| 286 |
+
except json.JSONDecodeError:
|
| 287 |
+
return value
|
| 288 |
+
|
| 289 |
+
|
| 290 |
+
def parse_tool_calls(model_output: str, tools: Optional[List[Dict]] = None) -> List[Dict]:
|
| 291 |
+
"""
|
| 292 |
+
Extract all tool calls from model output
|
| 293 |
+
|
| 294 |
+
Args:
|
| 295 |
+
model_output: Complete output text from the model
|
| 296 |
+
tools: Tool definition list for getting parameter type information, format can be:
|
| 297 |
+
- [{"name": "...", "parameters": {...}}]
|
| 298 |
+
- [{"type": "function", "function": {"name": "...", "parameters": {...}}}]
|
| 299 |
+
|
| 300 |
+
Returns:
|
| 301 |
+
Parsed tool call list, each element contains name and arguments fields
|
| 302 |
+
|
| 303 |
+
Example:
|
| 304 |
+
>>> tools = [{
|
| 305 |
+
... "name": "get_weather",
|
| 306 |
+
... "parameters": {
|
| 307 |
+
... "type": "object",
|
| 308 |
+
... "properties": {
|
| 309 |
+
... "location": {"type": "string"},
|
| 310 |
+
... "unit": {"type": "string"}
|
| 311 |
+
... }
|
| 312 |
+
... }
|
| 313 |
+
... }]
|
| 314 |
+
>>> output = '''<minimax:tool_call>
|
| 315 |
+
... <invoke name="get_weather">
|
| 316 |
+
... <parameter name="location">San Francisco</parameter>
|
| 317 |
+
... <parameter name="unit">celsius</parameter>
|
| 318 |
+
... </invoke>
|
| 319 |
+
... </minimax:tool_call>'''
|
| 320 |
+
>>> result = parse_tool_calls(output, tools)
|
| 321 |
+
>>> print(result)
|
| 322 |
+
[{'name': 'get_weather', 'arguments': {'location': 'San Francisco', 'unit': 'celsius'}}]
|
| 323 |
+
"""
|
| 324 |
+
# Quick check if tool call marker is present
|
| 325 |
+
if "<minimax:tool_call>" not in model_output:
|
| 326 |
+
return []
|
| 327 |
+
|
| 328 |
+
tool_calls = []
|
| 329 |
+
|
| 330 |
+
try:
|
| 331 |
+
# Match all <minimax:tool_call> blocks
|
| 332 |
+
tool_call_regex = re.compile(r"<minimax:tool_call>(.*?)</minimax:tool_call>", re.DOTALL)
|
| 333 |
+
invoke_regex = re.compile(r"<invoke name=(.*?)</invoke>", re.DOTALL)
|
| 334 |
+
parameter_regex = re.compile(r"<parameter name=(.*?)</parameter>", re.DOTALL)
|
| 335 |
+
|
| 336 |
+
# Iterate through all tool_call blocks
|
| 337 |
+
for tool_call_match in tool_call_regex.findall(model_output):
|
| 338 |
+
# Iterate through all invokes in this block
|
| 339 |
+
for invoke_match in invoke_regex.findall(tool_call_match):
|
| 340 |
+
# Extract function name
|
| 341 |
+
name_match = re.search(r'^([^>]+)', invoke_match)
|
| 342 |
+
if not name_match:
|
| 343 |
+
continue
|
| 344 |
+
|
| 345 |
+
function_name = extract_name(name_match.group(1))
|
| 346 |
+
|
| 347 |
+
# Get parameter configuration
|
| 348 |
+
param_config = {}
|
| 349 |
+
if tools:
|
| 350 |
+
for tool in tools:
|
| 351 |
+
tool_name = tool.get("name") or tool.get("function", {}).get("name")
|
| 352 |
+
if tool_name == function_name:
|
| 353 |
+
params = tool.get("parameters") or tool.get("function", {}).get("parameters")
|
| 354 |
+
if isinstance(params, dict) and "properties" in params:
|
| 355 |
+
param_config = params["properties"]
|
| 356 |
+
break
|
| 357 |
+
|
| 358 |
+
# Extract parameters
|
| 359 |
+
param_dict = {}
|
| 360 |
+
for match in parameter_regex.findall(invoke_match):
|
| 361 |
+
param_match = re.search(r'^([^>]+)>(.*)', match, re.DOTALL)
|
| 362 |
+
if param_match:
|
| 363 |
+
param_name = extract_name(param_match.group(1))
|
| 364 |
+
param_value = param_match.group(2).strip()
|
| 365 |
+
|
| 366 |
+
# Remove leading and trailing newlines
|
| 367 |
+
if param_value.startswith('\n'):
|
| 368 |
+
param_value = param_value[1:]
|
| 369 |
+
if param_value.endswith('\n'):
|
| 370 |
+
param_value = param_value[:-1]
|
| 371 |
+
|
| 372 |
+
# Get parameter type and convert
|
| 373 |
+
param_type = "string"
|
| 374 |
+
if param_name in param_config:
|
| 375 |
+
if isinstance(param_config[param_name], dict) and "type" in param_config[param_name]:
|
| 376 |
+
param_type = param_config[param_name]["type"]
|
| 377 |
+
|
| 378 |
+
param_dict[param_name] = convert_param_value(param_value, param_type)
|
| 379 |
+
|
| 380 |
+
tool_calls.append({
|
| 381 |
+
"name": function_name,
|
| 382 |
+
"arguments": param_dict
|
| 383 |
+
})
|
| 384 |
+
|
| 385 |
+
except Exception as e:
|
| 386 |
+
print(f"Failed to parse tool calls: {e}")
|
| 387 |
+
return []
|
| 388 |
+
|
| 389 |
+
return tool_calls
|
| 390 |
+
```
|
| 391 |
+
|
| 392 |
+
**Usage Example:**
|
| 393 |
+
|
| 394 |
+
```python
|
| 395 |
+
# Define tools
|
| 396 |
+
tools = [
|
| 397 |
+
{
|
| 398 |
+
"name": "get_weather",
|
| 399 |
+
"parameters": {
|
| 400 |
+
"type": "object",
|
| 401 |
+
"properties": {
|
| 402 |
+
"location": {"type": "string"},
|
| 403 |
+
"unit": {"type": "string"}
|
| 404 |
+
},
|
| 405 |
+
"required": ["location", "unit"]
|
| 406 |
+
}
|
| 407 |
+
}
|
| 408 |
+
]
|
| 409 |
+
|
| 410 |
+
# Model output
|
| 411 |
+
model_output = """Let me help you query the weather.
|
| 412 |
+
<minimax:tool_call>
|
| 413 |
+
<invoke name="get_weather">
|
| 414 |
+
<parameter name="location">San Francisco</parameter>
|
| 415 |
+
<parameter name="unit">celsius</parameter>
|
| 416 |
+
</invoke>
|
| 417 |
+
</minimax:tool_call>"""
|
| 418 |
+
|
| 419 |
+
# Parse tool calls
|
| 420 |
+
tool_calls = parse_tool_calls(model_output, tools)
|
| 421 |
+
|
| 422 |
+
# Output results
|
| 423 |
+
for call in tool_calls:
|
| 424 |
+
print(f"Function called: {call['name']}")
|
| 425 |
+
print(f"Arguments: {call['arguments']}")
|
| 426 |
+
# Output: Function called: get_weather
|
| 427 |
+
# Arguments: {'location': 'San Francisco', 'unit': 'celsius'}
|
| 428 |
+
```
|
| 429 |
+
|
| 430 |
+
### Executing Function Calls
|
| 431 |
+
|
| 432 |
+
After parsing is complete, you can execute the corresponding function and construct the return result:
|
| 433 |
+
|
| 434 |
+
```python
|
| 435 |
+
def execute_function_call(function_name: str, arguments: dict):
|
| 436 |
+
"""Execute function call and return result"""
|
| 437 |
+
if function_name == "get_weather":
|
| 438 |
+
location = arguments.get("location", "Unknown location")
|
| 439 |
+
unit = arguments.get("unit", "celsius")
|
| 440 |
+
# Build function execution result
|
| 441 |
+
return {
|
| 442 |
+
"role": "tool",
|
| 443 |
+
"content": [
|
| 444 |
+
{
|
| 445 |
+
"name": function_name,
|
| 446 |
+
"type": "text",
|
| 447 |
+
"text": json.dumps({
|
| 448 |
+
"location": location,
|
| 449 |
+
"temperature": "25",
|
| 450 |
+
"unit": unit,
|
| 451 |
+
"weather": "Sunny"
|
| 452 |
+
}, ensure_ascii=False)
|
| 453 |
+
}
|
| 454 |
+
]
|
| 455 |
+
}
|
| 456 |
+
elif function_name == "search_web":
|
| 457 |
+
query_list = arguments.get("query_list", [])
|
| 458 |
+
query_tag = arguments.get("query_tag", [])
|
| 459 |
+
# Simulate search results
|
| 460 |
+
return {
|
| 461 |
+
"role": "tool",
|
| 462 |
+
"content": [
|
| 463 |
+
{
|
| 464 |
+
"name": function_name,
|
| 465 |
+
"type": "text",
|
| 466 |
+
"text": f"Search keywords: {query_list}, Category: {query_tag}\nSearch results: Relevant information found"
|
| 467 |
+
}
|
| 468 |
+
]
|
| 469 |
+
}
|
| 470 |
+
|
| 471 |
+
return None
|
| 472 |
+
```
|
| 473 |
+
|
| 474 |
+
### Returning Function Execution Results to the Model
|
| 475 |
+
|
| 476 |
+
After successfully parsing function calls, add the function execution results to the conversation history so that the model can access and use this information in subsequent turns. Refer to `chat_template.jinja` for the exact concatenation format.
|
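As a concrete illustration, the sketch below appends a tool result to the message history before the next request. This is a minimal, hedged example: `make_tool_result` is a hypothetical helper that mirrors the `"role": "tool"` message shape used in `execute_function_call` above, not part of any official SDK.

```python
import json

# Minimal sketch of returning a tool result to the model. The message shape
# mirrors the "role": "tool" dictionaries built in execute_function_call above;
# make_tool_result is a hypothetical helper, not an official API.

def make_tool_result(function_name: str, result: dict) -> dict:
    """Wrap a function's return value in the tool-message format shown above."""
    return {
        "role": "tool",
        "content": [
            {
                "name": function_name,
                "type": "text",
                "text": json.dumps(result, ensure_ascii=False),
            }
        ],
    }

messages = [
    {"role": "user", "content": "What's the weather like in San Francisco?"},
    # Raw assistant output containing the <minimax:tool_call> block:
    {"role": "assistant", "content": "<minimax:tool_call>...</minimax:tool_call>"},
]

# Append the execution result so the model can use it on the next turn.
messages.append(
    make_tool_result("get_weather", {"location": "San Francisco", "temperature": "25"})
)
```

The follow-up request then re-applies the chat template over the full `messages` list, exactly as in the earlier examples.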
| 477 |
+
|
| 478 |
+
## References
|
| 479 |
+
|
| 480 |
+
- [MiniMax-M2 Model Repository](https://github.com/MiniMax-AI/MiniMax-M2)
|
| 481 |
+
- [vLLM Project Homepage](https://github.com/vllm-project/vllm)
|
| 482 |
+
- [OpenAI Python SDK](https://github.com/openai/openai-python)
|
docs/function_call_guide_cn.md
ADDED
|
@@ -0,0 +1,482 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# MiniMax-M2 函数调用(Function Call)功能指南
|
| 2 |
+
|
| 3 |
+
## 简介
|
| 4 |
+
|
| 5 |
+
MiniMax-M2 模型支持函数调用功能,使模型能够识别何时需要调用外部函数,并以结构化格式输出函数调用参数。本文档详细介绍了如何使用 MiniMax-M2 的函数调用功能。
|
| 6 |
+
|
| 7 |
+
## 基础示例
|
| 8 |
+
|
| 9 |
+
以下 Python 脚本基于 OpenAI SDK 实现了一个天气查询函数的调用示例:
|
| 10 |
+
|
| 11 |
+
```python
|
| 12 |
+
from openai import OpenAI
|
| 13 |
+
import json
|
| 14 |
+
|
| 15 |
+
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
|
| 16 |
+
|
| 17 |
+
def get_weather(location: str, unit: str):
|
| 18 |
+
return f"Getting the weather for {location} in {unit}..."
|
| 19 |
+
|
| 20 |
+
tool_functions = {"get_weather": get_weather}
|
| 21 |
+
|
| 22 |
+
tools = [{
|
| 23 |
+
"type": "function",
|
| 24 |
+
"function": {
|
| 25 |
+
"name": "get_weather",
|
| 26 |
+
"description": "Get the current weather in a given location",
|
| 27 |
+
"parameters": {
|
| 28 |
+
"type": "object",
|
| 29 |
+
"properties": {
|
| 30 |
+
"location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
|
| 31 |
+
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
|
| 32 |
+
},
|
| 33 |
+
"required": ["location", "unit"]
|
| 34 |
+
}
|
| 35 |
+
}
|
| 36 |
+
}]
|
| 37 |
+
|
| 38 |
+
response = client.chat.completions.create(
|
| 39 |
+
model=client.models.list().data[0].id,
|
| 40 |
+
messages=[{"role": "user", "content": "What's the weather like in San Francisco? use celsius."}],
|
| 41 |
+
tools=tools,
|
| 42 |
+
tool_choice="auto"
|
| 43 |
+
)
|
| 44 |
+
|
| 45 |
+
print(response)
|
| 46 |
+
|
| 47 |
+
tool_call = response.choices[0].message.tool_calls[0].function
|
| 48 |
+
print(f"Function called: {tool_call.name}")
|
| 49 |
+
print(f"Arguments: {tool_call.arguments}")
|
| 50 |
+
print(f"Result: {get_weather(**json.loads(tool_call.arguments))}")
|
| 51 |
+
```
|
| 52 |
+
|
| 53 |
+
**输出示例:**
|
| 54 |
+
```
|
| 55 |
+
Function called: get_weather
|
| 56 |
+
Arguments: {"location": "San Francisco, CA", "unit": "celsius"}
|
| 57 |
+
Result: Getting the weather for San Francisco, CA in celsius...
|
| 58 |
+
```
|
| 59 |
+
|
| 60 |
+
## 手动解析模型输出
|
| 61 |
+
|
| 62 |
+
如果您无法使用已支持 MiniMax-M2 的推理引擎的内置解析器,或者需要使用其他推理框架(如 transformers、TGI 等),可以使用以下方法手动解析模型的原始输出。这种方法需要您自己解析模型输出的 XML 标签格式。
|
| 63 |
+
|
| 64 |
+
### 使用 Transformers 的示例
|
| 65 |
+
|
| 66 |
+
以下是使用 transformers 库的完整示例:
|
| 67 |
+
|
| 68 |
+
```python
|
| 69 |
+
from transformers import AutoTokenizer
|
| 70 |
+
|
| 71 |
+
def get_default_tools():
|
| 72 |
+
return [
|
| 73 |
+
{
|
| 74 |
+
"name": "get_current_weather",
|
| 75 |
+
"description": "Get the latest weather for a location",
|
| 76 |
+
"parameters": {
|
| 77 |
+
"type": "object",
|
| 78 |
+
"properties": {
|
| 79 |
+
"location": {
|
| 80 |
+
"type": "string",
|
| 81 |
+
"description": "A certain city, such as Beijing, Shanghai"
|
| 82 |
+
}
|
| 83 |
+
},
|
| 84 |
+
"required": ["location"]
|
| 85 |
+
}
|
| 86 |
+
}
|
| 88 |
+
]
|
| 89 |
+
|
| 90 |
+
# 加载模型和分词器
|
| 91 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
| 92 |
+
prompt = "What's the weather like in Shanghai today?"
|
| 93 |
+
messages = [
|
| 94 |
+
{"role": "system", "content": "You are a helpful assistant."},
|
| 95 |
+
{"role": "user", "content": prompt},
|
| 96 |
+
]
|
| 97 |
+
|
| 98 |
+
# 启用函数调用工具
|
| 99 |
+
tools = get_default_tools()
|
| 100 |
+
|
| 101 |
+
# 应用聊天模板,并加入工具定义
|
| 102 |
+
text = tokenizer.apply_chat_template(
|
| 103 |
+
messages,
|
| 104 |
+
tokenize=False,
|
| 105 |
+
add_generation_prompt=True,
|
| 106 |
+
tools=tools
|
| 107 |
+
)
|
| 108 |
+
|
| 109 |
+
# 发送请求(这里使用任何推理服务)
|
| 110 |
+
import requests
|
| 111 |
+
payload = {
|
| 112 |
+
"model": "MiniMaxAI/MiniMax-M2",
|
| 113 |
+
"prompt": text,
|
| 114 |
+
"max_tokens": 4096
|
| 115 |
+
}
|
| 116 |
+
response = requests.post(
|
| 117 |
+
"http://localhost:8000/v1/completions",
|
| 118 |
+
headers={"Content-Type": "application/json"},
|
| 119 |
+
json=payload,
|
| 120 |
+
stream=False,
|
| 121 |
+
)
|
| 122 |
+
|
| 123 |
+
# 模型输出需要手动解析
|
| 124 |
+
raw_output = response.json()["choices"][0]["text"]
|
| 125 |
+
print("原始输出:", raw_output)
|
| 126 |
+
|
| 127 |
+
# 使用下面的解析函数处理输出
|
| 128 |
+
function_calls = parse_tool_calls(raw_output, tools)
|
| 129 |
+
```
|
| 130 |
+
|
| 131 |
+
## 🛠️ 函数调用的定义
|
| 132 |
+
|
| 133 |
+
### 函数结构体
|
| 134 |
+
|
| 135 |
+
函数调用需要在请求体中定义 `tools` 字段,每个函数由以下部分组成:
|
| 136 |
+
|
| 137 |
+
```json
|
| 138 |
+
{
|
| 139 |
+
"tools": [
|
| 140 |
+
{
|
| 141 |
+
"name": "search_web",
|
| 142 |
+
"description": "搜索函数。",
|
| 143 |
+
"parameters": {
|
| 144 |
+
"properties": {
|
| 145 |
+
"query_list": {
|
| 146 |
+
"description": "进行搜索的关键词,列表元素个数为1。",
|
| 147 |
+
"items": { "type": "string" },
|
| 148 |
+
"type": "array"
|
| 149 |
+
},
|
| 150 |
+
"query_tag": {
|
| 151 |
+
"description": "query的分类",
|
| 152 |
+
"items": { "type": "string" },
|
| 153 |
+
"type": "array"
|
| 154 |
+
}
|
| 155 |
+
},
|
| 156 |
+
"required": [ "query_list", "query_tag" ],
|
| 157 |
+
"type": "object"
|
| 158 |
+
}
|
| 159 |
+
}
|
| 160 |
+
]
|
| 161 |
+
}
|
| 162 |
+
```
|
| 163 |
+
|
| 164 |
+
**字段说明:**
|
| 165 |
+
- `name`: 函数名称
|
| 166 |
+
- `description`: 函数功能描述
|
| 167 |
+
- `parameters`: 函数参数定义
|
| 168 |
+
- `properties`: 参数属性定义,key 是参数名,value 包含参数的详细描述
|
| 169 |
+
- `required`: 必填参数列表
|
| 170 |
+
- `type`: 参数类型(通常为 "object")
|
| 171 |
+
|
| 172 |
+
### 模型内部处理格式
|
| 173 |
+
|
| 174 |
+
在 MiniMax-M2 模型内部处理时,函数定义会被转换为特殊格式并拼接到输入文本中。以下是一个完整的示例:
|
| 175 |
+
|
| 176 |
+
```
|
| 177 |
+
]~!b[]~b]system
|
| 178 |
+
You are a helpful assistant.
|
| 179 |
+
|
| 180 |
+
# Tools
|
| 181 |
+
You may call one or more tools to assist with the user query.
|
| 182 |
+
Here are the tools available in JSONSchema format:
|
| 183 |
+
|
| 184 |
+
<tools>
|
| 185 |
+
<tool>{"name": "search_web", "description": "搜索函数。", "parameters": {"type": "object", "properties": {"query_list": {"type": "array", "items": {"type": "string"}, "description": "进行搜索的关键词,列表元素个数为1。"}, "query_tag": {"type": "array", "items": {"type": "string"}, "description": "query的分类"}}, "required": ["query_list", "query_tag"]}}</tool>
|
| 186 |
+
</tools>
|
| 187 |
+
|
| 188 |
+
When making tool calls, use XML format to invoke tools and pass parameters:
|
| 189 |
+
|
| 190 |
+
<minimax:tool_call>
|
| 191 |
+
<invoke name="tool-name-1">
|
| 192 |
+
<parameter name="param-key-1">param-value-1</parameter>
|
| 193 |
+
<parameter name="param-key-2">param-value-2</parameter>
|
| 194 |
+
...
|
| 195 |
+
</invoke>
</minimax:tool_call>
|
| 196 |
+
[e~[
|
| 197 |
+
]~b]user
|
| 198 |
+
OpenAI 和 Gemini 的最近一次发布会都是什么时候?[e~[
|
| 199 |
+
]~b]ai
|
| 200 |
+
<think>
|
| 201 |
+
```
|
| 202 |
+
|
| 203 |
+
**格式说明:**
|
| 204 |
+
|
| 205 |
+
- `]~!b[]~b]system`: System 消息开始标记
|
| 206 |
+
- `[e~[`: 消息结束标记
|
| 207 |
+
- `]~b]user`: User 消息开始标记
|
| 208 |
+
- `]~b]ai`: Assistant 消息开始标记
|
| 209 |
+
- `]~b]tool`: Tool 结果消息开始标记
|
| 210 |
+
- `<tools>...</tools>`: 工具定义区域,每个工具用 `<tool>` 标签包裹,内容为 JSON Schema
|
| 211 |
+
- `<minimax:tool_call>...</minimax:tool_call>`: 工具调用区域
|
| 212 |
+
- `<think>`: 生成时的思考过程标记(可选)
|
| 213 |
+
|
| 214 |
+
### 模型输出格式
|
| 215 |
+
|
| 216 |
+
MiniMax-M2使用结构化的 XML 标签格式:
|
| 217 |
+
|
| 218 |
+
```xml
|
| 219 |
+
<minimax:tool_call>
|
| 220 |
+
<invoke name="search_web">
|
| 221 |
+
<parameter name="query_tag">["technology", "events"]</parameter>
|
| 222 |
+
<parameter name="query_list">["\"OpenAI\" \"latest\" \"release\""]</parameter>
|
| 223 |
+
</invoke>
|
| 224 |
+
<invoke name="search_web">
|
| 225 |
+
<parameter name="query_tag">["technology", "events"]</parameter>
|
| 226 |
+
<parameter name="query_list">["\"Gemini\" \"latest\" \"release\""]</parameter>
|
| 227 |
+
</invoke>
|
| 228 |
+
</minimax:tool_call>
|
| 229 |
+
```
|
| 230 |
+
|
| 231 |
+
每个函数调用使用 `<invoke name="函数名">` 标签,参数使用 `<parameter name="参数名">` 标签包裹。
|
| 232 |
+
|
| 233 |
+
## 手动解析函数调用结果
|
| 234 |
+
|
| 235 |
+
### 解析函数调用
|
| 236 |
+
|
| 237 |
+
MiniMax-M2使用结构化的 XML 标签,需要不同的解析方式。核心函数如下:
|
| 238 |
+
|
| 239 |
+
```python
|
| 240 |
+
import re
|
| 241 |
+
import json
|
| 242 |
+
from typing import Any, Optional, List, Dict
|
| 243 |
+
|
| 244 |
+
|
| 245 |
+
def extract_name(name_str: str) -> str:
|
| 246 |
+
"""从引号包裹的字符串中提取名称"""
|
| 247 |
+
name_str = name_str.strip()
|
| 248 |
+
if name_str.startswith('"') and name_str.endswith('"'):
|
| 249 |
+
return name_str[1:-1]
|
| 250 |
+
elif name_str.startswith("'") and name_str.endswith("'"):
|
| 251 |
+
return name_str[1:-1]
|
| 252 |
+
return name_str
|
| 253 |
+
|
| 254 |
+
|
| 255 |
+
def convert_param_value(value: str, param_type: str) -> Any:
|
| 256 |
+
"""根据参数类型转换参数值"""
|
| 257 |
+
if value.lower() == "null":
|
| 258 |
+
return None
|
| 259 |
+
|
| 260 |
+
param_type = param_type.lower()
|
| 261 |
+
|
| 262 |
+
if param_type in ["string", "str", "text"]:
|
| 263 |
+
return value
|
| 264 |
+
elif param_type in ["integer", "int"]:
|
| 265 |
+
try:
|
| 266 |
+
return int(value)
|
| 267 |
+
except (ValueError, TypeError):
|
| 268 |
+
return value
|
| 269 |
+
elif param_type in ["number", "float"]:
|
| 270 |
+
try:
|
| 271 |
+
val = float(value)
|
| 272 |
+
return val if val != int(val) else int(val)
|
| 273 |
+
except (ValueError, TypeError):
|
| 274 |
+
return value
|
| 275 |
+
elif param_type in ["boolean", "bool"]:
|
| 276 |
+
return value.lower() in ["true", "1"]
|
| 277 |
+
elif param_type in ["object", "array"]:
|
| 278 |
+
try:
|
| 279 |
+
return json.loads(value)
|
| 280 |
+
except json.JSONDecodeError:
|
| 281 |
+
return value
|
| 282 |
+
else:
|
| 283 |
+
# 尝试 JSON 解析,失败则返回字符串
|
| 284 |
+
try:
|
| 285 |
+
return json.loads(value)
|
| 286 |
+
except json.JSONDecodeError:
|
| 287 |
+
return value
|
| 288 |
+
|
| 289 |
+
|
| 290 |
+
def parse_tool_calls(model_output: str, tools: Optional[List[Dict]] = None) -> List[Dict]:
|
| 291 |
+
"""
|
| 292 |
+
从模型输出中提取所有工具调用
|
| 293 |
+
|
| 294 |
+
Args:
|
| 295 |
+
model_output: 模型的完整输出文本
|
| 296 |
+
tools: 工具定义列表,用于获取参数类型信息,格式可以是:
|
| 297 |
+
- [{"name": "...", "parameters": {...}}]
|
| 298 |
+
- [{"type": "function", "function": {"name": "...", "parameters": {...}}}]
|
| 299 |
+
|
| 300 |
+
Returns:
|
| 301 |
+
解析后的工具调用列表,每个元素包含 name 和 arguments 字段
|
| 302 |
+
|
| 303 |
+
Example:
|
| 304 |
+
>>> tools = [{
|
| 305 |
+
... "name": "get_weather",
|
| 306 |
+
... "parameters": {
|
| 307 |
+
... "type": "object",
|
| 308 |
+
... "properties": {
|
| 309 |
+
... "location": {"type": "string"},
|
| 310 |
+
... "unit": {"type": "string"}
|
| 311 |
+
... }
|
| 312 |
+
... }
|
| 313 |
+
... }]
|
| 314 |
+
>>> output = '''<minimax:tool_call>
|
| 315 |
+
... <invoke name="get_weather">
|
| 316 |
+
... <parameter name="location">San Francisco</parameter>
|
| 317 |
+
... <parameter name="unit">celsius</parameter>
|
| 318 |
+
... </invoke>
|
| 319 |
+
... </minimax:tool_call>'''
|
| 320 |
+
>>> result = parse_tool_calls(output, tools)
|
| 321 |
+
>>> print(result)
|
| 322 |
+
[{'name': 'get_weather', 'arguments': {'location': 'San Francisco', 'unit': 'celsius'}}]
|
| 323 |
+
"""
|
| 324 |
+
# 快速检查是否包含工具调用标记
|
| 325 |
+
if "<minimax:tool_call>" not in model_output:
|
| 326 |
+
return []
|
| 327 |
+
|
| 328 |
+
tool_calls = []
|
| 329 |
+
|
| 330 |
+
try:
|
| 331 |
+
# 匹配所有 <minimax:tool_call> 块
|
| 332 |
+
tool_call_regex = re.compile(r"<minimax:tool_call>(.*?)</minimax:tool_call>", re.DOTALL)
|
| 333 |
+
invoke_regex = re.compile(r"<invoke name=(.*?)</invoke>", re.DOTALL)
|
| 334 |
+
parameter_regex = re.compile(r"<parameter name=(.*?)</parameter>", re.DOTALL)
|
| 335 |
+
|
| 336 |
+
# 遍历所有 tool_call 块
|
| 337 |
+
for tool_call_match in tool_call_regex.findall(model_output):
|
| 338 |
+
# 遍历该块中的所有 invoke
|
| 339 |
+
for invoke_match in invoke_regex.findall(tool_call_match):
|
| 340 |
+
# 提取函数名
|
| 341 |
+
name_match = re.search(r'^([^>]+)', invoke_match)
|
| 342 |
+
if not name_match:
|
| 343 |
+
continue
|
| 344 |
+
|
| 345 |
+
function_name = extract_name(name_match.group(1))
|
| 346 |
+
|
| 347 |
+
# 获取参数配置
|
| 348 |
+
param_config = {}
|
| 349 |
+
if tools:
|
| 350 |
+
for tool in tools:
|
| 351 |
+
tool_name = tool.get("name") or tool.get("function", {}).get("name")
|
| 352 |
+
if tool_name == function_name:
|
| 353 |
+
params = tool.get("parameters") or tool.get("function", {}).get("parameters")
|
| 354 |
+
if isinstance(params, dict) and "properties" in params:
|
| 355 |
+
param_config = params["properties"]
|
| 356 |
+
break
|
| 357 |
+
|
| 358 |
+
# 提取参数
|
| 359 |
+
param_dict = {}
|
| 360 |
+
for match in parameter_regex.findall(invoke_match):
|
| 361 |
+
param_match = re.search(r'^([^>]+)>(.*)', match, re.DOTALL)
|
| 362 |
+
if param_match:
|
| 363 |
+
param_name = extract_name(param_match.group(1))
|
| 364 |
+
param_value = param_match.group(2).strip()
|
| 365 |
+
|
| 366 |
+
# 去除首尾的换行符
|
| 367 |
+
if param_value.startswith('\n'):
|
| 368 |
+
param_value = param_value[1:]
|
| 369 |
+
if param_value.endswith('\n'):
|
| 370 |
+
param_value = param_value[:-1]
|
| 371 |
+
|
| 372 |
+
# 获取参数类型并转换
|
| 373 |
+
param_type = "string"
|
| 374 |
+
if param_name in param_config:
|
| 375 |
+
if isinstance(param_config[param_name], dict) and "type" in param_config[param_name]:
|
| 376 |
+
param_type = param_config[param_name]["type"]
|
| 377 |
+
|
| 378 |
+
param_dict[param_name] = convert_param_value(param_value, param_type)
|
| 379 |
+
|
| 380 |
+
tool_calls.append({
|
| 381 |
+
"name": function_name,
|
| 382 |
+
"arguments": param_dict
|
| 383 |
+
})
|
| 384 |
+
|
| 385 |
+
except Exception as e:
|
| 386 |
+
print(f"解析工具调用失败: {e}")
|
| 387 |
+
return []
|
| 388 |
+
|
| 389 |
+
return tool_calls
|
| 390 |
+
```
|
| 391 |
+
|
| 392 |
+
**使用示例:**
|
| 393 |
+
|
| 394 |
+
```python
|
| 395 |
+
# 定义工具
|
| 396 |
+
tools = [
|
| 397 |
+
{
|
| 398 |
+
"name": "get_weather",
|
| 399 |
+
"parameters": {
|
| 400 |
+
"type": "object",
|
| 401 |
+
"properties": {
|
| 402 |
+
"location": {"type": "string"},
|
| 403 |
+
"unit": {"type": "string"}
|
| 404 |
+
},
|
| 405 |
+
"required": ["location", "unit"]
|
| 406 |
+
}
|
| 407 |
+
}
|
| 408 |
+
]
|
| 409 |
+
|
| 410 |
+
# 模型输出
|
| 411 |
+
model_output = """我来帮你查询天气。
|
| 412 |
+
<minimax:tool_call>
|
| 413 |
+
<invoke name="get_weather">
|
| 414 |
+
<parameter name="location">San Francisco</parameter>
|
| 415 |
+
<parameter name="unit">celsius</parameter>
|
| 416 |
+
</invoke>
|
| 417 |
+
</minimax:tool_call>"""
|
| 418 |
+
|
| 419 |
+
# 解析工具调用
|
| 420 |
+
tool_calls = parse_tool_calls(model_output, tools)
|
| 421 |
+
|
| 422 |
+
# 输出结果
|
| 423 |
+
for call in tool_calls:
|
| 424 |
+
print(f"调用函数: {call['name']}")
|
| 425 |
+
print(f"参数: {call['arguments']}")
|
| 426 |
+
# 输出: 调用函数: get_weather
|
| 427 |
+
# 参数: {'location': 'San Francisco', 'unit': 'celsius'}
|
| 428 |
+
```
|
| 429 |
+
|
| 430 |
+
### 执行函数调用
|
| 431 |
+
|
| 432 |
+
解析完成后,您可以执行对应的函数并构建返回结果:
|
| 433 |
+
|
| 434 |
+
```python
|
| 435 |
+
def execute_function_call(function_name: str, arguments: dict):
|
| 436 |
+
"""执行函数调用并返回结果"""
|
| 437 |
+
if function_name == "get_weather":
|
| 438 |
+
location = arguments.get("location", "未知位置")
|
| 439 |
+
unit = arguments.get("unit", "celsius")
|
| 440 |
+
# 构建函数执行结果
|
| 441 |
+
return {
|
| 442 |
+
"role": "tool",
|
| 443 |
+
"content": [
|
| 444 |
+
{
|
| 445 |
+
"name": function_name,
|
| 446 |
+
"type": "text",
|
| 447 |
+
"text": json.dumps({
|
| 448 |
+
"location": location,
|
| 449 |
+
"temperature": "25",
|
| 450 |
+
"unit": unit,
|
| 451 |
+
"weather": "晴朗"
|
| 452 |
+
}, ensure_ascii=False)
|
| 453 |
+
}
|
| 454 |
+
]
|
| 455 |
+
}
|
| 456 |
+
elif function_name == "search_web":
|
| 457 |
+
query_list = arguments.get("query_list", [])
|
| 458 |
+
query_tag = arguments.get("query_tag", [])
|
| 459 |
+
# 模拟搜索结果
|
| 460 |
+
return {
|
| 461 |
+
"role": "tool",
|
| 462 |
+
"content": [
|
| 463 |
+
{
|
| 464 |
+
"name": function_name,
|
| 465 |
+
"type": "text",
|
| 466 |
+
"text": f"搜索关键词: {query_list}, 分类: {query_tag}\n搜索结果: 相关信息已找到"
|
| 467 |
+
}
|
| 468 |
+
]
|
| 469 |
+
}
|
| 470 |
+
|
| 471 |
+
return None
|
| 472 |
+
```
|
| 473 |
+
|
| 474 |
+
### 将函数执行结果返回给模型
|
| 475 |
+
|
| 476 |
+
成功解析函数调用后,您应将函数执行结果添加到对话历史中,以便模型在后续交互中能够访问和利用这些信息;拼接格式请参考 `chat_template.jinja`。
|
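下面给出一个简单示意(假设性示例:`make_tool_result` 为示意用的辅助函数,并非官方 SDK 的一部分),演示如何把工具结果追加到对话历史中,其消息结构与上文 `execute_function_call` 返回的 `"role": "tool"` 字典一致:

```python
import json

# 将工具结果返回给模型的最小示意。消息结构与上文 execute_function_call
# 构造的 "role": "tool" 字典一致;make_tool_result 仅为示意用的辅助函数。

def make_tool_result(function_name: str, result: dict) -> dict:
    """按上文所示的工具消息格式包装函数返回值"""
    return {
        "role": "tool",
        "content": [
            {
                "name": function_name,
                "type": "text",
                "text": json.dumps(result, ensure_ascii=False),
            }
        ],
    }

messages = [
    {"role": "user", "content": "今天旧金山的天气怎么样?"},
    # 包含 <minimax:tool_call> 块的原始助手输出:
    {"role": "assistant", "content": "<minimax:tool_call>...</minimax:tool_call>"},
]

# 追加执行结果,使模型在下一轮对话中可以使用该信息。
messages.append(
    make_tool_result("get_weather", {"location": "旧金山", "temperature": "25"})
)
```

随后的请求会对完整的 `messages` 列表重新应用聊天模板,与前文示例相同。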
| 477 |
+
|
| 478 |
+
## 参考资料
|
| 479 |
+
|
| 480 |
+
- [MiniMax-M2 模型仓库](https://github.com/MiniMax-AI/MiniMax-M2)
|
| 481 |
+
- [vLLM 项目主页](https://github.com/vllm-project/vllm)
|
| 482 |
+
- [OpenAI Python SDK](https://github.com/openai/openai-python)
|
docs/vllm_deploy_guide.md
ADDED
|
@@ -0,0 +1,88 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# MiniMax M2 Model vLLM Deployment Guide
|
| 2 |
+
|
| 3 |
+
We recommend using [vLLM](https://docs.vllm.ai/en/stable/) to deploy the [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) model. vLLM is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing vLLM's official documentation to check hardware compatibility before deployment.
|
| 4 |
+
|
| 5 |
+
## System Requirements
|
| 6 |
+
|
| 7 |
+
- OS: Linux
|
| 8 |
+
|
| 9 |
+
- Python: 3.9 - 3.12
|
| 10 |
+
|
| 11 |
+
- GPU:
|
| 12 |
+
|
| 13 |
+
- compute capability 7.0 or higher
|
| 14 |
+
|
| 15 |
+
- Memory requirements: 220 GB for weights, 60 GB per 1M context tokens
|
| 16 |
+
|
| 17 |
+
The following are recommended configurations; actual requirements should be adjusted based on your use case:
|
| 18 |
+
|
| 19 |
+
- 4x 96GB GPUs: Supports context input of up to 400K tokens.
|
| 20 |
+
|
| 21 |
+
- 8x 144GB GPUs: Supports context input of up to 3M tokens.
|
| 22 |
+
|
| 23 |
+
## Deployment with Python
|
| 24 |
+
|
| 25 |
+
It is recommended to use a virtual environment (such as venv, conda, or uv) to avoid dependency conflicts. We recommend installing vLLM in a fresh Python environment:
|
| 26 |
+
|
| 27 |
+
```bash
|
| 28 |
+
# Not yet in a stable release; install the nightly build
|
| 29 |
+
uv pip install -U vllm \
|
| 30 |
+
--torch-backend=auto \
|
| 31 |
+
--extra-index-url https://wheels.vllm.ai/nightly
|
| 32 |
+
# Once a stable release supports MiniMax-M2, install with uv
|
| 33 |
+
uv pip install "vllm" --torch-backend=auto
|
| 34 |
+
```
|
| 35 |
+
|
| 36 |
+
Run the following command to start the vLLM server. vLLM will automatically download and cache the MiniMax-M2 model from Hugging Face.
|
| 37 |
+
|
| 38 |
+
4-GPU deployment command:
|
| 39 |
+
|
| 40 |
+
```bash
|
| 41 |
+
SAFETENSORS_FAST_GPU=1 VLLM_USE_V1=0 vllm serve \
|
| 42 |
+
--model MiniMaxAI/MiniMax-M2 \
|
| 43 |
+
--trust-remote-code \
|
| 44 |
+
--enable-expert-parallel --tensor-parallel-size 4 \
|
| 45 |
+
--enable-auto-tool-choice --tool-call-parser minimax_m2 \
|
| 46 |
+
--reasoning-parser minimax_m2
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
## Testing Deployment
|
| 50 |
+
|
| 51 |
+
After startup, you can test the vLLM OpenAI-compatible API with the following command:
|
| 52 |
+
|
| 53 |
+
```bash
|
| 54 |
+
curl http://localhost:8000/v1/chat/completions \
|
| 55 |
+
-H "Content-Type: application/json" \
|
| 56 |
+
-d '{
|
| 57 |
+
"model": "MiniMaxAI/MiniMax-M2",
|
| 58 |
+
"messages": [
|
| 59 |
+
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
|
| 60 |
+
{"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]}
|
| 61 |
+
]
|
| 62 |
+
}'
|
| 63 |
+
```
|
| 64 |
+
|
| 65 |
+
## Common Issues
|
| 66 |
+
|
| 67 |
+
### Hugging Face Network Issues
|
| 68 |
+
|
| 69 |
+
If you encounter network issues, you can set up a proxy before pulling the model.
|
| 70 |
+
|
| 71 |
+
```bash
|
| 72 |
+
export HF_ENDPOINT=https://hf-mirror.com
|
| 73 |
+
```
|
| 74 |
+
|
| 75 |
+
### MiniMax-M2 model is not currently supported
|
| 76 |
+
|
| 77 |
+
If you see this error, your installed vLLM version is too old; upgrade to the latest version.
|
| 78 |
+
|
| 79 |
+
## Getting Support
|
| 80 |
+
|
| 81 |
+
If you encounter any issues while deploying the MiniMax model:
|
| 82 |
+
|
| 83 |
+
- Contact our technical support team through official channels such as email at api@minimaxi.com
|
| 84 |
+
|
| 85 |
+
- Submit an issue on our [GitHub](https://github.com/MiniMax-AI) repository
|
| 86 |
+
|
| 87 |
+
We continuously optimize the deployment experience for our models. Feedback is welcome!
|
| 88 |
+
|
docs/vllm_deploy_guide_cn.md
ADDED
|
@@ -0,0 +1,85 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# MiniMax M2 模型 vLLM 部署指南
|
| 2 |
+
|
| 3 |
+
我们推荐使用 [vLLM](https://docs.vllm.ai/en/stable/) 来部署 [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) 模型。vLLM 是一个高性能的推理引擎,其具有卓越的服务吞吐、高效智能的内存管理机制、强大的批量请求处理能力、深度优化的底层性能等特性。我们建议在部署之前查看 vLLM 的官方文档以检查硬件兼容性。
|
| 4 |
+
|
| 5 |
+
## 环境要求
|
| 6 |
+
|
| 7 |
+
- OS:Linux
|
| 8 |
+
|
| 9 |
+
- Python:3.9 - 3.12
|
| 10 |
+
|
| 11 |
+
- GPU:
|
| 12 |
+
|
| 13 |
+
- compute capability 7.0 or higher
|
| 14 |
+
|
| 15 |
+
- 显存需求:权重需要 220 GB,每 1M 上下文 token 需要 60 GB
|
| 16 |
+
|
| 17 |
+
以下为推荐配置,实际需求请根据业务场景调整:
|
| 18 |
+
|
| 19 |
+
- 96G x4 GPU:支持 40 万 token 的上下文输入。
|
| 20 |
+
|
| 21 |
+
- 144G x8 GPU:支持长达 300 万 token 的上下文输入。
|
| 22 |
+
|
| 23 |
+
## 使用 Python 部署
|
| 24 |
+
|
| 25 |
+
建议使用虚拟环境(如 venv、conda、uv)以避免依赖冲突。建议在全新的 Python 环境中安装 vLLM:
|
| 26 |
+
```bash
|
| 27 |
+
# 尚未 release,请安装 nightly 构建
|
| 28 |
+
uv pip install -U vllm \
|
| 29 |
+
--torch-backend=auto \
|
| 30 |
+
--extra-index-url https://wheels.vllm.ai/nightly
|
| 31 |
+
# 正式 release 后,使用 uv 安装
|
| 32 |
+
uv pip install "vllm" --torch-backend=auto
|
| 33 |
+
```
|
| 34 |
+
|
| 35 |
+
运行如下命令启动 vLLM 服务器,vLLM 会自动从 Huggingface 下载并缓存 MiniMax-M2 模型。
|
| 36 |
+
|
| 37 |
+
4 卡部署命令:
|
| 38 |
+
|
| 39 |
+
```bash
|
| 40 |
+
SAFETENSORS_FAST_GPU=1 VLLM_USE_V1=0 vllm serve \
|
| 41 |
+
--model MiniMaxAI/MiniMax-M2 \
|
| 42 |
+
--trust-remote-code \
|
| 43 |
+
--enable-expert-parallel --tensor-parallel-size 4 \
|
| 44 |
+
--enable-auto-tool-choice --tool-call-parser minimax_m2 \
|
| 45 |
+
--reasoning-parser minimax_m2
|
| 46 |
+
```
|
| 47 |
+
|
| 48 |
+
## 测试部署
|
| 49 |
+
|
| 50 |
+
启动后,可以通过如下命令测试 vLLM OpenAI 兼容接口:
|
| 51 |
+
|
| 52 |
+
```bash
|
| 53 |
+
curl http://localhost:8000/v1/chat/completions \
|
| 54 |
+
-H "Content-Type: application/json" \
|
| 55 |
+
-d '{
|
| 56 |
+
"model": "MiniMaxAI/MiniMax-M2",
|
| 57 |
+
"messages": [
|
| 58 |
+
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
|
| 59 |
+
{"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]}
|
| 60 |
+
]
|
| 61 |
+
}'
|
| 62 |
+
```
|
| 63 |
+
|
| 64 |
+
## 常见问题
|
| 65 |
+
|
| 66 |
+
### Huggingface 网络问题
|
| 67 |
+
|
| 68 |
+
如果遇到网络问题,可以设置代理后再进行拉取。
|
| 69 |
+
|
| 70 |
+
```bash
|
| 71 |
+
export HF_ENDPOINT=https://hf-mirror.com
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
### MiniMax-M2 model is not currently supported
|
| 75 |
+
|
| 76 |
+
该 vLLM 版本过旧,请升级到最新版本。
|
| 77 |
+
|
| 78 |
+
## 获取支持
|
| 79 |
+
|
| 80 |
+
如果在部署 MiniMax 模型过程中遇到任何问题:
|
| 81 |
+
|
| 82 |
+
- 通过邮箱 api@minimaxi.com 等官方渠道联系我们的技术支持团队
|
| 83 |
+
|
| 84 |
+
- 在我们的 [GitHub](https://github.com/MiniMax-AI) 仓库提交 Issue
|
| 85 |
+
我们会持续优化模型的部署体验,欢迎反馈!
|
figures/Bench.png
ADDED
|
Git LFS Details
|
generation_config.json
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:9ac66c5295cc65e6b9ace00be407b7281c24533db9d96dcd098afa2af7361ee8
|
| 3 |
+
size 114
|
merges.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
model.safetensors.index.json
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:9734599c8e3e00dbb5612bfe264edae70e2f994696b54ad57f70a1faf7beb251
|
| 3 |
+
size 14057058
|
tokenizer.json
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:757622126525aeeb131756849d93298070ff3f0319c455ec8c5bb0f6b1cebbe8
|
| 3 |
+
size 9730160
|
tokenizer_config.json
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:60e392fea8d47660584deb3c6a716aab7b9f11fed24bc494baf5eb7ba809e882
|
| 3 |
+
size 10893
|
vocab.json
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b44c066b5dc34c800c4e3ecbd85f3e95ce3bfdbf8a5fe30223e005175103578a
|
| 3 |
+
size 4705413
|