leejunhyeok commited on
Commit
5197888
·
verified ·
1 Parent(s): bc61b50

Update README.md

Files changed (1): README.md (+59 -1)
README.md CHANGED
@@ -43,4 +43,62 @@ This is a reasoning enhanced version of **Motif-2-12.7B-Instruct**. Detailed inf
|LiveCodeBench v5 <br> (2024.10 - 2025.2)|-|50.03|65|
|LiveCodeBench v5 |0-shot, CoT|61.66|60.1|
|HumanEval|0-shot|93.2|93.2|
|**Average**|-|**75.45**|**79.71**|

## How to use in vLLM
The [PR](https://github.com/vllm-project/vllm/pull/27396) adding support for the Motif model to the official vLLM package is currently under review.
In the meantime, to use our model with vLLM, please use the following container [image](https://github.com/motiftechnologies/vllm/pkgs/container/vllm).
Our model supports a sequence length of up to 64K tokens.
```bash
# Run the vLLM API server
VLLM_ATTENTION_BACKEND=DIFFERENTIAL_FLASH_ATTN \
vllm serve Motif-Technologies/Motif-2-12.7B-Reasoning \
  --trust-remote-code \
  --max-model-len 65536 \
  --tensor-parallel-size 8

# Send a request with curl
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital city of South Korea?"}
    ],
    "temperature": 0.6,
    "skip_special_tokens": false,
    "chat_template_kwargs": {
      "enable_thinking": true
    }
  }'
```
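The curl call above can also be issued from Python. A minimal sketch that only builds and prints the same JSON request body — actually sending it (shown commented with `urllib`) requires the server from the snippet above to be running:

```python
import json

# Same request body as the curl example; "enable_thinking" switches the
# model's reasoning mode on via chat_template_kwargs.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital city of South Korea?"},
    ],
    "temperature": 0.6,
    "skip_special_tokens": False,
    "chat_template_kwargs": {"enable_thinking": True},
}

body = json.dumps(payload)
print(body)

# To actually send it (needs the running server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```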

## How to use advanced vLLM options
Description of each option:
- ```--compilation_config '{"full_cuda_graph": true}'```: Activates CUDA [full graph capture](https://docs.vllm.ai/en/stable/design/cuda_graphs/#cudagraphmodes)
- ```--rope-scaling '{"rope_type":"yarn","factor":2.0,"original_max_position_embeddings":65536}'```: Applies [YaRN](https://arxiv.org/abs/2309.00071) RoPE scaling to extend the context window
- ```--enable-auto-tool-choice --tool-call-parser hermes```: Enables [tool calling](https://docs.vllm.ai/en/latest/features/tool_calling/)
- ```--logits-processors logit_:WrappedPerReqLogitsProcessor```: Enables the VLLM_THINK_BUDGET_RATIO environment variable and repetition-based auto-stop for the thinking phase
- ```--reasoning-parser deepseek_r1```: Parses [reasoning outputs](https://docs.vllm.ai/en/latest/features/reasoning_outputs/)
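As a sanity check on how the YaRN option relates to `--max-model-len`, a short Python sketch; the think-budget line is an assumption inferred from the VLLM_THINK_BUDGET_RATIO name, not a documented formula:

```python
# With YaRN factor 2.0 applied to an original context of 65536 tokens,
# the servable context doubles, matching --max-model-len 131072 below.
original_max_position_embeddings = 65536
yarn_factor = 2.0
extended_context = int(original_max_position_embeddings * yarn_factor)
print(extended_context)  # 131072

# Assumed interpretation of VLLM_THINK_BUDGET_RATIO=0.95: the fraction of
# the token budget the model may spend inside its thinking phase.
think_budget_ratio = 0.95
think_budget = int(extended_context * think_budget_ratio)
print(think_budget)
```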

### How to use
```bash
pip install -U "huggingface_hub[cli]"
hf download Motif-Technologies/Motif-2-12.7B-Reasoning \
  --include "logit_processors/*" \
  --local-dir ./

export PYTHONPATH="$PWD/logit_processors"
VLLM_ATTENTION_BACKEND=DIFFERENTIAL_FLASH_ATTN \
VLLM_THINK_BUDGET_RATIO=0.95 \
vllm serve Motif-Technologies/Motif-2-12.7B-Reasoning \
  --trust-remote-code \
  --compilation_config '{"full_cuda_graph": true}' \
  --rope-scaling '{"rope_type":"yarn","factor":2.0,"original_max_position_embeddings":65536}' \
  --max-model-len 131072 \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --logits-processors logit_:WrappedPerReqLogitsProcessor \
  --reasoning-parser deepseek_r1
```
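With `--reasoning-parser deepseek_r1`, the server separates the model's reasoning from its final answer. A minimal sketch of that split on raw `<think>`-tagged text — an illustration of the parsing idea, not vLLM's actual implementation:

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split DeepSeek-R1-style output into (reasoning, answer)."""
    open_tag, close_tag = "<think>", "</think>"
    if open_tag in text and close_tag in text:
        start = text.index(open_tag) + len(open_tag)
        end = text.index(close_tag)
        return text[start:end].strip(), text[end + len(close_tag):].strip()
    # No thinking block: everything is the answer.
    return "", text.strip()

reasoning, answer = split_reasoning(
    "<think>The user asks for South Korea's capital. It is Seoul.</think>Seoul."
)
print(reasoning)
print(answer)
```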