leejunhyeok commited on
Commit
5197888
·
verified ·
1 Parent(s): bc61b50

Update README.md

Files changed (1): README.md (+59 -1)
README.md CHANGED
@@ -43,4 +43,62 @@ This is a reasoning enhanced version of **Motif-2-12.7B-Instruct**. Detailed inf
|LiveCodeBench v5 <br> (2024.10 - 2025.2)|-|50.03|65|
|LiveCodeBench v5 |0-shot, CoT|61.66|60.1|
|HumanEval|0-shot|93.2|93.2|
|**Average**|-|**75.45**|**79.71**|

## How to use in vLLM
The [PR](https://github.com/vllm-project/vllm/pull/27396) adding support for the Motif model to the official vLLM package is currently under review.
In the meantime, to use our model with vLLM, please use the following container [image](https://github.com/motiftechnologies/vllm/pkgs/container/vllm).
Our model supports a sequence length of up to 64K tokens.
```bash
# Run the vLLM API server
VLLM_ATTENTION_BACKEND=DIFFERENTIAL_FLASH_ATTN \
vllm serve Motif-Technologies/Motif-2-12.7B-Reasoning \
  --trust-remote-code \
  --max-model-len 65536 \
  --tensor-parallel-size 8

# Send a request with curl
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital city of South Korea?"}
    ],
    "temperature": 0.6,
    "skip_special_tokens": false,
    "chat_template_kwargs": {
      "enable_thinking": true
    }
  }'
```
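The curl call above can also be issued from Python. A minimal sketch that only builds and prints the same JSON request body — actually sending it (shown commented with `urllib`) requires the server from the snippet above to be running:

```python
import json

# Same request body as the curl example; "enable_thinking" switches the
# model's reasoning mode on via chat_template_kwargs.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital city of South Korea?"},
    ],
    "temperature": 0.6,
    "skip_special_tokens": False,
    "chat_template_kwargs": {"enable_thinking": True},
}

body = json.dumps(payload)
print(body)

# To actually send it (needs the running server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```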

## How to use advanced vLLM options
Description of each option:
- ```--compilation_config '{"full_cuda_graph": true}'```: Activates CUDA [full graph capture](https://docs.vllm.ai/en/stable/design/cuda_graphs/#cudagraphmodes)
- ```--rope-scaling '{"rope_type":"yarn","factor":2.0,"original_max_position_embeddings":65536}'```: Applies [YaRN](https://arxiv.org/abs/2309.00071) RoPE scaling to extend the context window
- ```--enable-auto-tool-choice --tool-call-parser hermes```: Enables [tool calling](https://docs.vllm.ai/en/latest/features/tool_calling/)
- ```--logits-processors logit_:WrappedPerReqLogitsProcessor```: Enables the VLLM_THINK_BUDGET_RATIO environment variable and repetition-based auto-stop for the thinking phase
- ```--reasoning-parser deepseek_r1```: Parses [reasoning outputs](https://docs.vllm.ai/en/latest/features/reasoning_outputs/)
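As a sanity check on how the YaRN option relates to `--max-model-len`, a short Python sketch; the think-budget line is an assumption inferred from the VLLM_THINK_BUDGET_RATIO name, not a documented formula:

```python
# With YaRN factor 2.0 applied to an original context of 65536 tokens,
# the servable context doubles, matching --max-model-len 131072 below.
original_max_position_embeddings = 65536
yarn_factor = 2.0
extended_context = int(original_max_position_embeddings * yarn_factor)
print(extended_context)  # 131072

# Assumed interpretation of VLLM_THINK_BUDGET_RATIO=0.95: the fraction of
# the token budget the model may spend inside its thinking phase.
think_budget_ratio = 0.95
think_budget = int(extended_context * think_budget_ratio)
print(think_budget)
```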

### How to use
```bash
pip install -U "huggingface_hub[cli]"
hf download Motif-Technologies/Motif-2-12.7B-Reasoning \
  --include "logit_processors/*" \
  --local-dir ./

export PYTHONPATH="$PWD/logit_processors"
VLLM_ATTENTION_BACKEND=DIFFERENTIAL_FLASH_ATTN \
VLLM_THINK_BUDGET_RATIO=0.95 \
vllm serve Motif-Technologies/Motif-2-12.7B-Reasoning \
  --trust-remote-code \
  --compilation_config '{"full_cuda_graph": true}' \
  --rope-scaling '{"rope_type":"yarn","factor":2.0,"original_max_position_embeddings":65536}' \
  --max-model-len 131072 \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --logits-processors logit_:WrappedPerReqLogitsProcessor \
  --reasoning-parser deepseek_r1
```
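With `--reasoning-parser deepseek_r1`, the server separates the model's reasoning from its final answer. A minimal sketch of that split on raw `<think>`-tagged text — an illustration of the parsing idea, not vLLM's actual implementation:

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split DeepSeek-R1-style output into (reasoning, answer)."""
    open_tag, close_tag = "<think>", "</think>"
    if open_tag in text and close_tag in text:
        start = text.index(open_tag) + len(open_tag)
        end = text.index(close_tag)
        return text[start:end].strip(), text[end + len(close_tag):].strip()
    # No thinking block: everything is the answer.
    return "", text.strip()

reasoning, answer = split_reasoning(
    "<think>The user asks for South Korea's capital. It is Seoul.</think>Seoul."
)
print(reasoning)
print(answer)
```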