update vllm config
README.md CHANGED
@@ -53,7 +53,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Alternatively, you may serve the model using vLLM:

```
vllm serve LLM360/K2-V2-Instruct --tensor-parallel-size 8 --port 8000
```

K2-V2-Instruct uses `reasoning_effort="low"|"medium"|"high"` in the chat template to determine reasoning effort. If you cannot use `tokenizer.apply_chat_template`, you may also pass these arguments using `extra_body` and `chat_template_kwargs`:
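The example the paragraph above points at was truncated from this diff view. As a minimal sketch of what such a request body might look like, assuming the server started by the `vllm serve` command above is listening on port 8000 (the prompt text here is illustrative, not from the original README): vLLM's OpenAI-compatible server accepts `chat_template_kwargs` as an extra top-level field in the chat-completions request, which is exactly what the `openai` client's `extra_body` parameter merges into the JSON body.

```python
import json

# Hypothetical request body for the OpenAI-compatible
# /v1/chat/completions endpoint exposed by `vllm serve`.
# When composing the HTTP body by hand, `chat_template_kwargs`
# sits at the top level; the `openai` client reaches the same
# shape by merging `extra_body` into the request JSON.
payload = {
    "model": "LLM360/K2-V2-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],  # illustrative prompt
    "chat_template_kwargs": {"reasoning_effort": "high"},
}
print(json.dumps(payload, indent=2))
```

With the `openai` Python client the equivalent call would pass `extra_body={"chat_template_kwargs": {"reasoning_effort": "high"}}` to `client.chat.completions.create(...)`.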