update vllm config
README.md CHANGED
@@ -53,7 +53,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Alternatively, you may serve the model using vLLM:

```
vllm serve LLM360/K2-V2-Instruct --tensor-parallel-size 8 --port 8000
```

K2-V2-Instruct uses `reasoning_effort="low"|"medium"|"high"` in the chat template to determine reasoning effort. If you cannot use `tokenizer.apply_chat_template`, you may also pass these arguments using `extra_body` and `chat_template_kwargs`:
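The example the paragraph above points at was truncated from this diff view. As a minimal sketch of what such a request body might look like, assuming the server started by the `vllm serve` command above is listening on port 8000 (the prompt text here is illustrative, not from the original README): vLLM's OpenAI-compatible server accepts `chat_template_kwargs` as an extra top-level field in the chat-completions request, which is exactly what the `openai` client's `extra_body` parameter merges into the JSON body.

```python
import json

# Hypothetical request body for the OpenAI-compatible
# /v1/chat/completions endpoint exposed by `vllm serve`.
# When composing the HTTP body by hand, `chat_template_kwargs`
# sits at the top level; the `openai` client reaches the same
# shape by merging `extra_body` into the request JSON.
payload = {
    "model": "LLM360/K2-V2-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],  # illustrative prompt
    "chat_template_kwargs": {"reasoning_effort": "high"},
}
print(json.dumps(payload, indent=2))
```

With the `openai` Python client the equivalent call would pass `extra_body={"chat_template_kwargs": {"reasoning_effort": "high"}}` to `client.chat.completions.create(...)`.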