lmlmcat committed on
Commit 79055e5 · verified · 1 Parent(s): 754eb62

update vllm config

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -53,7 +53,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 Alternatively, you may serve the model using VLLM:
 
 ```
-vllm serve LLM360/K2-V2-Instruct --tensor-parallel-size 8 --port 8000 --revision "sft_final"
+vllm serve LLM360/K2-V2-Instruct --tensor-parallel-size 8 --port 8000
 ```
 
 K2-V2-Instruct uses `reasoning_effort="low"|"medium"|"high"` in the chat template to determine reasoning effort. If you cannot use `tokenizer.apply_chat_template`, you may also pass in these arguments using `extra_body` and `chat_template_kwargs`:
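
For reference, a minimal sketch of the `extra_body` approach mentioned in the changed section, assuming the `vllm serve` command above is running on localhost:8000; the base URL, API key value, and prompt are illustrative assumptions, not part of this commit:

```python
# Sketch: pass reasoning_effort via chat_template_kwargs when you cannot call
# tokenizer.apply_chat_template yourself. Assumes the vLLM OpenAI-compatible
# server from the command above is listening on localhost:8000.
from openai import OpenAI

# vLLM does not validate the API key unless the server was started with one;
# "EMPTY" is a conventional placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="LLM360/K2-V2-Instruct",
    messages=[{"role": "user", "content": "What is tensor parallelism?"}],  # placeholder prompt
    # extra_body fields are sent verbatim in the request body; vLLM forwards
    # chat_template_kwargs through to the model's chat template.
    extra_body={"chat_template_kwargs": {"reasoning_effort": "medium"}},
)
print(response.choices[0].message.content)
```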