richardmfan committed on
Commit 91d39e0 · verified · 1 Parent(s): 258698f

Update README.md

Files changed (1): README.md (+8 −0)
README.md CHANGED
@@ -35,6 +35,14 @@ We use the following serving configurations:
 
 The provided chat template sets the reasoning effort to `high`
 
+### Serving commands
+
+To serve with vLLM:
+
+```
+vllm serve LLM360/K2-Think-V2 --tensor-parallel-size 8 --port 8000
+```
+
 ### Transformers
 You can use `K2 Think V2` with Transformers. If you use `transformers.pipeline`, it will apply the chat template automatically. If you use `model.generate` directly, you need to apply the chat template manually.