Update README.md
README.md CHANGED

@@ -35,6 +35,14 @@ We use the following serving configurations:
The provided chat template sets the reasoning effort to `high`.
### Serving commands

To serve with vLLM:

```
vllm serve LLM360/K2-Think-V2 --tensor-parallel-size 8 --port 8000
```
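Once the server is running, vLLM exposes an OpenAI-compatible API on the chosen port. A minimal sketch of building a request for the `/v1/chat/completions` endpoint; the prompt and `max_tokens` value are illustrative, not part of the original instructions:

```python
import json

# Request payload for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
# The model name matches the `vllm serve` command above; the prompt is a
# hypothetical example.
payload = {
    "model": "LLM360/K2-Think-V2",
    "messages": [{"role": "user", "content": "Briefly explain tensor parallelism."}],
    "max_tokens": 256,
}

body = json.dumps(payload)
print(body)

# Send it with, for example:
#   curl http://localhost:8000/v1/chat/completions \
#        -H "Content-Type: application/json" \
#        -d "$body"
```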
### Transformers
You can use `K2 Think V2` with Transformers. If you use `transformers.pipeline`, it applies the chat template automatically; if you call `model.generate` directly, you need to apply the chat template manually.