Update README.md
tweaking the serve command
README.md CHANGED

````diff
@@ -95,16 +95,15 @@ For vLLM tool calling:
 vllm serve btbtyler09/Qwen3-Coder-Next-GPTQ-4bit \
   --tensor-parallel-size 4 \
   --trust-remote-code \
-  --
+  --dtype float16 \
   --enable-auto-tool-choice \
-  --tool-call-parser
+  --tool-call-parser qwen3_coder
 ```
 
 ## Credits
 
 - **Base Model**: [Qwen](https://huggingface.co/Qwen) - Qwen3-Coder-Next
 - **Quantization**: GPTQ via [GPTQModel](https://github.com/modelcloud/gptqmodel) v5.7.0
-- **Expert Converter**: Custom `convert_qwen3next_expert_converter` for fused 3D expert weights
 - **Quantized by**: [btbtyler09](https://huggingface.co/btbtyler09)
 
 ## License
````
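With `--enable-auto-tool-choice` and a tool-call parser set, the server exposes tool calling through vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. As a rough sketch of what a client request body against this server could look like (the `get_weather` tool is an illustrative placeholder and not part of the README; only the model name comes from the serve command):

```python
import json

# Sketch of a tool-calling request body for an OpenAI-compatible
# /v1/chat/completions endpoint. The get_weather tool is a made-up
# example; the model name matches the vllm serve command above.
body = {
    "model": "btbtyler09/Qwen3-Coder-Next-GPTQ-4bit",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Let the model decide when to call a tool, in the spirit of
    # the server-side --enable-auto-tool-choice flag.
    "tool_choice": "auto",
}
print(json.dumps(body, indent=2))
```

When the model emits a tool call, the configured parser turns its output markup into structured `tool_calls` entries in the response, rather than raw text.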