btbtyler09 committed (verified)
Commit 3f4b79c · Parent(s): f6c16a6

Update README.md

tweaking the serve command.

Files changed (1): README.md (+2, -3)
README.md CHANGED
@@ -95,16 +95,15 @@ For vLLM tool calling:
 vllm serve btbtyler09/Qwen3-Coder-Next-GPTQ-4bit \
   --tensor-parallel-size 4 \
   --trust-remote-code \
-  --quantization gptq \
+  --dtype float16 \
   --enable-auto-tool-choice \
-  --tool-call-parser hermes
+  --tool-call-parser qwen3_coder
 ```
 
 ## Credits
 
 - **Base Model**: [Qwen](https://huggingface.co/Qwen) - Qwen3-Coder-Next
 - **Quantization**: GPTQ via [GPTQModel](https://github.com/modelcloud/gptqmodel) v5.7.0
-- **Expert Converter**: Custom `convert_qwen3next_expert_converter` for fused 3D expert weights
 - **Quantized by**: [btbtyler09](https://huggingface.co/btbtyler09)
 
 ## License
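The `--enable-auto-tool-choice` and `--tool-call-parser` flags configure vLLM's OpenAI-compatible server to emit structured tool calls. As a minimal client-side sketch (the payload shape follows the standard OpenAI chat-completions convention; the `get_weather` tool is a hypothetical example, not part of this model card), a request exercising auto tool choice might look like:

```python
import json

# Hypothetical request body for vLLM's OpenAI-compatible
# /v1/chat/completions endpoint; the tool schema is illustrative.
payload = {
    "model": "btbtyler09/Qwen3-Coder-Next-GPTQ-4bit",
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # example tool, not a real API
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the server's parser decide when to call a tool
}

# Serialize as the JSON body that would be POSTed to the server.
body = json.dumps(payload)
```

With `--tool-call-parser qwen3_coder`, the server parses the model's tool-call output into the structured `tool_calls` field of the response rather than returning it as raw text.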