Update README.md
tweaking the serve command
README.md CHANGED

````diff
@@ -95,16 +95,15 @@ For vLLM tool calling:
 vllm serve btbtyler09/Qwen3-Coder-Next-GPTQ-4bit \
   --tensor-parallel-size 4 \
   --trust-remote-code \
-  --
+  --dtype float16 \
   --enable-auto-tool-choice \
-  --tool-call-parser
+  --tool-call-parser qwen3_coder
 ```
 
 ## Credits
 
 - **Base Model**: [Qwen](https://huggingface.co/Qwen) - Qwen3-Coder-Next
 - **Quantization**: GPTQ via [GPTQModel](https://github.com/modelcloud/gptqmodel) v5.7.0
-- **Expert Converter**: Custom `convert_qwen3next_expert_converter` for fused 3D expert weights
 - **Quantized by**: [btbtyler09](https://huggingface.co/btbtyler09)
 
 ## License
````
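With `--enable-auto-tool-choice` and a tool-call parser set, the server exposes tool calling through vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. As a rough sketch of what a client request body against this server could look like (the `get_weather` tool is an illustrative placeholder and not part of the README; only the model name comes from the serve command):

```python
import json

# Sketch of a tool-calling request body for an OpenAI-compatible
# /v1/chat/completions endpoint. The get_weather tool is a made-up
# example; the model name matches the vllm serve command above.
body = {
    "model": "btbtyler09/Qwen3-Coder-Next-GPTQ-4bit",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Let the model decide when to call a tool, in the spirit of
    # the server-side --enable-auto-tool-choice flag.
    "tool_choice": "auto",
}
print(json.dumps(body, indent=2))
```

When the model emits a tool call, the configured parser turns its output markup into structured `tool_calls` entries in the response, rather than raw text.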