Update README.md

README.md (CHANGED)

@@ -17,7 +17,8 @@ base_model_relation: quantized
 Base model: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
 
 <i>This model is quantized to 4-bit with a group size of 128.</i>
-<
+<br>
+<i>Compared to earlier quantized versions, this quantized model achieves better tokens/s throughput. The improvement comes from setting desc_act=False in the quantization configuration.</i>
 
 ```
 vllm serve JunHowie/Qwen3-0.6B-GPTQ-Int4
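
For context, desc_act is a field in the GPTQ quantization config that ships with the model (the quantization_config section of config.json). A minimal sketch of the relevant fields is below; only bits, group_size, and desc_act come from the text above, and the quant_method field is an assumption based on common Transformers/AutoGPTQ naming:

```
{
  "bits": 4,
  "group_size": 128,
  "desc_act": false,
  "quant_method": "gptq"
}
```

With desc_act=False, weights are quantized in their original column order rather than by descending activation magnitude, so inference kernels can skip the extra index reordering that act-order quantization requires, which is consistent with the tokens/s improvement this commit describes.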