# Model Card for Qwen3-Coder-Next-3bit-g128
Quantized from Qwen/Qwen3-Coder-Next using mlx-lm to 3-bit, with group_size 128 for the main weights and a finer group_size 64 for the MoE weights, aiming for maximum accuracy at 3-bit quantization.
## Evaluation Results
Evaluated with mlx_lm.evaluate on mmlu_pro (50 questions per topic), comparing the 3-bit g128 quant against the 3-bit g64 quant:
| Domain | g64 | g128 | Improvement |
|---|---|---|---|
| Math | 0.94 | 0.90 | -4% |
| Computer Science | 0.82 | 0.84 | +2% |
| Engineering | 0.70 | 0.64 | -6% |
| Physics | 0.94 | 0.92 | -2% |
| Chemistry | 0.86 | 0.88 | +2% |
### Average Performance
- 3-bit g64/default: 85.2% average
- 3-bit g128: 83.6% average
- Difference: -1.6 percentage points for group_size 128 (main weights) with fine-grained group_size 64 (MoE weights)
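The averages above can be reproduced directly from the per-domain table; a quick sketch:

```python
# Per-domain mmlu_pro accuracies from the table above (the five reported topics).
g64 = {"Math": 0.94, "Computer Science": 0.82, "Engineering": 0.70,
       "Physics": 0.94, "Chemistry": 0.86}
g128 = {"Math": 0.90, "Computer Science": 0.84, "Engineering": 0.64,
        "Physics": 0.92, "Chemistry": 0.88}

avg_g64 = round(sum(g64.values()) / len(g64), 3)     # 0.852
avg_g128 = round(sum(g128.values()) / len(g128), 3)  # 0.836

# Gap in percentage points.
diff_pp = round((avg_g128 - avg_g64) * 100, 1)
print(avg_g64, avg_g128, diff_pp)  # 0.852 0.836 -1.6
```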
### Key Benefits
- The g128 group_size on main weights (with fine-grained g64 for MoE weights) outperforms a uniform g64 quantization in the Computer Science and Chemistry domains (+2 percentage points each)
- Lower memory footprint than higher-bit quantizations (~38-45 GB)
- Consistent performance in the Math and Physics domains (90-94%)
## Model Details
- Base Model: Qwen/Qwen3-Coder-Next
- Library: mlx-lm
- Quantization: 3-bit with group_size 128 for main weights and group_size 64 for MoE weights
- License: apache-2.0
- Pipeline Tag: text-generation
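mlx-lm's converter accepts a per-layer `quant_predicate` callback, which is one way a mixed g128/g64 recipe like this one could be expressed. The sketch below is an illustration only: the `experts`/`switch_mlp` path substrings are assumptions about the model's parameter naming, not the exact recipe used to produce this checkpoint.

```python
# Hypothetical per-layer quantization rule: fine-grained group_size 64 for
# MoE expert weights, group_size 128 for everything else. The path substrings
# checked here ("experts", "switch_mlp") are assumptions for illustration.
def mixed_quant_predicate(path: str, module=None, config=None):
    if "experts" in path or "switch_mlp" in path:
        return {"bits": 3, "group_size": 64}
    return {"bits": 3, "group_size": 128}

# With mlx-lm installed, a predicate like this could be passed to the converter:
#   from mlx_lm import convert
#   convert("Qwen/Qwen3-Coder-Next", mlx_path="qwen3-coder-next-3bit-g128",
#           quantize=True, quant_predicate=mixed_quant_predicate)

print(mixed_quant_predicate("model.layers.0.mlp.experts.gate_proj"))
```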
## Full Quantization Spectrum Comparison
| Domain | 3-bit default | 3-bit g128 | 4-bit default | 4-bit g128 | 6-bit default | 6-bit g128 | 8-bit default | 8-bit g128 |
|---|---|---|---|---|---|---|---|---|
| Math | 0.94 | 0.90 | 0.92 | 0.92 | 0.90 | 0.94 | 0.92 | 0.94 |
| Computer Science | 0.82 | 0.84 | 0.80 | 0.84 | 0.82 | 0.86 | 0.84 | 0.90 |
| Engineering | 0.70 | 0.64 | 0.70 | 0.76 | 0.74 | 0.72 | 0.80 | 0.80 |
| Physics | 0.94 | 0.92 | 0.94 | 0.96 | 0.96 | 0.94 | 0.96 | 0.96 |
| Chemistry | 0.86 | 0.88 | 0.90 | 0.90 | 0.94 | 0.92 | 0.90 | 0.94 |
| Average | 0.852 | 0.836 | 0.835 | 0.865 | 0.872 | 0.876 | 0.868 | 0.888 |
## Key Observations
### Quantization Level Impact
- 3-bit:
  - g64/default: 85.2% average
  - g128: 83.6% average (-1.6% vs g64/default)
- 4-bit:
  - g64/default: 83.5% average
  - g128: 86.5% average (+3.0% vs g64/default)
- 6-bit:
  - g64/default: 87.2% average
  - g128: 87.6% average (+0.4% vs g64/default)
- 8-bit:
  - g64/default: 86.8% average
  - g128: 88.8% average (+2.0% vs g64/default)
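As a sanity check, the g128-vs-default gaps follow directly from the averages above:

```python
# Average mmlu_pro accuracy (%) per quantization level, from the figures above.
averages = {
    "3-bit": {"default": 85.2, "g128": 83.6},
    "4-bit": {"default": 83.5, "g128": 86.5},
    "6-bit": {"default": 87.2, "g128": 87.6},
    "8-bit": {"default": 86.8, "g128": 88.8},
}

# Percentage-point gap of g128 relative to the g64/default baseline.
deltas = {k: round(v["g128"] - v["default"], 1) for k, v in averages.items()}
print(deltas)  # {'3-bit': -1.6, '4-bit': 3.0, '6-bit': 0.4, '8-bit': 2.0}
```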
### Group Size Impact Across Quantization Levels
| Quantization | g128 vs baseline | Performance Trend |
|---|---|---|
| 3-bit | -1.6% | g128 underperforms default |
| 4-bit | +3.0% | g128 significantly outperforms default |
| 6-bit | +0.4% | g128 slightly outperforms default |
| 8-bit | +2.0% | g128 outperforms default |
## Usage
```python
import mlx_lm
from mlx_lm.sample_utils import make_sampler

# Load the quantized model and its tokenizer from the Hub.
model_path = "petergilani/qwen3-coder-next-3bit-g128"
model, tokenizer = mlx_lm.load(model_path)

# Configure sampling.
sampler = make_sampler(temp=1.0, top_p=0.95, top_k=40)

prompt = "Write a Python function to calculate the factorial of a number."
response = mlx_lm.generate(
    model,
    tokenizer,
    prompt=prompt,
    sampler=sampler,
    max_tokens=512,
)
print(response)
```