# Model Card for petergilani/Qwen3-Coder-Next-4bit-g128
A 4-bit quantization of Qwen/Qwen3-Coder-Next produced with mlx-lm, using group_size 128 for the main weights, with the aim of maximizing memory efficiency at 4-bit.
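This quantization can be reproduced with mlx-lm's convert utility; a minimal sketch, where the output directory name is just illustrative:

```python
from mlx_lm import convert

# Quantize the base model to 4-bit, grouping the main weights in blocks of 128
convert(
    "Qwen/Qwen3-Coder-Next",
    mlx_path="Qwen3-Coder-Next-4bit-g128",  # illustrative output directory
    quantize=True,
    q_bits=4,
    q_group_size=128,
)
```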
## Updated Evaluation Results (February 13, 2026)
Evaluation results from mlx_lm.evaluate on mmlu_pro (200 questions per domain; num_shots=1, temp=1.0, top_p=0.95, top_k=40, seed=123). A sketch of the invocation is shown below; the results follow.
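The runs can be reproduced roughly as follows; this uses the mlx_lm.evaluate CLI named above, but the exact task name and flags are assumptions to verify against your installed mlx-lm version:

```python
import subprocess

# Approximate reproduction of the evaluation setup described above.
# Flag names follow the mlx_lm.evaluate CLI; verify against your version.
subprocess.run(
    [
        "mlx_lm.evaluate",
        "--model", "petergilani/Qwen3-Coder-Next-4bit-g128",
        "--tasks", "mmlu_pro",  # per-domain scores are reported by subtask
        "--limit", "200",       # questions per domain
        "--num-shots", "1",
        "--seed", "123",
    ],
    check=True,
)
```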
### Direct Comparison Summary (4-bit g64 vs g128)
| Domain | 4-bit g64 | 4-bit g128 (this model) | Difference |
|---|---|---|---|
| Computer Science | 85.5% | 85.5% | 0.0% |
| Math | 92.0% | 91.5% | -0.5% |
| Physics | 88.0% | 91.0% | +3.0% |
| Engineering | 69.0% | 71.0% | +2.0% |
| Chemistry | 86.0% | 86.0% | 0.0% |
### Average Performance
- 4-bit g64: 84.1% average
- 4-bit g128 (this model): 85.0% average
- Difference: +0.9 percentage points in favor of g128
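These averages are plain means over the five listed domains, which is easy to sanity-check:

```python
# Per-domain scores from the comparison table above (in %)
g64 = [85.5, 92.0, 88.0, 69.0, 86.0]
g128 = [85.5, 91.5, 91.0, 71.0, 86.0]

print(sum(g64) / len(g64))    # 84.1
print(sum(g128) / len(g128))  # 85.0
```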
### Key Observations
- Both 4-bit models show similar average performance (84.1% vs 85.0% over the five domains)
- Differences are domain-specific; neither model dominates across the board
- 4-bit g128 (this model) performs better in Physics and Engineering
- 4-bit g64 performs slightly better in Math; Computer Science and Chemistry are tied
### Memory Usage Comparison

**4-bit g64:**
- Peak memory usage: 56.355 GB (Math) to 61.010 GB (Physics)
- Average memory usage: ~60.0 GB

**4-bit g128 (this model):**
- Peak memory usage: 51.989 GB (Math) to 59.084 GB (Engineering)
- Average memory usage: ~57.3 GB
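Peak memory during generation can be checked directly with MLX; a minimal sketch (note the accessor has lived at mx.metal.get_peak_memory in older MLX releases and mx.get_peak_memory in newer ones):

```python
import mlx.core as mx
import mlx_lm
from mlx_lm.sample_utils import make_sampler

model, tokenizer = mlx_lm.load("petergilani/Qwen3-Coder-Next-4bit-g128")
sampler = make_sampler(temp=1.0, top_p=0.95, top_k=40)

# Run a short generation so weights and KV cache are actually materialized
mlx_lm.generate(model, tokenizer, prompt="Hello", sampler=sampler, max_tokens=64)

# Peak memory in GB; use mx.metal.get_peak_memory() on older MLX versions
print(f"peak: {mx.get_peak_memory() / 1e9:.3f} GB")
```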
## Full Quantization Spectrum Comparison (Updated)
| Domain | 3-bit default | 3-bit g128 | 4-bit g64 | 4-bit g128 (this model) | 6-bit default | 6-bit g128 | 8-bit g64 | 8-bit g128 |
|---|---|---|---|---|---|---|---|---|
| Math | 0.94 | 0.90 | 0.920 | 0.915 | 0.90 | 0.94 | 0.930 | 0.925 |
| Computer Science | 0.82 | 0.84 | 0.855 | 0.855 | 0.82 | 0.86 | 0.850 | 0.875 |
| Engineering | 0.70 | 0.64 | 0.690 | 0.710 | 0.74 | 0.72 | 0.710 | 0.685 |
| Physics | 0.94 | 0.92 | 0.880 | 0.910 | 0.96 | 0.94 | 0.920 | 0.885 |
| Chemistry | 0.86 | 0.88 | 0.860 | 0.860 | 0.94 | 0.92 | 0.875 | 0.885 |
| Average | 0.852 | 0.836 | 0.841 | 0.850 | 0.872 | 0.876 | 0.857 | 0.851 |
## Original Evaluation Results
Initial testing with mlx_lm.evaluate on mmlu_pro (50 questions per domain), comparing this 4-bit g128 quant against the 4-bit g64 quant:
| Domain | 4-bit g64 | 4-bit g128 (this model) | Difference |
|---|---|---|---|
| Math | 92.0% | 92.0% | 0.0% |
| Computer Science | 84.0% | 84.0% | 0.0% |
| Engineering | 70.0% | 76.0% | +6.0% |
| Physics | 94.0% | 96.0% | +2.0% |
| Chemistry | 90.0% | 90.0% | 0.0% |
### Average Performance
- 4-bit g64: 86.0% average
- 4-bit g128: 87.6% average
- Improvement: +1.6 percentage points with group size 128
### Key Benefits
- The g128 group_size on the main weights gives modest memory savings over g64 (~2.7 GB less on average in the runs above)
- Performance differences between the two group sizes are minimal at 4-bit
- 4-bit quantization is markedly more memory-efficient than 8-bit quantization (~40% lower usage)
- Lower memory traffic per token also means less sustained power draw and heat, making this quant well suited to thermally constrained machines
## Model Details
- Base Model: Qwen/Qwen3-Coder-Next
- Library: mlx-lm
- Quantization: 4-bit with group_size 128 for main weights
- License: apache-2.0
- Pipeline Tag: text-generation
## Usage
```python
import mlx_lm
from mlx_lm.sample_utils import make_sampler

# Load the quantized model and tokenizer from the Hub (or a local path)
model_path = "petergilani/Qwen3-Coder-Next-4bit-g128"
model, tokenizer = mlx_lm.load(model_path)

# Sampler matching the settings used in the evaluations above
sampler = make_sampler(temp=1.0, top_p=0.95, top_k=40)

prompt = "Write a Python function to calculate the factorial of a number."
response = mlx_lm.generate(
    model,
    tokenizer,
    prompt=prompt,
    sampler=sampler,
    max_tokens=512,
)
print(response)
```
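For interactive use, mlx-lm also supports token-by-token streaming via stream_generate; a minimal sketch reusing the model, tokenizer, sampler, and prompt from above (response attribute names may vary slightly between mlx-lm versions):

```python
from mlx_lm import stream_generate

# Print tokens as they are produced instead of waiting for the full response
for chunk in stream_generate(model, tokenizer, prompt=prompt, sampler=sampler, max_tokens=512):
    print(chunk.text, end="", flush=True)
print()
```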