# Qwen3-235B-A22B-Instruct-2507_MXFP4
This checkpoint is a variant of Qwen3-235B-A22B-Instruct-2507 in which the expert weights have been quantized to the MXFP4 format, similar to gpt-oss-20b and gpt-oss-120b.
The weights were quantized with the `downcast_to_mxfp` function from triton-kernels.
The quantized checkpoint may incur a small drop in accuracy, but it is ~71% smaller than the original BF16 checkpoint.
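MXFP4 stores each block of 32 values as 4-bit E2M1 codes plus one shared power-of-two (E8M0) scale per block. The following is a minimal NumPy sketch of that rounding scheme for illustration only; it is not the actual `downcast_to_mxfp` kernel from triton-kernels, and the function name here is hypothetical.

```python
import numpy as np

# The eight non-negative magnitudes representable in E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(block: np.ndarray) -> np.ndarray:
    """Quantize one block (<= 32 floats) to MXFP4 and dequantize back.

    Illustrative sketch: a shared power-of-two scale maps the block's
    max magnitude onto the largest E2M1 value (6.0), then each element
    is rounded to the nearest E2M1 code point.
    """
    amax = np.abs(block).max()
    if amax == 0.0:
        return np.zeros_like(block)
    # Shared E8M0 scale: a power of two chosen so the block max
    # lands at or below the top of the E2M1 grid.
    scale = 2.0 ** np.floor(np.log2(amax / E2M1_GRID[-1]))
    scaled = block / scale
    # Round each magnitude to the nearest grid point, keeping the sign.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_GRID[idx] * scale
```

Because the scale is a power of two, values that already sit on the scaled grid round-trip exactly; everything else is rounded to the nearest of the 16 signed code points, which is where the small accuracy drop can come from.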
## Accuracy Comparison
| Model | GSM8K (strict-match) | GSM8K (flexible-extract) |
|---|---|---|
| Qwen3-235B-A22B-Instruct-2507 (BF16) | 90.14% ± 0.82% | 91.05% ± 0.79% |
| Qwen3-235B-A22B-Instruct-2507_MXFP4 | 90.45% ± 0.81% | 91.36% ± 0.77% |
## Checkpoint Size
| Model | Size | Reduction |
|---|---|---|
| Qwen3-235B-A22B-Instruct-2507 (BF16) | 438 GB | - |
| Qwen3-235B-A22B-Instruct-2507_MXFP4 | 128 GB | 71% smaller |