---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
---
# Qwen3-Coder-30B-A3B-Instruct_MXFP4
This checkpoint is a variant of [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct), where expert weights have been quantized to [MXFP4 format](https://huggingface.co/blog/faster-transformers#what-is-mxfp4) similarly to [gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) and [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b).
To quantize the weights, we used the `downcast_to_mxfp` function from [triton-kernels](https://github.com/triton-lang/triton/blob/main/python/triton_kernels/triton_kernels/numerics_details/mxfp.py).
The quantized checkpoint may show a small drop in accuracy, but is **~68% smaller** than the original BF16 checkpoint.
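To give an intuition for what MXFP4 quantization does to the expert weights, the NumPy sketch below round-trips values through a simulated MXFP4 representation: each block of 32 elements shares one power-of-two (E8M0) scale, and each element is snapped to the nearest 4-bit E2M1 value. This is a hypothetical illustration of the format, not the actual `downcast_to_mxfp` implementation from triton-kernels; the function name and scale-selection rule here follow the OCP Microscaling spec but are our own simplification.

```python
import numpy as np

# The eight non-negative magnitudes representable in 4-bit E2M1
# (one sign bit, two exponent bits, one mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_dequantize_mxfp4(x, block_size=32):
    """Round-trip a 1-D array through a simulated MXFP4 encoding.

    Hypothetical helper for illustration only. Each block of
    `block_size` values shares a power-of-two (E8M0) scale chosen so
    the block's largest magnitude lands in E2M1's [0, 6] range.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    out = np.empty_like(blocks)
    for i, block in enumerate(blocks):
        amax = np.abs(block).max()
        # Shared scale per MX convention: 2^(floor(log2(amax)) - emax),
        # with emax = 2 for E2M1 (largest element exponent).
        scale = 2.0 ** (np.floor(np.log2(amax)) - 2) if amax > 0 else 1.0
        scaled = np.abs(block / scale)
        # Snap each scaled magnitude to the nearest E2M1 grid point.
        mags = E2M1_GRID[np.argmin(np.abs(scaled[:, None] - E2M1_GRID), axis=1)]
        out[i] = np.sign(block) * mags * scale
    return out.reshape(-1)[: len(x)]
```

Values that are exact multiples of the shared scale survive the round trip unchanged; everything else picks up a small rounding error, which is where the accuracy gap in the table below comes from. The real kernels additionally pack two 4-bit codes per byte and store the E8M0 scales separately, which yields the ~68% size reduction.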
## Accuracy Comparison
| Model | GSM8K (strict-match) | GSM8K (flexible-extract) |
|-------|---------------------|--------------------------|
| **Qwen3-Coder-30B-A3B-Instruct (BF16)** | 90.67% ± 0.80% | 89.92% ± 0.83% |
| **Qwen3-Coder-30B-A3B-Instruct_MXFP4** | 89.76% ± 0.83% | 88.70% ± 0.87% |
## Checkpoint Size
| Model | Size | Reduction |
|-------|------|-----------|
| **Qwen3-Coder-30B-A3B-Instruct (BF16)** | 57 GB | - |
| **Qwen3-Coder-30B-A3B-Instruct_MXFP4** | 18 GB | **68% smaller** |