---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
---
# Qwen3-Coder-480B-A35B-Instruct_MXFP4
This checkpoint is a variant of [Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct) in which the expert weights have been quantized to the [MXFP4 format](https://huggingface.co/blog/faster-transformers#what-is-mxfp4), similar to [gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) and [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b).
Weights were quantized with the `downcast_to_mxfp` function from [triton-kernels](https://github.com/triton-lang/triton/blob/main/python/triton_kernels/triton_kernels/numerics_details/mxfp.py).
The quantization may introduce a small drop in accuracy, but yields a **~72% reduction in checkpoint size** compared to the original BF16 checkpoint.
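The card does not include the quantization script itself, but the MXFP4 scheme (per the OCP Microscaling spec) is straightforward to sketch: elements are grouped into blocks of 32 that share a single power-of-two (E8M0) scale, and each element is stored as a 4-bit E2M1 value. Below is a minimal NumPy simulation of the quantize/dequantize round trip; this is illustrative only, not the actual triton-kernels implementation, and the scale-selection rule is one common choice.

```python
import numpy as np

# Representable magnitudes of FP4 E2M1 (the sign bit is handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_dequantize_mxfp4(x, block_size=32):
    """Simulate MXFP4: per-block power-of-two (E8M0) scale plus
    4-bit E2M1 elements, then dequantize back to float."""
    x = np.asarray(x, dtype=np.float64)
    assert x.size % block_size == 0, "pad to a multiple of the block size"
    blocks = x.reshape(-1, block_size)
    out = np.empty_like(blocks)
    for i, blk in enumerate(blocks):
        amax = np.abs(blk).max()
        if amax == 0.0:
            out[i] = 0.0
            continue
        # Shared scale: a power of two chosen so the largest element lands
        # near the top of the E2M1 range (max E2M1 magnitude is 6 = 1.5 * 2^2).
        scale = 2.0 ** (int(np.floor(np.log2(amax))) - 2)
        scaled = blk / scale
        # Round each scaled magnitude to the nearest representable E2M1 value.
        idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
        out[i] = np.sign(scaled) * E2M1_GRID[idx] * scale
    return out.reshape(x.shape)
```

Values that are exactly representable at the block's scale survive the round trip unchanged; everything else is snapped to the nearest E2M1 grid point, which is where the (small) accuracy loss comes from.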
## Accuracy Comparison
| Model | GSM8K (strict-match) | GSM8K (flexible-extract) |
|-------|---------------------|--------------------------|
| **Qwen3-Coder-480B-A35B-Instruct (BF16)** | 89.16% ± 0.86% | 90.52% ± 0.81% |
| **Qwen3-Coder-480B-A35B-Instruct_MXFP4** | 89.99% ± 0.83% | 90.75% ± 0.80% |
## Checkpoint Size
| Model | Size | Reduction |
|-------|------|-----------|
| **Qwen3-Coder-480B-A35B-Instruct (BF16)** | 895 GB | - |
| **Qwen3-Coder-480B-A35B-Instruct_MXFP4** | 255 GB | **72% smaller** |
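The reduction figure follows directly from the two sizes above (and is plausible given that MXFP4 stores roughly 4.25 bits per expert weight versus 16 for BF16, with non-expert weights left unquantized):

```python
bf16_gb, mxfp4_gb = 895, 255
reduction = 1 - mxfp4_gb / bf16_gb
print(f"{reduction:.1%}")  # 71.5%, reported as ~72%
```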