---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
---
# Qwen3-Coder-480B-A35B-Instruct_MXFP4
This checkpoint is a variant of [Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct) in which the expert weights have been quantized to the [MXFP4 format](https://huggingface.co/blog/faster-transformers#what-is-mxfp4), similar to [gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) and [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b).
Weights were quantized with the `downcast_to_mxfp` function from [triton-kernels](https://github.com/triton-lang/triton/blob/main/python/triton_kernels/triton_kernels/numerics_details/mxfp.py).
The quantization may introduce a small drop in accuracy, but yields a **~72% reduction in checkpoint size** compared to the original BF16 checkpoint.
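The card does not include the quantization script itself, but the MXFP4 scheme (per the OCP Microscaling spec) is straightforward to sketch: elements are grouped into blocks of 32 that share a single power-of-two (E8M0) scale, and each element is stored as a 4-bit E2M1 value. Below is a minimal NumPy simulation of the quantize/dequantize round trip; this is illustrative only, not the actual triton-kernels implementation, and the scale-selection rule is one common choice.

```python
import numpy as np

# Representable magnitudes of FP4 E2M1 (the sign bit is handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_dequantize_mxfp4(x, block_size=32):
    """Simulate MXFP4: per-block power-of-two (E8M0) scale plus
    4-bit E2M1 elements, then dequantize back to float."""
    x = np.asarray(x, dtype=np.float64)
    assert x.size % block_size == 0, "pad to a multiple of the block size"
    blocks = x.reshape(-1, block_size)
    out = np.empty_like(blocks)
    for i, blk in enumerate(blocks):
        amax = np.abs(blk).max()
        if amax == 0.0:
            out[i] = 0.0
            continue
        # Shared scale: a power of two chosen so the largest element lands
        # near the top of the E2M1 range (max E2M1 magnitude is 6 = 1.5 * 2^2).
        scale = 2.0 ** (int(np.floor(np.log2(amax))) - 2)
        scaled = blk / scale
        # Round each scaled magnitude to the nearest representable E2M1 value.
        idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
        out[i] = np.sign(scaled) * E2M1_GRID[idx] * scale
    return out.reshape(x.shape)
```

Values that are exactly representable at the block's scale survive the round trip unchanged; everything else is snapped to the nearest E2M1 grid point, which is where the (small) accuracy loss comes from.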
## Accuracy Comparison
| Model | GSM8K (strict-match) | GSM8K (flexible-extract) |
|-------|---------------------|--------------------------|
| **Qwen3-Coder-480B-A35B-Instruct (BF16)** | 89.16% ± 0.86% | 90.52% ± 0.81% |
| **Qwen3-Coder-480B-A35B-Instruct_MXFP4** | 89.99% ± 0.83% | 90.75% ± 0.80% |
## Checkpoint Size
| Model | Size | Reduction |
|-------|------|-----------|
| **Qwen3-Coder-480B-A35B-Instruct (BF16)** | 895 GB | - |
| **Qwen3-Coder-480B-A35B-Instruct_MXFP4** | 255 GB | **72% smaller** |
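The reduction figure follows directly from the two sizes above (and is plausible given that MXFP4 stores roughly 4.25 bits per expert weight versus 16 for BF16, with non-expert weights left unquantized):

```python
bf16_gb, mxfp4_gb = 895, 255
reduction = 1 - mxfp4_gb / bf16_gb
print(f"{reduction:.1%}")  # 71.5%, reported as ~72%
```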