---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
---
# Qwen3-Coder-30B-A3B-Instruct_MXFP4
This checkpoint is a variant of [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct), where expert weights have been quantized to [MXFP4 format](https://huggingface.co/blog/faster-transformers#what-is-mxfp4) similarly to [gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) and [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b).
To quantize the weights, we used the `downcast_to_mxfp` function from [triton-kernels](https://github.com/triton-lang/triton/blob/main/python/triton_kernels/triton_kernels/numerics_details/mxfp.py).
The quantized checkpoint may show a small drop in accuracy, but is **~68% smaller** than the original BF16 checkpoint.
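To give an intuition for what MXFP4 quantization does to the expert weights, the NumPy sketch below round-trips values through a simulated MXFP4 representation: each block of 32 elements shares one power-of-two (E8M0) scale, and each element is snapped to the nearest 4-bit E2M1 value. This is a hypothetical illustration of the format, not the actual `downcast_to_mxfp` implementation from triton-kernels; the function name and scale-selection rule here follow the OCP Microscaling spec but are our own simplification.

```python
import numpy as np

# The eight non-negative magnitudes representable in 4-bit E2M1
# (one sign bit, two exponent bits, one mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_dequantize_mxfp4(x, block_size=32):
    """Round-trip a 1-D array through a simulated MXFP4 encoding.

    Hypothetical helper for illustration only. Each block of
    `block_size` values shares a power-of-two (E8M0) scale chosen so
    the block's largest magnitude lands in E2M1's [0, 6] range.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    out = np.empty_like(blocks)
    for i, block in enumerate(blocks):
        amax = np.abs(block).max()
        # Shared scale per MX convention: 2^(floor(log2(amax)) - emax),
        # with emax = 2 for E2M1 (largest element exponent).
        scale = 2.0 ** (np.floor(np.log2(amax)) - 2) if amax > 0 else 1.0
        scaled = np.abs(block / scale)
        # Snap each scaled magnitude to the nearest E2M1 grid point.
        mags = E2M1_GRID[np.argmin(np.abs(scaled[:, None] - E2M1_GRID), axis=1)]
        out[i] = np.sign(block) * mags * scale
    return out.reshape(-1)[: len(x)]
```

Values that are exact multiples of the shared scale survive the round trip unchanged; everything else picks up a small rounding error, which is where the accuracy gap in the table below comes from. The real kernels additionally pack two 4-bit codes per byte and store the E8M0 scales separately, which yields the ~68% size reduction.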
## Accuracy Comparison
| Model | GSM8K (strict-match) | GSM8K (flexible-extract) |
|-------|---------------------|--------------------------|
| **Qwen3-Coder-30B-A3B-Instruct (BF16)** | 90.67% ± 0.80% | 89.92% ± 0.83% |
| **Qwen3-Coder-30B-A3B-Instruct_MXFP4** | 89.76% ± 0.83% | 88.70% ± 0.87% |
## Checkpoint Size
| Model | Size | Reduction |
|-------|------|-----------|
| **Qwen3-Coder-30B-A3B-Instruct (BF16)** | 57 GB | - |
| **Qwen3-Coder-30B-A3B-Instruct_MXFP4** | 18 GB | **68% smaller** |