---
title: "Per-Expert Mixed-Precision Quantization for 512-Expert MoE Models"
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---
# Per-Expert Mixed-Precision Quantization for 512-Expert MoE Models

**Data-Free Per-Expert Mixed-Precision Quantization for 512-Expert Mixture-of-Experts Models**

Black Sheep AI Research | [baa.ai](https://baa.ai)

## Key Results

- First data-free sensitivity study on 512-expert MoE models (2,347 tensors on Qwen3.5-397B)
- Kurtosis is the dominant sensitivity predictor (Spearman rho = 0.795); see the scoring sketch after this list
- 89.4% of expert parameters safely quantize to 4-bit under the SQNR safety floor
- The MCKP solver finds an optimal allocation in under 100 ms for any model size; see the allocation sketch below
- Group size matters more than bit-width allocation at 512-expert scale
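
The scoring behind the first two results can be illustrated with a minimal, data-free sketch: excess kurtosis as the shape statistic for each expert weight tensor, and the SQNR of group-wise symmetric rounding as the quality proxy checked against the safety floor. The function names, the round-to-nearest scheme, and the default group size of 128 are illustrative assumptions, not the MINT implementation.

```python
# Minimal data-free sensitivity sketch (illustrative; not the MINT code).
import torch


def tensor_kurtosis(w: torch.Tensor) -> float:
    """Excess kurtosis of the flattened weights (heavy tails -> harder to quantize)."""
    x = w.flatten().float()
    x = x - x.mean()
    var = x.pow(2).mean()
    return float(x.pow(4).mean() / (var * var + 1e-12) - 3.0)


def sqnr_db(w: torch.Tensor, bits: int, group_size: int = 128) -> float:
    """SQNR (dB) of symmetric, group-wise round-to-nearest quantization (assumed scheme)."""
    x = w.flatten().float()
    pad = (-x.numel()) % group_size
    x = torch.nn.functional.pad(x, (0, pad)).view(-1, group_size)
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().amax(dim=1, keepdim=True).clamp_min(1e-12) / qmax
    deq = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    noise = (x - deq).pow(2).sum().clamp_min(1e-20)
    return float(10.0 * torch.log10(x.pow(2).sum() / noise))
```

In this framing, a tensor stays at 4-bit only if its SQNR clears the safety floor; heavy-tailed (high-kurtosis) tensors tend to fall below it and are promoted to higher precision.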
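
The allocation step can be phrased as a multiple-choice knapsack problem (MCKP): each tensor picks exactly one bit-width option, maximizing total predicted quality under a memory budget. The sketch below uses a simple greedy upgrade rule over marginal value per byte to show the problem shape; the `Option` fields and the greedy rule are assumptions for illustration, not the paper's exact solver.

```python
# Greedy MCKP-style bit-width allocation sketch (illustrative, not the paper's solver).
import heapq
from dataclasses import dataclass


@dataclass
class Option:
    bits: int
    cost: float   # memory cost, e.g. params * bits / 8 (bytes)
    value: float  # benefit proxy, e.g. predicted SQNR in dB


def allocate_bits(options: list[list[Option]], budget: float) -> list[int]:
    """Pick one option index per tensor; options per tensor sorted by cost ascending."""
    choice = [0] * len(options)                       # start every tensor at the cheapest option
    spent = sum(opts[0].cost for opts in options)
    heap = []                                         # (-marginal value per byte, tensor, next option)
    for i, opts in enumerate(options):
        if len(opts) > 1:
            gain = (opts[1].value - opts[0].value) / max(opts[1].cost - opts[0].cost, 1e-12)
            heapq.heappush(heap, (-gain, i, 1))
    while heap:
        _, i, nxt = heapq.heappop(heap)
        opts = options[i]
        extra = opts[nxt].cost - opts[choice[i]].cost
        if spent + extra > budget:
            continue                                  # upgrade does not fit; try other tensors
        spent += extra
        choice[i] = nxt
        if nxt + 1 < len(opts):                       # queue the next upgrade for this tensor
            gain = (opts[nxt + 1].value - opts[nxt].value) / max(opts[nxt + 1].cost - opts[nxt].cost, 1e-12)
            heapq.heappush(heap, (-gain, i, nxt + 1))
    return choice
```

Because each tensor has at most one pending upgrade in the heap at a time, the loop stays near O(n log n) in the number of upgrades, which keeps allocation fast even across thousands of expert tensors.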
## Links

- [Paper](https://baa-ai-moe-expert-quantization.static.hf.space) | [MINT Code](https://github.com/baa-ai/MINT) | [Models](https://huggingface.co/baa-ai)
- [MINT Paper](https://baa.ai/articles/24-mint-paper.html) | [SWAN Paper](https://baa.ai/articles/07-swan-data-free-mixed-precision.html)