---
title: "Per-Expert Mixed-Precision Quantization for 512-Expert MoE Models"
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---
# Per-Expert Mixed-Precision Quantization for 512-Expert MoE Models
**Data-Free Per-Expert Mixed-Precision Quantization for 512-Expert Mixture-of-Experts Models**
Black Sheep AI Research — [baa.ai](https://baa.ai)
## Key Results
- First data-free sensitivity study on 512-expert MoE models (2,347 tensors on Qwen3.5-397B)
- Kurtosis is the dominant predictor of quantization sensitivity (Spearman rho = 0.795); see the first sketch after this list
- 89.4% of expert parameters quantize safely to 4-bit under an SQNR safety floor
- The MCKP solver finds an optimal bit-width allocation in under 100 ms for any model size; see the second sketch after this list
- Group size matters more than bit-width allocation at 512-expert scale
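## Method Sketches (Illustrative)

The results above reference a data-free sensitivity proxy (kurtosis) and an SQNR safety floor. The sketch below is a minimal illustration of that idea, not the released MINT code: it scores each expert weight tensor by excess kurtosis, applies group-wise 4-bit quantization, and promotes the expert to 8-bit if its SQNR falls below a floor. The group size of 128, the 15 dB floor, and the random test tensors are illustrative assumptions, not values from the paper.

```python
import numpy as np

def excess_kurtosis(w: np.ndarray) -> float:
    """Fisher (excess) kurtosis of a weight tensor, used as a data-free sensitivity proxy."""
    x = w.ravel().astype(np.float64)
    x = x - x.mean()
    var = x.var()
    return float((x ** 4).mean() / (var ** 2 + 1e-12) - 3.0)

def quantize_groupwise(w: np.ndarray, bits: int, group: int = 128) -> np.ndarray:
    """Symmetric per-group round-to-nearest quantization, dequantized back to float."""
    flat = w.ravel().astype(np.float64)
    pad = (-flat.size) % group
    groups = np.pad(flat, (0, pad)).reshape(-1, group)
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax + 1e-12
    deq = np.clip(np.round(groups / scale), -qmax - 1, qmax) * scale
    return deq.ravel()[: flat.size].reshape(w.shape)

def sqnr_db(w: np.ndarray, w_hat: np.ndarray) -> float:
    """Signal-to-quantization-noise ratio in dB."""
    signal = (w ** 2).mean()
    noise = ((w - w_hat) ** 2).mean()
    return float(10.0 * np.log10(signal / (noise + 1e-12)))

def assign_bits(expert_weights: dict, floor_db: float = 15.0) -> dict:
    """Keep an expert at 4-bit if its SQNR clears the floor; otherwise promote it to 8-bit."""
    plan = {}
    for name, w in expert_weights.items():
        s = sqnr_db(w, quantize_groupwise(w, bits=4))
        plan[name] = {"kurtosis": round(excess_kurtosis(w), 2),
                      "sqnr_4bit_db": round(s, 1),
                      "bits": 4 if s >= floor_db else 8}
    return plan

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    experts = {f"expert_{i}.w": rng.standard_normal((256, 256)) for i in range(3)}
    # A heavy-tailed expert: high kurtosis, low 4-bit SQNR, so it should be promoted to 8-bit.
    experts["expert_2.w"] = rng.standard_t(df=2, size=(256, 256))
    for name, info in assign_bits(experts).items():
        print(name, info)
```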
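Bit-widths are then assigned across experts as a multiple-choice knapsack problem (MCKP): pick exactly one bit-width per expert so that total quantization error is minimized under a bit budget. The toy dynamic program below illustrates the formulation only; the sub-100 ms timing refers to the paper's solver, and the bit menu, error values, and budget here are made-up numbers.

```python
def mckp_allocate(errors, budget_bits):
    """
    errors: list over experts; errors[i] maps a candidate bit-width to its expected
            quantization error for expert i.
    budget_bits: total bit budget across experts (each expert counted as one unit here).
    Returns (total_error, bits_per_expert) for the cheapest feasible assignment.
    """
    # dp maps bits-used -> (best total error, per-expert choices), with every expert
    # processed so far having picked exactly one bit-width.
    dp = {0: (0.0, [])}
    for menu in errors:
        nxt = {}
        for used, (err, choices) in dp.items():
            for bits, e in menu.items():
                u = used + bits
                if u > budget_bits:
                    continue
                cand = (err + e, choices + [bits])
                if u not in nxt or cand[0] < nxt[u][0]:
                    nxt[u] = cand
        dp = nxt
    if not dp:
        raise ValueError("no feasible assignment within the bit budget")
    return min(dp.values(), key=lambda t: t[0])

if __name__ == "__main__":
    # Three experts; the second is sensitive (large 4-bit error), so it should win the 8-bit slot.
    errors = [
        {4: 0.02, 8: 0.001},
        {4: 0.50, 8: 0.004},
        {4: 0.03, 8: 0.002},
    ]
    total_error, bits = mckp_allocate(errors, budget_bits=16)
    print("bits per expert:", bits, "total error:", total_error)
```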
## Links
- [Paper](https://baa-ai-moe-expert-quantization.static.hf.space) | [MINT Code](https://github.com/baa-ai/MINT) | [Models](https://huggingface.co/baa-ai)
- [MINT Paper](https://baa.ai/articles/24-mint-paper.html) | [SWAN Paper](https://baa.ai/articles/07-swan-data-free-mixed-precision.html)