---
title: Per-Expert Mixed-Precision Quantization for 512-Expert MoE Models
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---
Data-Free Per-Expert Mixed-Precision Quantization for 512-Expert Mixture-of-Experts Models
Black Sheep AI Research — baa.ai
Key Results
- First data-free sensitivity study on 512-expert MoE models (2,347 tensors on Qwen3.5-397B)
- Kurtosis is the dominant sensitivity predictor (Spearman rho = 0.795; see the sensitivity sketch after this list)
- 89.4% of expert parameters quantize safely to 4-bit under an SQNR safety floor
- The MCKP (multiple-choice knapsack) solver finds the optimal allocation in <100 ms for any model size (see the allocation sketch after this list)
- Group size matters more than bit-width allocation at 512-expert scale
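As a rough illustration of the data-free sensitivity screen behind the first three results, the sketch below computes a tensor's excess kurtosis, simulates symmetric per-group 4-bit quantization, and measures the resulting SQNR. This is a minimal NumPy sketch under assumed conventions (the function names, the group size of 128, and the round-to-nearest scheme are illustrative and not the released MINT implementation).

```python
import numpy as np

def excess_kurtosis(w: np.ndarray) -> float:
    """Fourth standardized moment minus 3; heavy-tailed tensors score high."""
    x = w.ravel().astype(np.float64)
    x = x - x.mean()
    var = x.var()
    return float((x ** 4).mean() / (var ** 2) - 3.0)

def quantize_group_symmetric(w: np.ndarray, bits: int = 4, group: int = 128) -> np.ndarray:
    """Data-free symmetric per-group round-to-nearest quantization (illustrative)."""
    flat = w.ravel().astype(np.float64)
    pad = (-len(flat)) % group
    groups = np.pad(flat, (0, pad)).reshape(-1, group)
    qmax = 2 ** (bits - 1) - 1
    scale = np.maximum(np.abs(groups).max(axis=1, keepdims=True), 1e-12) / qmax
    deq = np.round(groups / scale).clip(-qmax - 1, qmax) * scale
    return deq.ravel()[: len(flat)].reshape(w.shape)

def sqnr_db(w: np.ndarray, w_hat: np.ndarray) -> float:
    """Signal-to-quantization-noise ratio in dB."""
    noise = np.sum((w.astype(np.float64) - w_hat) ** 2) + 1e-20
    return float(10.0 * np.log10(np.sum(w.astype(np.float64) ** 2) / noise))
```

In this framing, tensors whose simulated 4-bit SQNR falls below the safety floor are kept at a wider format, and the Spearman correlation between per-tensor kurtosis and measured sensitivity (e.g. via scipy.stats.spearmanr) is what the rho = 0.795 result summarizes.

The sub-100 ms allocation result refers to solving a multiple-choice knapsack problem (MCKP): each tensor picks exactly one bit-width option, and the total footprint must fit a memory budget. The greedy sketch below (hypothetical types and names, not necessarily the paper's exact solver) starts every tensor at its cheapest option and spends the remaining budget on the upgrades with the best error reduction per extra byte.

```python
from dataclasses import dataclass

@dataclass
class Choice:
    bits: int
    size_bytes: float  # storage cost at this bit-width
    error: float       # predicted quantization error (e.g. an inverse-SQNR proxy)

def allocate_bits(options: list[list[Choice]], budget_bytes: float) -> list[int]:
    """Greedy MCKP heuristic. `options[i]` lists the bit-width choices for
    tensor i, sorted by increasing size_bytes (and decreasing error).
    Returns the index of the chosen option for every tensor."""
    picks = [0] * len(options)                       # start at the cheapest choice
    used = sum(opts[0].size_bytes for opts in options)
    while True:
        best, best_gain = None, 0.0
        for i, opts in enumerate(options):
            j = picks[i]
            if j + 1 >= len(opts):
                continue                              # already at the widest option
            extra = opts[j + 1].size_bytes - opts[j].size_bytes
            if extra <= 0 or used + extra > budget_bytes:
                continue                              # upgrade does not fit the budget
            gain = (opts[j].error - opts[j + 1].error) / extra
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            return picks                              # no affordable improving upgrade left
        used += options[best][picks[best] + 1].size_bytes - options[best][picks[best]].size_bytes
        picks[best] += 1
```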
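This greedy pass is near-optimal when the per-tensor error curves are convex in size; an exact dynamic-programming or Lagrangian MCKP solver over a few thousand tensors with a handful of bit-width choices each is also easily fast enough to stay under the quoted 100 ms.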
Links
- Paper | MINT Code | Models
- MINT Paper | SWAN Paper