---
title: "Per-Expert Mixed-Precision Quantization for 512-Expert MoE Models"
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---
# Per-Expert Mixed-Precision Quantization for 512-Expert MoE Models

**Data-Free Per-Expert Mixed-Precision Quantization for 512-Expert Mixture-of-Experts Models**

Black Sheep AI Research | [baa.ai](https://baa.ai)

## Key Results

- First data-free sensitivity study on 512-expert MoE models (2,347 tensors on Qwen3.5-397B)
- Kurtosis is the dominant sensitivity predictor (Spearman rho = 0.795); see the scoring sketch after this list
- 89.4% of expert parameters safely quantize to 4-bit under the SQNR safety floor
- The MCKP solver finds an optimal allocation in under 100 ms for any model size; see the allocation sketch below
- Group size matters more than bit-width allocation at 512-expert scale
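
The scoring behind the first two results can be illustrated with a minimal, data-free sketch: excess kurtosis as the shape statistic for each expert weight tensor, and the SQNR of group-wise symmetric rounding as the quality proxy checked against the safety floor. The function names, the round-to-nearest scheme, and the default group size of 128 are illustrative assumptions, not the MINT implementation.

```python
# Minimal data-free sensitivity sketch (illustrative; not the MINT code).
import torch


def tensor_kurtosis(w: torch.Tensor) -> float:
    """Excess kurtosis of the flattened weights (heavy tails -> harder to quantize)."""
    x = w.flatten().float()
    x = x - x.mean()
    var = x.pow(2).mean()
    return float(x.pow(4).mean() / (var * var + 1e-12) - 3.0)


def sqnr_db(w: torch.Tensor, bits: int, group_size: int = 128) -> float:
    """SQNR (dB) of symmetric, group-wise round-to-nearest quantization (assumed scheme)."""
    x = w.flatten().float()
    pad = (-x.numel()) % group_size
    x = torch.nn.functional.pad(x, (0, pad)).view(-1, group_size)
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().amax(dim=1, keepdim=True).clamp_min(1e-12) / qmax
    deq = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    noise = (x - deq).pow(2).sum().clamp_min(1e-20)
    return float(10.0 * torch.log10(x.pow(2).sum() / noise))
```

In this framing, a tensor stays at 4-bit only if its SQNR clears the safety floor; heavy-tailed (high-kurtosis) tensors tend to fall below it and are promoted to higher precision.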
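
The allocation step can be phrased as a multiple-choice knapsack problem (MCKP): each tensor picks exactly one bit-width option, maximizing total predicted quality under a memory budget. The sketch below uses a simple greedy upgrade rule over marginal value per byte to show the problem shape; the `Option` fields and the greedy rule are assumptions for illustration, not the paper's exact solver.

```python
# Greedy MCKP-style bit-width allocation sketch (illustrative, not the paper's solver).
import heapq
from dataclasses import dataclass


@dataclass
class Option:
    bits: int
    cost: float   # memory cost, e.g. params * bits / 8 (bytes)
    value: float  # benefit proxy, e.g. predicted SQNR in dB


def allocate_bits(options: list[list[Option]], budget: float) -> list[int]:
    """Pick one option index per tensor; options per tensor sorted by cost ascending."""
    choice = [0] * len(options)                       # start every tensor at the cheapest option
    spent = sum(opts[0].cost for opts in options)
    heap = []                                         # (-marginal value per byte, tensor, next option)
    for i, opts in enumerate(options):
        if len(opts) > 1:
            gain = (opts[1].value - opts[0].value) / max(opts[1].cost - opts[0].cost, 1e-12)
            heapq.heappush(heap, (-gain, i, 1))
    while heap:
        _, i, nxt = heapq.heappop(heap)
        opts = options[i]
        extra = opts[nxt].cost - opts[choice[i]].cost
        if spent + extra > budget:
            continue                                  # upgrade does not fit; try other tensors
        spent += extra
        choice[i] = nxt
        if nxt + 1 < len(opts):                       # queue the next upgrade for this tensor
            gain = (opts[nxt + 1].value - opts[nxt].value) / max(opts[nxt + 1].cost - opts[nxt].cost, 1e-12)
            heapq.heappush(heap, (-gain, i, nxt + 1))
    return choice
```

Because each tensor has at most one pending upgrade in the heap at a time, the loop stays near O(n log n) in the number of upgrades, which keeps allocation fast even across thousands of expert tensors.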
## Links

- [Paper](https://baa-ai-moe-expert-quantization.static.hf.space) | [MINT Code](https://github.com/baa-ai/MINT) | [Models](https://huggingface.co/baa-ai)
- [MINT Paper](https://baa.ai/articles/24-mint-paper.html) | [SWAN Paper](https://baa.ai/articles/07-swan-data-free-mixed-precision.html)