---
title: Per-Expert Mixed-Precision Quantization for 512-Expert MoE Models
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---


# Data-Free Per-Expert Mixed-Precision Quantization for 512-Expert Mixture-of-Experts Models

Black Sheep AI Research — baa.ai

## Key Results

- First data-free sensitivity study on 512-expert MoE models (2,347 tensors on Qwen3.5-397B)
- Kurtosis is the dominant sensitivity predictor (Spearman ρ = 0.795; scoring sketched below)
- 89.4% of expert parameters safely quantize to 4-bit under an SQNR safety floor
- The MCKP solver finds the optimal bit allocation in under 100 ms regardless of model size (see the allocation sketch below)
- Group size matters more than bit-width allocation at 512-expert scale
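
As a rough illustration of the data-free pipeline behind the kurtosis and SQNR bullets, here is a minimal sketch assuming PyTorch weight tensors and a symmetric round-to-nearest group-wise quantizer. The function names, the group size of 128, and the 30 dB floor are illustrative assumptions, not the paper's released code or its actual threshold.

```python
import torch

def excess_kurtosis(w: torch.Tensor) -> float:
    """Excess kurtosis of a flattened weight tensor; heavy tails score high."""
    x = w.detach().float().flatten()
    x = x - x.mean()
    var = x.pow(2).mean()
    return (x.pow(4).mean() / (var * var + 1e-12) - 3.0).item()

def quantize_groupwise(w: torch.Tensor, bits: int = 4, group: int = 128) -> torch.Tensor:
    """Symmetric group-wise round-to-nearest quantize/dequantize.

    Assumes w.numel() is divisible by `group`; real code would pad.
    """
    x = w.detach().float().reshape(-1, group)
    qmax = 2 ** (bits - 1) - 1
    scale = (x.abs().amax(dim=1, keepdim=True) / qmax).clamp_min(1e-12)
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)

def sqnr_db(w: torch.Tensor, w_hat: torch.Tensor) -> float:
    """Signal-to-quantization-noise ratio (dB) of a quantized reconstruction."""
    signal = w.detach().float().pow(2).mean()
    noise = (w.detach().float() - w_hat).pow(2).mean()
    return (10.0 * torch.log10(signal / (noise + 1e-12))).item()

# Data-free screening: keep a tensor at 4-bit only if its reconstruction
# clears the safety floor. FLOOR_DB = 30.0 is an assumed placeholder.
FLOOR_DB = 30.0

def passes_4bit_floor(w: torch.Tensor) -> bool:
    return sqnr_db(w, quantize_groupwise(w, bits=4)) >= FLOOR_DB
```

Under round-to-nearest quantization, heavy-tailed (high-kurtosis) tensors force large group scales and thus more quantization noise, which is consistent with kurtosis predicting which tensors fail the floor.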

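The sub-100 ms claim is plausible because, under Lagrangian relaxation, the multiple-choice knapsack (MCKP) decomposes into an independent argmax per tensor once a price per bit is fixed. Below is a minimal sketch that bisects on that price; the `Choice` structure, bit options, and SQNR numbers are hypothetical, not the paper's solver.

```python
from dataclasses import dataclass

@dataclass
class Choice:
    bits: int         # storage cost per parameter at this precision
    sqnr_db: float    # estimated SQNR if the tensor is stored at `bits`

def allocate(tensors: dict[str, tuple[int, list[Choice]]],
             bit_budget: float) -> dict[str, int]:
    """Pick one bit-width per tensor, maximizing summed SQNR under a bit budget.

    `tensors` maps name -> (num_params, candidate choices). At price `lam`
    per bit, each tensor independently picks argmax(value - lam * cost);
    bisecting `lam` drives total cost toward the budget. Assumes the
    all-minimum-bits configuration fits within the budget.
    """
    def pick(lam: float):
        sel, cost = {}, 0.0
        for name, (n, choices) in tensors.items():
            best = max(choices, key=lambda c: n * (c.sqnr_db - lam * c.bits))
            sel[name] = best.bits
            cost += n * best.bits
        return sel, cost

    lo, hi = 0.0, 1e3          # bracket the per-bit price
    for _ in range(50):        # each bisection step is O(#tensors * #choices)
        mid = 0.5 * (lo + hi)
        if pick(mid)[1] > bit_budget:
            lo = mid           # over budget: bits are priced too cheaply
        else:
            hi = mid
    return pick(hi)[0]

# Hypothetical usage: an average budget of 6 bits/param over two tensors
# assigns 8 bits to the more sensitive tensor and 4 bits to the other.
plan = allocate({"expert.0.w1": (1_000_000, [Choice(4, 28.0), Choice(8, 46.0)]),
                 "expert.1.w1": (1_000_000, [Choice(4, 20.0), Choice(8, 44.0)])},
                bit_budget=12_000_000)
```
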
## Links