Paper: MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention (arXiv:2506.13585).
Interpreter LoRA. Trained via offline CISPO (MiniMax-M1, arXiv:2506.13585) with Dr. GRPO advantages on K=8 judge-scored rollouts, sampled from the DPO-heldout IA+Multidoc+Fineweb LoRAs. Beats CISPO v7 on AuditBench and ood_models_v3, and ties it on heldout_ia_v2.
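For reference, a minimal sketch of the objective this names, assuming the standard published formulations: Dr. GRPO computes advantages as group-mean-centered rewards (no std or length normalization), and CISPO clips the importance-sampling ratio itself and detaches it, so every token keeps a gradient through the current-policy log-probs. Tensor shapes and the clipping epsilons below are illustrative, not taken from this run:

```python
import torch

def dr_grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Dr. GRPO: advantage = reward minus group mean.

    rewards: [G, K] judge scores, K rollouts per prompt (K=8 here).
    No std normalization and no length normalization.
    """
    return rewards - rewards.mean(dim=-1, keepdim=True)

def cispo_loss(logp_new, logp_old, advantages, mask,
               eps_low=0.2, eps_high=0.2):
    """CISPO (arXiv:2506.13585): clip the IS ratio, not the token update.

    logp_new / logp_old: [B, T] per-token log-probs under the current /
    behavior policy; advantages: [B] sequence-level Dr. GRPO advantages;
    mask: [B, T] 1 on response tokens, 0 elsewhere. The clipped ratio is
    detached, so no token is dropped from the gradient (unlike PPO-style
    token clipping).
    """
    ratio = torch.exp(logp_new - logp_old)
    weight = torch.clamp(ratio, 1 - eps_low, 1 + eps_high).detach()
    per_token = weight * advantages.unsqueeze(-1) * logp_new
    return -(per_token * mask).sum() / mask.sum()
```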
| Set (items) | pass@8 | 95% CI | rollout mean [95% CI] |
|---|---|---|---|
| AuditBench (56) | 76.8% | [64.2 - 85.9] | 49.4% [44.1 - 54.8] |
| heldout_ia_v2 (20) | 80.0% | [58.4 - 91.9] | 71.7% [60.3 - 83.1] |
| ood_models_v3 (23) | 56.5% | [36.8 - 74.4] | 20.9% [17.2 - 24.6] |
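A hedged reading of how the two metric columns relate: pass@8 is taken as "at least one of the 8 rollouts for an item is judged a pass", rollout mean as the overall per-rollout pass rate, and both CIs as percentile bootstrap over items. A sketch under those assumptions:

```python
import numpy as np

def pass_at_8_and_rollout_mean(passes: np.ndarray, n_boot=10_000, seed=0):
    """passes: [items, K] boolean judge verdicts (K=8 rollouts per item).

    Returns pass@8 (any rollout passes) and rollout mean (per-rollout
    pass rate), each with a percentile-bootstrap 95% CI over items.
    """
    rng = np.random.default_rng(seed)
    n_items = passes.shape[0]
    idx = rng.integers(0, n_items, size=(n_boot, n_items))
    boot = passes[idx]                      # [n_boot, items, K] resamples
    pass_n = boot.any(axis=2).mean(axis=1)  # bootstrap pass@8
    roll = boot.mean(axis=(1, 2))           # bootstrap rollout mean
    ci = lambda x: np.percentile(x, [2.5, 97.5])
    return (passes.any(axis=1).mean(), ci(pass_n),
            passes.mean(), ci(roll))
```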
Usage: feed the direction tokens (shape [4480, 5120], svd_fixed_k16_mag7_rankfirst, bf16) through the AOEncoder, inject the encoded vectors into the layer-1 output hidden states at the placeholder token positions, apply this interpreter LoRA over frozen Qwen/Qwen3-14B, and decode greedily; see the sketch below.
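A minimal inference sketch of that pipeline, assuming a hook-based injection ("layer-1 output" is taken as the output of decoder block index 1). The AOEncoder checkpoint, LoRA path, artifact filenames, and the `<dir>` placeholder token are hypothetical stand-ins for this repo's actual artifacts:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen3-14B"
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, "path/to/interpreter-lora")  # hypothetical path
model.eval()

# Hypothetical artifacts: an AOEncoder module mapping the [4480, 5120]
# direction-token matrix to hidden_size vectors, one per placeholder slot.
encoder = torch.load("path/to/ao_encoder.pt", weights_only=False).to(model.device).eval()
directions = torch.load("path/to/directions_svd_fixed_k16_mag7_rankfirst.pt")  # [4480, 5120] bf16
with torch.no_grad():
    injected = encoder(directions.to(model.device))  # [n_placeholders, hidden_size]

PLACEHOLDER = "<dir>"  # assumption: the placeholder token used at training time
prompt = f"Describe this direction: {PLACEHOLDER}"
inputs = tok(prompt, return_tensors="pt").to(model.device)
positions = (inputs.input_ids[0] == tok.convert_tokens_to_ids(PLACEHOLDER)).nonzero(as_tuple=True)[0]

def inject(module, args, output):
    # Overwrite the layer-1 output hidden states at the placeholder positions.
    hidden = output[0] if isinstance(output, tuple) else output
    if hidden.shape[1] > int(positions.max()):  # prefill pass only, not cached decode steps
        hidden[0, positions] = injected[: len(positions)].to(hidden.dtype)
    return output

# PEFT-wrapped Qwen3: base_model (LoRA) -> model (CausalLM) -> model (decoder) -> layers
handle = model.base_model.model.model.layers[1].register_forward_hook(inject)
out = model.generate(**inputs, do_sample=False, max_new_tokens=256)  # greedy decoding
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```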