Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference • Paper • 2508.19559 • Published Aug 27, 2025
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning • Paper • 2602.13515 • Published Feb 13