view article Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective Jan 27 • 64
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding Paper • 2602.23881 • Published 14 days ago • 18
MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization Paper • 2602.03537 • Published Feb 3 • 4
DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers Paper • 2602.02016 • Published Feb 2 • 12
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation Paper • 2601.22813 • Published Jan 30 • 57
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization Paper • 2512.00956 • Published Nov 30, 2025 • 23
TiDAR: Think in Diffusion, Talk in Autoregression Paper • 2511.08923 • Published Nov 12, 2025 • 128
view article Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Feb 4, 2025 • 31