Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective Jan 27 • 65
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding Paper • 2602.23881 • Published 14 days ago • 18
MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization Paper • 2602.03537 • Published Feb 3 • 4
DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers Paper • 2602.02016 • Published Feb 2 • 12
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation Paper • 2601.22813 • Published Jan 30 • 57
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization Paper • 2512.00956 • Published Nov 30, 2025 • 23
Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Feb 4, 2025 • 31
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Jul 29, 2025 • 218
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization Paper • 2509.23202 • Published Sep 27, 2025 • 29
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm Paper • 2507.18553 • Published Jul 24, 2025 • 41
∇NABLA: Neighborhood Adaptive Block-Level Attention Paper • 2507.13546 • Published Jul 17, 2025 • 125
Geopolitical Biases in LLMs: What Are the "Good" and the "Bad" Countries According to Contemporary Language Models Paper • 2506.06751 • Published Jun 7, 2025 • 71
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA Paper • 2505.21115 • Published May 27, 2025 • 142
Alchemist: Turning Public Text-to-Image Data into Generative Gold Paper • 2505.19297 • Published May 25, 2025 • 84
Position of Uncertainty: A Cross-Linguistic Study of Positional Bias in Large Language Models Paper • 2505.16134 • Published May 22, 2025 • 18
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20, 2025 • 78