On the Limits of LLM-as-Judge for Scientific Novelty Assessment Paper • 2606.12071 • Published 16 days ago • 3
GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards Paper • 2606.04889 • Published 23 days ago • 4
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 25 days ago • 232
MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published May 13 • 223
$δ$-mem: Efficient Online Memory for Large Language Models Paper • 2605.12357 • Published May 12 • 131
Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned Paper • 2509.23250 • Published Sep 27, 2025 • 6
LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions Paper • 2508.18321 • Published Aug 24, 2025 • 2