DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 6 days ago • 201
Swift Sampling: Selecting Temporal Surprises via Taylor Series Paper • 2605.22678 • Published 5 days ago • 9
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 14 days ago • 191
Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models Paper • 2604.16902 • Published Apr 18 • 6
madhusudhan001/qwen2.5-0.5b-materials-science Text Generation • 0.5B • Updated 25 days ago • 309 • • 1
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 291
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 630