Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published 18 days ago • 159
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 19 days ago • 195
ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention Paper • 2605.23081 • Published 10 days ago • 40
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published 4 days ago • 78
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published 11 days ago • 104
Rethinking Cross-Layer Information Routing in Diffusion Transformers Paper • 2605.20708 • Published 11 days ago • 106
DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning Paper • 2605.25604 • Published 6 days ago • 131
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published 9 days ago • 204
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 11 days ago • 204
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published 28 days ago • 116
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression Paper • 2604.19572 • Published Apr 21 • 23
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges Paper • 2604.13602 • Published Apr 15 • 32
Elucidating the SNR-t Bias of Diffusion Probabilistic Models Paper • 2604.16044 • Published Apr 17 • 73
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 365
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published Apr 11 • 82
KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance Paper • 2604.12627 • Published Apr 14 • 101