Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO Paper • 2605.04077 • Published 24 days ago • 1
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI Paper • 2605.06651 • Published 1 day ago • 1
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published 1 day ago • 1
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key Paper • 2605.06638 • Published 1 day ago • 2
SkillOS: Learning Skill Curation for Self-Evolving Agents Paper • 2605.06614 • Published 1 day ago • 2
A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping Paper • 2605.06200 • Published 1 day ago • 5
MiA-Signature: Approximating Global Activation for Long-Context Understanding Paper • 2605.06416 • Published 1 day ago • 32
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes Paper • 2605.05724 • Published 1 day ago • 8
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction Paper • 2604.27393 • Published 8 days ago • 33
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning Paper • 2605.02913 • Published about 1 month ago • 5
Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO Paper • 2604.27488 • Published 8 days ago • 4
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published 4 days ago • 92
MedSkillAudit: A Domain-Specific Audit Framework for Medical Research Agent Skills Paper • 2604.20441 • Published 16 days ago • 2
Lightning Unified Video Editing via In-Context Sparse Attention Paper • 2605.04569 • Published 2 days ago • 11
Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation Paper • 2605.04128 • Published 3 days ago • 8
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems Paper • 2605.04018 • Published 3 days ago • 27
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published 2 days ago • 84
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models Paper • 2605.05204 • Published 2 days ago • 21