MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning • Paper • 2605.13037 • Published 2 days ago • 4
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization • Paper • 2605.10780 • Published 3 days ago • 31
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards • Paper • 2605.10899 • Published 4 days ago • 70
δ-mem: Efficient Online Memory for Large Language Models • Paper • 2605.12357 • Published 3 days ago • 99
WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation • Paper • 2603.16871 • Published Mar 17 • 61
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping • Paper • 2604.08364 • Published Apr 9 • 101
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents • Paper • 2604.07429 • Published Apr 8 • 121
A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping • Paper • 2605.06200 • Published 8 days ago • 14
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex • Paper • 2605.06139 • Published 8 days ago • 64
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning • Paper • 2605.06130 • Published 8 days ago • 104
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction • Paper • 2605.05242 • Published 12 days ago • 102
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL • Paper • 2604.28123 • Published 14 days ago • 47
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models • Paper • 2605.05204 • Published 9 days ago • 25
Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks • Paper • 2604.20987 • Published 23 days ago • 21
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges • Paper • 2604.13602 • Published 30 days ago • 32