RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards • Paper • 2605.10899 • Published May 11 • 79
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters • Paper • 2606.02437 • Published 24 days ago • 232
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models • Paper • 2605.21573 • Published May 20 • 111
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation • Paper • 2605.27366 • Published about 1 month ago • 29
Rethinking Memory as Continuously Evolving Connectivity • Paper • 2605.28773 • Published 29 days ago • 34
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning • Paper • 2605.28774 • Published 29 days ago • 93
SkillOpt: Executive Strategy for Self-Evolving Agent Skills • Paper • 2605.23904 • Published May 22 • 246
Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories • Paper • 2606.03979 • Published 23 days ago • 29
SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories • Paper • 2606.01311 • Published 25 days ago • 37
Trust-Region Behavior Blending for On-Policy Distillation • Paper • 2605.31159 • Published 27 days ago • 66
TAPS: Task Aware Proposal Distributions for Speculative Sampling • Paper • 2603.27027 • Published Mar 27 • 145