BiPACE: Bisimulation-Guided Policy Optimization with Action Counterfactual Estimation for LLM Agents Paper • 2606.25556 • Published 6 days ago • 1
The Detection-Extraction Gap: Models Know the Answer Before They Can Say It Paper • 2604.06613 • Published Apr 8 • 2
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning Paper • 2602.08234 • Published Feb 9 • 76