OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents Paper • 2606.02031 • Published 2 days ago • 10
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 21 days ago • 270
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 166
ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics Paper • 2604.16254 • Published Apr 17 • 3
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 365
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 630
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 352
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published Feb 11 • 221