The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL Paper • 2606.19162 • Published 10 days ago • 20
Bigger, Better, Faster: Human-level Atari with human-level efficiency Paper • 2305.19452 • Published May 30, 2023 • 5
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 107
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training Paper • 2503.18929 • Published Mar 24, 2025 • 4