OpenResearcher Collection OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis • 8 items • Updated Mar 24 • 18
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models Paper • 2602.12036 • Published Feb 12 • 93
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published Jan 10 • 54
Running 98 Unlocking On-Policy Distillation for Any Model Family 📝 98 Visualize on-policy distillation for any model family