From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning Paper • 2606.17682 • Published 10 days ago • 26
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement Paper • 2606.11926 • Published 16 days ago • 118
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks Paper • 2606.12344 • Published 16 days ago • 68
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments Paper • 2606.13681 • Published 15 days ago • 140
Article The Open Source Community is backing OpenEnv for Agentic RL • burtenshaw, spisakjo, lysandre, darktex, willcb, qjoy, pawalt, cwing-nv, danielhanchen, andrewzhou, thegovind, shimmyshimmer, Hamid-Nazeri, Sanyam, zkwentz, emre0, lewtun, sergiopaniego, banghua • 18 days ago • 91
SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks Paper • 2605.31433 • Published 28 days ago • 28
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 14 days ago • 167
Article Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler • ariG23498, sayakpaul, sergiopaniego, ror, pcuenq • 28 days ago • 127
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents Paper • 2605.25624 • Published May 25 • 34
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 113
Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published Apr 1 • 56
Article TRL v1.0: Post-Training Library Built to Move with the Field • qgallouedec, stevhliu, pcuenq, sergiopaniego • Mar 31 • 56