Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs Paper • 2509.25779 • Published Sep 30, 2025 • 19
Tulu 3 Datasets Collection All datasets released with Tulu 3 -- state of the art open post-training recipes. • 33 items • Updated Dec 23, 2025 • 96