view article Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 152
view article Article Nemotron-Personas-Japan: Synthesized Data for Sovereign AI nvidia • Sep 23, 2025 • 27
view article Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL +4 toslali-ibm, mirinflim, qgallouedec, esnible, rganti, mudhakar • Jun 3, 2025 • 101
Reward Bench 2 Collection Datasets, spaces, and models for Reward Bench 2 benchmark and paper! • 11 items • Updated Dec 23, 2025 • 16
Tulu 3 Datasets Collection All datasets released with Tulu 3 -- state of the art open post-training recipes. • 32 items • Updated Mar 2 • 97
view article Article The N Implementation Details of RLHF with PPO +1 vwxyzjn, tianlinliu0121, lvwerra • Oct 24, 2023 • 72