view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) +2 Dec 9, 2022 • 385
Light-R1 Collection Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • 7 items • Updated Oct 15 • 12