Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention • Oct 7, 2024 • 69
Article: Illustrating Reinforcement Learning from Human Feedback (RLHF) • Dec 9, 2022 • 407
Collection: Light-R1 • Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • 7 items • Updated Oct 15, 2025 • 12