Safe and Scalable Web Agent Learning via Recreated Websites • Paper • 2603.10505 • Published 6 days ago • 14
A Guide to Reinforcement Learning Post-Training for LLMs: PPO, DPO, GRPO, and Beyond • Article • Jan 19 • 10
Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment • Paper • 2311.04072 • Published Nov 7, 2023 • 1