Article: Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment (Feb 11, 2025)
Collection: Top 10% instruction tuning datasets (13 items, updated Jul 3, 2024). Collects datasets that have 'instruction' in their name, more than one download, and rank in the top 10% by number of likes.