Illustrating Reinforcement Learning from Human Feedback (RLHF) • natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022
Preference Tuning LLMs with Direct Preference Optimization Methods • kashif, edbeeching, lewtun, lvwerra, osanseviero • Jan 18, 2024