Illustrating Reinforcement Learning from Human Feedback (RLHF) • natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022
StackLLaMA: A hands-on guide to train LLaMA with RLHF • edbeeching, kashif, ybelkada, lewtun, lvwerra, nazneen, natolambert • Apr 5, 2023