- Article: Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) (Jan 19, 2025)
- Model: Parallia/Fairly-Multilingual-ModernBERT-Embed-BE (Sentence Similarity, 0.3B parameters, updated Jan 14, 2025)
- Article: Illustrating Reinforcement Learning from Human Feedback (RLHF) (Dec 9, 2022)