Article • Preference Tuning LLMs with Direct Preference Optimization Methods • Published Jan 18, 2024 • 77
Article • Transformers v5: Simple model definitions powering the AI ecosystem • Published Dec 1, 2025 • 288
Article • Nemotron 3 Nano: A New Standard for Efficient, Open, and Intelligent Agentic Models • Published Dec 15, 2025 • 106
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 24 days ago • 212
WPO: Enhancing RLHF with Weighted Preference Optimization Paper • 2406.11827 • Published Jun 17, 2024 • 17
Article • Tokenization in Transformers v5: Simpler, Clearer, and More Modular • Published Dec 18, 2025 • 119
Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published May 1, 2024 • 28
Noise Contrastive Alignment of Language Models with Explicit Rewards Paper • 2402.05369 • Published Feb 8, 2024 • 2
Towards Efficient and Exact Optimization of Language Model Alignment Paper • 2402.00856 • Published Feb 1, 2024 • 1
A General Theoretical Paradigm to Understand Learning from Human Preferences Paper • 2310.12036 • Published Oct 18, 2023 • 19