alignment_24_best
KTO: Model Alignment as Prospect Theoretic Optimization
Paper
• 2402.01306
• Published • 22
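(For quick orientation, a sketch of the KTO objective as I recall it from the paper; the notation below is mine, not from this listing: pi_theta is the policy, pi_ref the frozen reference, beta a scale, lambda_D / lambda_U weights for desirable and undesirable outputs, and z_0 a batch-level KL reference point.)

\[
r_\theta(x,y) = \log\frac{\pi_\theta(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)},\qquad
v(x,y) =
\begin{cases}
\lambda_D\,\sigma\!\big(\beta\,(r_\theta(x,y) - z_0)\big) & y \text{ desirable}\\[2pt]
\lambda_U\,\sigma\!\big(\beta\,(z_0 - r_\theta(x,y))\big) & y \text{ undesirable}
\end{cases}
\]
\[
\mathcal{L}_{\mathrm{KTO}} = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\lambda_y - v(x,y)\big]
\]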
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper
• 2305.18290
• Published • 64
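(The DPO loss is short enough to state inline; this is the standard form as I understand it, with my own notation: pi_theta is the trained policy, pi_ref a frozen reference, y_w / y_l the preferred and dispreferred responses, beta a temperature, sigma the logistic function.)

\[
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
\]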
SimPO: Simple Preference Optimization with a Reference-Free Reward
Paper
• 2405.14734
• Published • 12
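(SimPO drops the reference model; as I recall, it replaces the implicit reward with the length-normalized log-likelihood and adds a target margin gamma. Notation below is mine, not from this listing.)

\[
\mathcal{L}_{\mathrm{SimPO}}(\theta) = -\,\mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\frac{\beta}{|y_w|}\log\pi_\theta(y_w\mid x) - \frac{\beta}{|y_l|}\log\pi_\theta(y_l\mid x) - \gamma\right)\right]
\]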
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Paper
• 2408.06266
• Published • 10
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Paper
• 2402.14740
• Published • 18
Binary Classifier Optimization for Large Language Model Alignment
Paper
• 2404.04656
• Published • 2
Noise Contrastive Alignment of Language Models with Explicit Rewards
Paper
• 2402.05369
• Published • 2
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Paper
• 2401.08417
• Published • 37
Direct Language Model Alignment from Online AI Feedback
Paper
• 2402.04792
• Published • 35
Nash Learning from Human Feedback
Paper
• 2312.00886
• Published • 18
ORPO: Monolithic Preference Optimization without Reference Model
Paper
• 2403.07691
• Published • 72
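(ORPO is likewise reference-free; to the best of my recollection it adds an odds-ratio penalty to the ordinary SFT loss on the chosen response. Notation is mine; lambda weights the penalty.)

\[
\mathrm{odds}_\theta(y\mid x) = \frac{P_\theta(y\mid x)}{1 - P_\theta(y\mid x)},\qquad
\mathcal{L}_{\mathrm{OR}} = -\log\sigma\!\left(\log\frac{\mathrm{odds}_\theta(y_w\mid x)}{\mathrm{odds}_\theta(y_l\mid x)}\right)
\]
\[
\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}\big[\mathcal{L}_{\mathrm{SFT}}(y_w\mid x) + \lambda\,\mathcal{L}_{\mathrm{OR}}\big]
\]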
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Paper
• 2405.21046
• Published • 4
From r to Q^*: Your Language Model is Secretly a Q-Function
Paper
• 2404.12358
• Published • 2
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Paper
• 2405.19107
• Published • 15
Towards Scalable Automated Alignment of LLMs: A Survey
Paper
• 2406.01252
• Published • 3
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper
• 2409.02795
• Published • 72
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Paper
• 2404.14367
• Published • 1
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
Paper
• 2404.14313
• Published
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Paper
• 2406.09279
• Published • 3
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Paper
• 2404.10719
• Published • 6
Advancing LLM Reasoning Generalists with Preference Trees
Paper
• 2404.02078
• Published • 46
Building Math Agents with Multi-Turn Iterative Preference Learning
Paper
• 2409.02392
• Published • 16
Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning
Paper
• 2406.17312
• Published
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper
• 2406.00888
• Published • 33
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper
• 2310.13639
• Published • 25
Towards Efficient and Exact Optimization of Language Model Alignment
Paper
• 2402.00856
• Published • 2
HelpSteer2-Preference: Complementing Ratings with Preferences
Paper
• 2410.01257
• Published • 24
General Preference Modeling with Preference Representations for Aligning Language Models
Paper
• 2410.02197
• Published • 9
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
Paper
• 2409.17545
• Published • 20
RLHF Workflow: From Reward Modeling to Online RLHF
Paper
• 2405.07863
• Published • 71
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Paper
• 2410.04612
• Published
Understanding Likelihood Over-optimisation in Direct Alignment Algorithms
Paper
• 2410.11677
• Published • 1