Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Feb 11, 2025 • 100
Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation Paper • 2407.12223 • Published Jul 17, 2024 • 2