reinforcement learning
In deep reinforcement learning, a pruned network is a good network
Paper
• 2402.12479
• Published
• 19
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Paper
• 2403.03950
• Published
• 15
RLHF Workflow: From Reward Modeling to Online RLHF
Paper
• 2405.07863
• Published
• 71
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper
• 2405.11143
• Published
• 41
Understanding and Diagnosing Deep Reinforcement Learning
Paper
• 2406.16979
• Published
• 10
Efficient World Models with Context-Aware Tokenization
Paper
• 2406.19320
• Published
• 8
It Takes Two: Your GRPO Is Secretly DPO
Paper
• 2510.00977
• Published
• 32
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper
• 2510.25992
• Published
• 48