Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 188
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs Paper • 2410.18451 • Published Oct 24, 2024 • 20