Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions Paper • 2606.09076 • Published 4 days ago • 47
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published Nov 9, 2025 • 53
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published Oct 3, 2025 • 76
view article Article ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models yuchenlin • Jul 27, 2024 • 35