On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models Paper • 2602.03392 • Published 4 days ago • 8
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3, 2025 • 23