Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle • Paper • 2508.05612 • Published Aug 7
Shuffle-R1 • Collection • Shuffle-R1 checkpoints and training/evaluation datasets • 7 items • Updated Aug 28
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models • Paper • 2505.22617 • Published May 28