You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories Paper • 2605.21468 • Published May 20 • 51
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories Paper • 2605.21468 • Published May 20 • 51
G-Zero: Self-Play for Open-Ended Generation from Zero Data Paper • 2605.09959 • Published May 11 • 17