Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data
Paper • 2601.15158 • Published 21 days ago