Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 325
Rethink_SFT_generalization Collection Repo for the paper "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability" • 40 items • Updated Apr 11 • 20
The ultimate guide to RL environments: building and scaling them in the LLM era • Building and scaling RL environments for LLM training • Running • 165
unsloth-grpo-tests Collection Test runs for Unsloth GRPO training (math use case) • 6 items • Updated Apr 13