VoladorLuYu's Collections: LLM+Self-Play RL
Training Language Models to Self-Correct via Reinforcement Learning
Paper
• 2409.12917
• Published • 140
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Paper
• 2407.18219
• Published • 3
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper
• 2408.16293
• Published • 27
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models
Paper
• 2409.04787
• Published • 1
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Paper
• 2401.02009
• Published • 1
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Paper
• 2503.07572
• Published • 48
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Paper
• 2504.19162
• Published • 18
General-Reasoner: Advancing LLM Reasoning Across All Domains
Paper
• 2505.14652
• Published • 24
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
Paper
• 2512.19673
• Published • 66