Self-rewarding correction for mathematical reasoning Paper • 2502.19613 • Published Feb 26, 2025 • 82
Building Math Agents with Multi-Turn Iterative Preference Learning Paper • 2409.02392 • Published Sep 4, 2024 • 16