In-Context Reinforcement Learning for Tool Use in Large Language Models • Paper • 2603.08068 • Published Mar 9 • 43
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience • Paper • 2512.17260 • Published Dec 19, 2025 • 52
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers • Paper • 2509.06493 • Published Sep 8, 2025 • 12