Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains? Paper • 2510.11184 • Published Oct 13, 2025
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25, 2025 • 48
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving Paper • 2506.17104 • Published Jun 20, 2025 • 1
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization Paper • 2504.21659 • Published Apr 30, 2025 • 14
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization Paper • 2504.21659 • Published Apr 30, 2025 • 14
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published Feb 24, 2025 • 26