From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR Paper • 2508.07534 • Published Aug 11, 2025 • 1
Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models Paper • 2406.12397 • Published Jun 18, 2024
Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework Paper • 2509.05007 • Published Sep 5, 2025
Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning Paper • 2508.02260 • Published Aug 4, 2025
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction Paper • 2511.07327 • Published Nov 10, 2025 • 78
AlphaMath Almost Zero: Process Supervision without Process Paper • 2405.03553 • Published May 6, 2024 • 1
Step-level Value Preference Optimization for Mathematical Reasoning Paper • 2406.10858 • Published Jun 16, 2024
C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation Paper • 2502.06205 • Published Feb 10, 2025
Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning Paper • 2502.11799 • Published Feb 17, 2025
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents Paper • 2509.13309 • Published Sep 16, 2025 • 67
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization Paper • 2510.24592 • Published Oct 28, 2025 • 17
AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis Paper • 2510.24695 • Published Oct 28, 2025 • 24
The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models Paper • 2401.03205 • Published Jan 6, 2024
Towards Effective and Efficient Continual Pre-training of Large Language Models Paper • 2407.18743 • Published Jul 26, 2024
Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search Paper • 2411.11694 • Published Nov 18, 2024
Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems Paper • 2412.09413 • Published Dec 12, 2024 • 1