SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 107
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning Paper • 2508.21104 • Published Aug 28, 2025 • 37
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 181
view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) +2 Dec 9, 2022 • 393
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 133
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30, 2025 • 55
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models Paper • 2509.19803 • Published Sep 24, 2025 • 120
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11, 2025 • 50
FastCuRL Collection The collection for the Paper "Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models" • 6 items • Updated May 29, 2025 • 3
Tool-Star Collection Tool-Star is a reinforcement learning-based framework designed to empower LLMs to autonomously invoke multiple external tools during stepwise reasonin • 8 items • Updated Sep 2, 2025 • 5
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 59
DeepCritic: Deliberate Critique with Large Language Models Paper • 2505.00662 • Published May 1, 2025 • 54
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Paper • 2504.11343 • Published Apr 15, 2025 • 19