Reasoning
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper
• 2501.18585
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
Paper
• 2412.18279
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper
• 2501.10799
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper
• 2501.19324
Dynamic Scaling of Unit Tests for Code Reward Modeling
Paper
• 2501.01054
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper
• 2411.04282
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
Paper
• 2410.09008
Subtle Errors Matter: Preference Learning via Error-injected Self-editing
Paper
• 2410.06638
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models
Paper
• 2502.04404
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Paper
• 2502.06781
SIFT: Grounding LLM Reasoning in Contexts via Stickers
Paper
• 2502.14922
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Paper
• 2404.13026
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Paper
• 2503.07365
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper
• 2503.09516
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Paper
• 2503.12937
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper
• 2503.18878
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Paper
• 2504.04718
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper
• 2504.08837
Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents
Paper
• 2505.02156
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
Paper
• 2504.20752
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Paper
• 2505.07608
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
Think Only When You Need with Large Hybrid-Reasoning Models
Paper
• 2505.14631
Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation
Paper
• 2505.13215
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision
Paper
• 2505.19706
rStar2-Agent: Agentic Reasoning Technical Report
Paper
• 2508.20722
Reverse-Engineered Reasoning for Open-Ended Generation
Paper
• 2509.06160