reasoning_model
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
• 2511.16334
• Published • 94
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
• Published • 105
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Paper
• 2509.04475
• Published • 3
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper
• 2512.01374
• Published • 106
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper
• 2511.22570
• Published • 92
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
Paper
• 2512.07843
• Published • 22
Paper
• 2510.01141
• Published • 123
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper
• 2504.11468
• Published • 30
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper
• 2501.09686
• Published • 41
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Paper
• 2410.09671
• Published • 1
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper
• 2512.16676
• Published • 221
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience
Paper
• 2512.17260
• Published • 52
Latent Implicit Visual Reasoning
Paper
• 2512.21218
• Published • 69
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Paper
• 2512.20605
• Published • 62
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper
• 2512.19995
• Published • 16
P1: Mastering Physics Olympiads with Reinforcement Learning
Paper
• 2511.13612
• Published • 134
The Path Not Taken: RLVR Provably Learns Off the Principals
Paper
• 2511.08567
• Published • 35
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper
• 2511.06221
• Published • 133
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
Paper
• 2511.12982
• Published • 4
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?
Paper
• 2509.07894
• Published • 31
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper
• 2512.24617
• Published • 65
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
Paper
• 2601.02346
• Published • 26
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
Paper
• 2601.07226
• Published • 33
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
Paper
• 2601.09088
• Published • 63
SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning
Paper
• 2601.04809
• Published • 3
Paper
• 2412.16720
• Published • 37
Learning Adaptive Parallel Reasoning with Language Models
Paper
• 2504.15466
• Published • 44
Training Large Language Models to Reason in a Continuous Latent Space
Paper
• 2412.06769
• Published • 94
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Paper
• 2601.20614
• Published • 120
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs
Paper
• 2602.10388
• Published • 243