reasoning_model - a thangtm Collection

thangtm 's Collections

flow_matching_model

reasoning_model

Reduce_thinking

reasoning_model

updated May 14

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 96
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9, 2025 • 105
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute

Paper • 2509.04475 • Published Aug 30, 2025 • 3
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 107
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Paper • 2511.22570 • Published Nov 27, 2025 • 96
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models

Paper • 2512.07843 • Published Nov 24, 2025 • 22
Apriel-1.5-15b-Thinker

Paper • 2510.01141 • Published Oct 1, 2025 • 125
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10, 2025 • 30
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published Jan 23, 2025 • 41
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Paper • 2410.09671 • Published Oct 12, 2024 • 1
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 225
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Paper • 2512.17260 • Published Dec 19, 2025 • 53
Latent Implicit Visual Reasoning

Paper • 2512.21218 • Published Dec 24, 2025 • 70
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published Dec 23, 2025 • 62
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Paper • 2512.19995 • Published Dec 23, 2025 • 16
P1: Mastering Physics Olympiads with Reinforcement Learning

Paper • 2511.13612 • Published Nov 17, 2025 • 135
The Path Not Taken: RLVR Provably Learns Off the Principals

Paper • 2511.08567 • Published Nov 11, 2025 • 37
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 141
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization

Paper • 2511.12982 • Published Nov 17, 2025 • 4
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?

Paper • 2509.07894 • Published Sep 9, 2025 • 32
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Paper • 2512.24617 • Published Dec 31, 2025 • 67
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling

Paper • 2601.02346 • Published Jan 5 • 28
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors

Paper • 2601.07226 • Published Jan 12 • 33
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

Paper • 2601.09088 • Published Jan 14 • 64
SCALER:Synthetic Scalable Adaptive Learning Environment for Reasoning

Paper • 2601.04809 • Published Jan 8 • 3
OpenAI o1 System Card

Paper • 2412.16720 • Published Dec 21, 2024 • 38
Learning Adaptive Parallel Reasoning with Language Models

Paper • 2504.15466 • Published Apr 21, 2025 • 44
Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 94
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

Paper • 2601.20614 • Published Jan 28 • 119
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs

Paper • 2602.10388 • Published Feb 11 • 246
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test

Paper • 2506.21551 • Published Jun 26, 2025 • 28