CelesteChen's Collections: reasoning
Large Language Models Can Self-Improve in Long-context Reasoning
Paper
• 2411.08147
• Published • 65
Reverse Thinking Makes LLMs Stronger Reasoners
Paper
• 2411.19865
• Published • 23
Training Large Language Models to Reason in a Continuous Latent Space
Paper
• 2412.06769
• Published • 94
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper
• 2412.18925
• Published • 107
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
Paper
• 2501.06590
• Published • 11
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Paper
• 2501.12570
• Published • 28
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
Paper
• 2501.13007
• Published • 19
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Paper
• 2501.11425
• Published • 109
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper
• 2501.10799
• Published • 15
Process Reinforcement through Implicit Rewards
Paper
• 2502.01456
• Published • 62
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
Paper
• 2502.10454
• Published • 7
Large Language Models and Mathematical Reasoning Failures
Paper
• 2502.11574
• Published • 3
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
Paper
• 2502.12054
• Published • 7
LightThinker: Thinking Step-by-Step Compression
Paper
• 2502.15589
• Published • 31
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Paper
• 2504.01943
• Published • 17
MolmoAct: Action Reasoning Models that can Reason in Space
Paper
• 2508.07917
• Published • 44
StepWiser: Stepwise Generative Judges for Wiser Reasoning
Paper
• 2508.19229
• Published • 20
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Paper
• 2508.14029
• Published • 119
Does Your Reasoning Model Implicitly Know When to Stop Thinking?
Paper
• 2602.08354
• Published • 262