wangbing1416's Collections
Reasoning Papers
updated
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
Paper
•
2508.07629
•
Published
•
43
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
Paper
•
2508.07101
•
Published
•
14
Compressing Chain-of-Thought in LLMs via Step Entropy
Paper
•
2508.03346
•
Published
•
8
Train Long, Think Short: Curriculum Learning for Efficient Reasoning
Paper
•
2508.08940
•
Published
•
27
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
Paper
•
2508.09726
•
Published
•
15
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
Paper
•
2508.10751
•
Published
•
28
Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information
Paper
•
2508.11252
•
Published
•
3
Deep Think with Confidence
Paper
•
2508.15260
•
Published
•
90
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Paper
•
2508.14029
•
Published
•
118
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
Paper
•
2508.15868
•
Published
•
3
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Paper
•
2508.16949
•
Published
•
24
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
Paper
•
2508.17445
•
Published
•
80
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
Paper
•
2508.18773
•
Published
•
16
StepWiser: Stepwise Generative Judges for Wiser Reasoning
Paper
•
2508.19229
•
Published
•
20
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
Paper
•
2509.01363
•
Published
•
59
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Paper
•
2509.02522
•
Published
•
26
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper
•
2509.03059
•
Published
•
25
Reverse-Engineered Reasoning for Open-Ended Generation
Paper
•
2509.06160
•
Published
•
149
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
•
2509.07980
•
Published
•
102
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding
Paper
•
2509.06923
•
Published
•
22
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Paper
•
2509.03646
•
Published
•
33
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
•
2509.08827
•
Published
•
190
The Majority is not always right: RL training for solution aggregation
Paper
•
2509.06870
•
Published
•
15
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
Paper
•
2509.07430
•
Published
•
3
Reasoning-Aware GRPO using Process Mining
Paper
•
2510.25065
•
Published
•
42
Scaling Latent Reasoning via Looped Language Models
Paper
•
2510.25741
•
Published
•
223
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Paper
•
2510.22543
•
Published
•
14
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper
•
2510.25992
•
Published
•
48
SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens
Paper
•
2510.24940
•
Published
•
18
MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models
Paper
•
2510.24794
•
Published
•
32
Data-Efficient RLVR via Off-Policy Influence Guidance
Paper
•
2510.26491
•
Published
•
11
Black-Box On-Policy Distillation of Large Language Models
Paper
•
2511.10643
•
Published
•
51
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Paper
•
2511.08577
•
Published
•
107
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper
•
2511.22570
•
Published
•
90
REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance
Paper
•
2511.20233
•
Published
•
3
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
Paper
•
2512.05033
•
Published
•
16
LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning
Paper
•
2512.05325
•
Published
•
3
Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision
Paper
•
2512.15489
•
Published
•
9
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper
•
2512.23988
•
Published
•
16
RelayLLM: Efficient Reasoning via Collaborative Decoding
Paper
•
2601.05167
•
Published
•
29
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs
Paper
•
2601.03559
•
Published
•
13
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Paper
•
2601.06002
•
Published
•
51
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
Paper
•
2512.20908
•
Published
•
25
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
Paper
•
2601.09088
•
Published
•
59
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
Paper
•
2601.14249
•
Published
•
9
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
•
2601.18778
•
Published
•
32