yenson-lau's Collections Papers
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper
• 2506.06395
• Published • 135
Paper
• 2506.10910
• Published • 68
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Paper
• 2506.07240
• Published • 7
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Paper
• 2506.09991
• Published • 55
Reinforcement Pre-Training
Paper
• 2506.08007
• Published • 265
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Paper
• 2506.06941
• Published • 16
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper
• 2505.14146
• Published • 20
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Paper
• 2506.11763
• Published • 74
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Paper
• 2506.14245
• Published • 45
Reasoning with Exploration: An Entropy Perspective
Paper
• 2506.14758
• Published • 30
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
Paper
• 2506.24119
• Published • 51
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper
• 2507.00432
• Published • 79
Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models
Paper
• 2507.14241
• Published • 18
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Paper
• 2507.15061
• Published • 60
Paper
• 2505.09388
• Published • 339
Replacing thinking with tool usage enables reasoning in small language models
Paper
• 2507.05065
• Published • 16
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Paper
• 2507.13158
• Published • 24
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
• Published • 68
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
Paper
• 2508.10751
• Published • 29
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
Paper
• 2508.14704
• Published • 43
Deep Think with Confidence
Paper
• 2508.15260
• Published • 90
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper
• 2508.16153
• Published • 162
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
Paper
• 2508.16279
• Published • 61
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
Paper
• 2508.16072
• Published • 4
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Paper
• 2508.18076
• Published • 6
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Paper
• 2410.20774
• Published
Provable Benefits of In-Tool Learning for Large Language Models
Paper
• 2508.20755
• Published • 11
OpenClaw-RL: Train Any Agent Simply by Talking
Paper
• 2603.10165
• Published • 150