leondawn666's Collections: Agent & RL
Towards General-Purpose Model-Free Reinforcement Learning
Paper
• 2501.16142
• Published • 31
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
• Published • 146
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper
• 2504.13837
• Published • 141
Learning to Reason under Off-Policy Guidance
Paper
• 2504.14945
• Published • 88
ToolRL: Reward is All Tool Learning Needs
Paper
• 2504.13958
• Published • 49
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published • 122
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Paper
• 2504.16656
• Published • 58
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper
• 2504.20571
• Published • 98
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
• Published • 85
Rethinking Reflection in Pre-Training
Paper
• 2504.04022
• Published • 80
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper
• 2504.10479
• Published • 308
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
• 2504.01990
• Published • 305
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper
• 2503.24290
• Published • 62
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper
• 2505.17612
• Published • 81
ARM: Adaptive Reasoning Model
Paper
• 2505.20258
• Published • 45
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published • 191
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper
• 2505.22617
• Published • 132
Reinforcement Pre-Training
Paper
• 2506.08007
• Published • 265
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
Paper
• 2502.12115
• Published • 46
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper
• 2505.24726
• Published • 282
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper
• 2502.05171
• Published • 154
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
• 2501.17161
• Published • 125
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
Paper
• 2502.07316
• Published • 50
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Paper
• 2507.15061
• Published • 60
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published • 263
MemOS: A Memory OS for AI System
Paper
• 2507.03724
• Published • 166
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
• Published • 160
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
• Published • 68
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
Paper
• 2507.21046
• Published • 85
Group Sequence Policy Optimization
Paper
• 2507.18071
• Published • 320
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper
• 2508.01191
• Published • 240
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Paper
• 2508.05629
• Published • 190
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Paper
• 2508.04700
• Published • 52
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper
• 2508.09736
• Published • 58
SSRL: Self-Search Reinforcement Learning
Paper
• 2508.10874
• Published • 97
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper
• 2508.13167
• Published • 129
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper
• 2509.08721
• Published • 664
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published • 148
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published • 513
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published • 237
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published • 193
Agent Learning via Early Experience
Paper
• 2510.08558
• Published • 276
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper
• 2510.07242
• Published • 30
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
• Published • 31
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Paper
• 2510.04618
• Published • 130
RLP: Reinforcement as a Pretraining Objective
Paper
• 2510.01265
• Published • 45
It Takes Two: Your GRPO Is Secretly DPO
Paper
• 2510.00977
• Published • 32
DCPO: Dynamic Clipping Policy Optimization
Paper
• 2509.02333
• Published • 22
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
• 2510.13786
• Published • 33
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
Paper
• 2511.15593
• Published • 59
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
Paper
• 2511.11793
• Published • 195
P1: Mastering Physics Olympiads with Reinforcement Learning
Paper
• 2511.13612
• Published • 134
LightRAG: Simple and Fast Retrieval-Augmented Generation
Paper
• 2410.05779
• Published • 37
OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists
Paper
• 2511.16931
• Published • 8
Budget-Aware Tool-Use Enables Effective Agent Scaling
Paper
• 2511.17006
• Published • 34