- Scaling Agent Learning via Experience Synthesis
  Paper • 2511.03773 • Published • 83
- ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
  Paper • 2511.21689 • Published • 128
- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
  Paper • 2601.05242 • Published • 232
- Reinforcement Learning for Self-Improving Agent with Skill Library
  Paper • 2512.17102 • Published • 42
Yizhi
MercedeSnape
AI & ML interests
None yet
Recent Activity
upvoted a paper about 18 hours ago
HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs
updated a collection about 2 months ago
agentic RL
updated a collection about 2 months ago
Benchmark
Organizations
None yet