Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 106
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 500
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 30
Jianhong Wang
hsvgbkhgbv
AI & ML interests
multi-agent reinforcement learning, ad hoc teamwork, robust reinforcement learning
Organizations
None yet