
Lei Wang

demolei


https://demoleiwang.github.io/HomePage/

AI & ML interests

LLMs

Recent Activity

upvoted a paper 1 day ago

Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

upvoted a paper 1 day ago

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

upvoted a paper 1 day ago

The Verification Horizon: No Silver Bullet for Coding Agent Rewards



upvoted 5 papers 1 day ago

Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

Paper • 2606.26027 • Published 4 days ago • 15

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

Paper • 2606.24551 • Published 6 days ago • 25

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

Paper • 2606.26300 • Published 4 days ago • 37

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Paper • 2606.26790 • Published 3 days ago • 40

DanceOPD: On-Policy Generative Field Distillation

Paper • 2606.27377 • Published 3 days ago • 63

upvoted a paper 2 days ago

Agents' Last Exam

Paper • 2606.05405 • Published 25 days ago • 366

upvoted 5 papers 3 days ago

PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

Paper • 2606.22388 • Published 7 days ago • 95

OpenThoughts-Agent: Data Recipes for Agentic Models

Paper • 2606.24855 • Published 5 days ago • 43

Qwen-AgentWorld: Language World Models for General Agents

Paper • 2606.24597 • Published 5 days ago • 132

MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization

Paper • 2606.19930 • Published 10 days ago • 42

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

Paper • 2606.19926 • Published 10 days ago • 42

upvoted a paper 12 days ago

Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

Paper • 2606.06036 • Published 24 days ago • 75

upvoted 4 papers 15 days ago

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Paper • 2606.13120 • Published 17 days ago • 4

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

Paper • 2606.09426 • Published 20 days ago • 104

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Paper • 2606.12087 • Published 18 days ago • 77

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Paper • 2606.13681 • Published 17 days ago • 142

upvoted a paper 18 days ago

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

Paper • 2606.07591 • Published about 1 month ago • 97

upvoted a paper 22 days ago

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Paper • 2606.02060 • Published 27 days ago • 57

upvoted 2 papers 24 days ago

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Paper • 2605.29288 • Published about 1 month ago • 9

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Paper • 2605.25624 • Published May 25 • 34