
Lei Wang

demolei


https://demoleiwang.github.io/HomePage/

AI & ML interests

LLMs

Recent Activity



upvoted a paper about 2 hours ago

Agents' Last Exam

Paper • 2606.05405 • Published 22 days ago • 363

upvoted a paper about 24 hours ago

PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

Paper • 2606.22388 • Published 4 days ago • 87

upvoted 4 papers 1 day ago

OpenThoughts-Agent: Data Recipes for Agentic Models

Paper • 2606.24855 • Published 2 days ago • 31

Qwen-AgentWorld: Language World Models for General Agents

Paper • 2606.24597 • Published 2 days ago • 98

MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization

Paper • 2606.19930 • Published 7 days ago • 35

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

Paper • 2606.19926 • Published 7 days ago • 34

upvoted a paper 10 days ago

Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

Paper • 2606.06036 • Published 21 days ago • 73

upvoted 4 papers 13 days ago

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Paper • 2606.13120 • Published 14 days ago • 4

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

Paper • 2606.09426 • Published 17 days ago • 102

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Paper • 2606.12087 • Published 15 days ago • 75

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Paper • 2606.13681 • Published 14 days ago • 140

upvoted a paper 16 days ago

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

Paper • 2606.07591 • Published 28 days ago • 95

upvoted a paper 20 days ago

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Paper • 2606.02060 • Published 24 days ago • 55

upvoted 2 papers 22 days ago

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Paper • 2605.29288 • Published 28 days ago • 9

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Paper • 2605.25624 • Published May 25 • 34

upvoted 5 papers about 1 month ago

OpenComputer: Verifiable Software Worlds for Computer-Use Agents

Paper • 2605.19769 • Published May 19 • 85

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published May 12 • 196

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

Paper • 2605.20025 • Published May 19 • 190

AI for Auto-Research: Roadmap & User Guide

Paper • 2605.18661 • Published May 18 • 69

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

Paper • 2605.10912 • Published May 11 • 46