Qwen-AgentWorld: Language World Models for General Agents Paper • 2606.24597 • Published 4 days ago • 124
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? Paper • 2606.17861 • Published 11 days ago • 56
Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation Paper • 2606.17030 • Published 12 days ago • 30
WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 19 days ago • 102
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement Paper • 2606.11926 • Published 17 days ago • 118
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 22 days ago • 119
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published May 19 • 108
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published May 14 • 147
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration Paper • 2605.20025 • Published May 19 • 190
MMSkills: Towards Multimodal Skills for General Visual Agents Paper • 2605.13527 • Published May 14 • 121
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue Paper • 2605.01371 • Published May 2 • 6
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL Paper • 2604.28123 • Published May 1 • 49
SketchVLM: Vision language models can annotate images to explain thoughts and guide users Paper • 2604.22875 • Published Apr 23 • 38
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning Paper • 2604.24300 • Published Apr 27 • 68
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 119
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24 • 231
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 244
WorldMark: A Unified Benchmark Suite for Interactive Video World Models Paper • 2604.21686 • Published Apr 23 • 36