BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding • Paper • 2606.31315 • Published 1 day ago • 67
TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents • Paper • 2606.28480 • Published 6 days ago • 44
OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks • Paper • 2606.29537 • Published 4 days ago • 17
The Verification Horizon: No Silver Bullet for Coding Agent Rewards • Paper • 2606.26300 • Published 8 days ago • 46
Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence • Paper • 2606.15932 • Published 16 days ago • 38
Autodata: An agentic data scientist to create high quality synthetic data • Paper • 2606.25996 • Published 8 days ago • 18
Qwen-AgentWorld: Language World Models for General Agents • Paper • 2606.24597 • Published 9 days ago • 144
EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions • Paper • 2606.23654 • Published 10 days ago • 79
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems • Paper • 2606.22388 • Published 11 days ago • 96
CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents • Paper • 2606.22883 • Published 10 days ago • 37
AweAgent Meta-Data • Collection • Meta-data for AweAgent: https://github.com/AweAI-Team/AweAgent • 5 items • Updated 12 days ago
VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models • Paper • 2606.16140 • Published 17 days ago • 121