daVinci-Dev: Agent-native Mid-training for Software Engineering • Paper • 2601.18418 • Published 6 days ago • 123
AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts • Paper • 2601.11044 • Published 17 days ago • 34
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation • Paper • 2512.23576 • Published Dec 29, 2025 • 65
SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling • Paper • 2512.00466 • Published Nov 29, 2025 • 10
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization • Paper • 2511.15705 • Published Nov 19, 2025 • 97
Do You Need Proprioceptive States in Visuomotor Policies? • Paper • 2509.18644 • Published Sep 23, 2025 • 50
DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery • Paper • 2508.06960 • Published Aug 9, 2025 • 1