GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents Paper • 2606.24551 • Published 4 days ago • 5
The Verification Horizon: No Silver Bullet for Coding Agent Rewards Paper • 2606.26300 • Published 2 days ago • 9
Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation Paper • 2606.26907 • Published 1 day ago • 22
Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching Paper • 2606.24457 • Published 3 days ago • 3
Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning Paper • 2606.18831 • Published 9 days ago • 6
Notes2Skills: From Lab Notebooks to Certainty-Aware Scientific Agent Skills Paper • 2606.11897 • Published 16 days ago • 11
BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language Paper • 2606.22138 • Published 6 days ago • 21
DataClaw0: Agentic Tailoring Multimodal Data from Raw Streams Paper • 2606.21337 • Published 7 days ago • 70
ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection Paper • 2606.24112 • Published 3 days ago • 3
UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating Paper • 2606.21661 • Published 7 days ago • 20
ShutterMuse: Capture-Time Photography Guidance with MLLMs Paper • 2606.25763 • Published 2 days ago • 38
Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models Paper • 2606.25473 • Published 2 days ago • 19
DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation Paper • 2606.26058 • Published 2 days ago • 57
Autodata: An agentic data scientist to create high quality synthetic data Paper • 2606.25996 • Published 2 days ago • 9
Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence Paper • 2606.15932 • Published 10 days ago • 28