X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction Paper • 2605.05765 • Published 5 days ago • 16
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14, 2025 • 309
Distilling LLM Agent into Small Models with Retrieval and Code Tools Paper • 2505.17612 • Published May 23, 2025 • 81
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published Apr 1, 2025 • 67