SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published about 15 hours ago • 23
VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting Paper • 2510.21817 • Published Oct 21, 2025 • 42
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play? Paper • 2509.03516 • Published Sep 3, 2025 • 12
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension Paper • 2503.08689 • Published Mar 11, 2025 • 4
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension Paper • 2411.13093 • Published Nov 20, 2024 • 2
meta-llama/Meta-Llama-3-8B-Instruct Text Generation • 8B • Updated Jun 18, 2025 • 1.46M • 4.43k