OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains Paper • 2606.14702 • Published 16 days ago • 31
OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding Paper • 2605.18577 • Published May 18 • 5
Stage-adaptive Token Selection for Efficient Omni-modal LLMs Paper • 2605.20035 • Published May 19 • 5
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos Paper • 2603.14145 • Published Mar 14 • 15
SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval Paper • 2603.08224 • Published Mar 9 • 1
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models Paper • 2511.14582 • Published Nov 18, 2025 • 19
Multi-Object Sketch Animation by Scene Decomposition and Motion Planning Paper • 2503.19351 • Published Mar 25, 2025 • 1
[ICCV2025]MGSV Collection [ICCV 2025] Music Grounding by Short Video • 4 items • Updated 12 days ago • 1
VideoDeepResearch: Long Video Understanding With Agentic Tool Using Paper • 2506.10821 • Published Jun 12, 2025 • 19
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Paper • 2410.19702 • Published Oct 25, 2024 • 1