Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence • Paper • 2606.15932 • Published 13 days ago • 38
InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning • Paper • 2606.12195 • Published 19 days ago • 23
Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction • Paper • 2606.05769 • Published 25 days ago • 6
RIVER: A Real-Time Interaction Benchmark for Video LLMs • Paper • 2603.03985 • Published Mar 4 • 7
MOVA: Towards Scalable and Synchronized Video-Audio Generation • Paper • 2602.08794 • Published Feb 9 • 159
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding • Paper • 2403.15377 • Published Mar 22, 2024 • 29
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning • Paper • 2601.10129 • Published Jan 15 • 13
VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs • Paper • 2511.20272 • Published Nov 25, 2025 • 2
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision • Paper • 2512.01342 • Published Dec 1, 2025 • 21
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning • Paper • 2510.11606 • Published Oct 13, 2025 • 6
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos • Paper • 2506.10857 • Published Jun 12, 2025 • 30