lmms-lab/LLaVA-NeXT-Video-32B-Qwen
Video-Text-to-Text • 33B • 40 • 17
Feeling and building the multimodal intelligence.
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
A Simple Baseline for Streaming Video Understanding