lmms-lab/LLaVA-OneVision-Mid-Data
Feeling and building the multimodal intelligence.
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
A Simple Baseline for Streaming Video Understanding