lmms-lab/LLaVA-NeXT-Video-32B-Qwen
Video-Text-to-Text • 33B • Updated • 216 • 17
Feeling and building the multimodal intelligence.
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling