lmms-lab/LLaVA-OneVision-Mid-Data
Feeling and building multimodal intelligence.
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling