PyTorch
SmolVLM2 / README.md
cooper_robot
Add release note for v1.3.0
124f17a
|
Raw
History Blame Contribute Delete
1.62 kB
metadata
library_name: pytorch

smolvlm_logo

SmolVLM is a family of compact vision–language models designed for efficient multimodal understanding by integrating lightweight visual encoders with small language models, with a focus on edge deployment and low-latency multimodal AI.

Original paper: SmolVLM: Redefining small and efficient multimodal models

SmolVLM2-500M-Video-Instruct

SmolVLM2-500M-Video-Instruct is a highly efficient ~500M-parameter variant optimized for low-memory footprint and fast multimodal inference. It is well suited for applications such as visual question answering, image captioning, document understanding, and real-time multimodal assistants on edge devices or resource-constrained environments.

Model Configuration:

Model Device Model Link
SmolVLM2-500M-Video-Instruct CV7 Model_Link
SmolVLM2-500M-Video-Instruct CV72 Model_Link
SmolVLM2-500M-Video-Instruct CV75 Model_Link