library_name: pytorch

SmolVLM is a family of compact vision–language models that pair lightweight visual encoders with small language models. The family targets edge deployment and low-latency multimodal inference.

Original paper: SmolVLM: Redefining small and efficient multimodal models

SmolVLM2-500M-Video-Instruct

SmolVLM2-500M-Video-Instruct is a ~500M-parameter variant optimized for a low memory footprint and fast multimodal inference. It is well suited to visual question answering, image captioning, document understanding, and real-time multimodal assistants on edge devices and in other resource-constrained environments.

Model Configuration:

- Reference implementation: smollm
- Original weights: SmolVLM2-500M-Video-Instruct
- Resolution: 3x512x512
- Supported Cooper versions:
  - Cooper SDK: [2.5.4]
  - Cooper Foundry: [2.3]
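
As a rough illustration of what the 3x512x512 resolution implies for input buffers, the sketch below computes the element count and buffer size of one input tensor. Reading the shape as (channels, height, width) is an assumption, since the layout is not stated here. The element widths (1 byte for uint8, 4 bytes for float32) are illustrative only. The helper names are hypothetical and are not part of the Cooper SDK.

```python
from math import prod

# Taken from the "Resolution" entry in the model configuration.
# ASSUMPTION: interpreted as (channels, height, width); the layout is not stated.
INPUT_SHAPE = (3, 512, 512)


def element_count(shape):
    """Number of scalar values in one input tensor."""
    return prod(shape)


def buffer_bytes(shape, bytes_per_element):
    """Size in bytes of one input tensor for a given element width."""
    return element_count(shape) * bytes_per_element


def matches_input_shape(shape, expected=INPUT_SHAPE):
    """True if a candidate shape equals the expected input shape."""
    return tuple(shape) == tuple(expected)


if __name__ == "__main__":
    print(element_count(INPUT_SHAPE))           # 786432 values
    print(buffer_bytes(INPUT_SHAPE, 1))         # 786432 bytes if stored as uint8
    print(buffer_bytes(INPUT_SHAPE, 4))         # 3145728 bytes if stored as float32
    print(matches_input_shape((3, 224, 224)))   # False
```

A check like `matches_input_shape` can be run on a preprocessed image before it is handed to the runtime, so that a wrongly sized input fails early rather than inside the device pipeline.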

| Model | Device | Model Link |
| --- | --- | --- |
| SmolVLM2-500M-Video-Instruct | CV7 | Model_Link |
| SmolVLM2-500M-Video-Instruct | CV72 | Model_Link |
| SmolVLM2-500M-Video-Instruct | CV75 | Model_Link |