SmolVLM2-500M-Video-Instruct (full mirror)

Full mirror of HuggingFaceTB/SmolVLM2-500M-Video-Instruct.

Includes:

  • model.safetensors (~1.9 GB PyTorch weights)
  • 14 ONNX variants under onnx/ (fp16, int8, q4, uint8, etc. for decoder / embed_tokens / vision_encoder)
  • Tokenizer files (tokenizer.json, vocab.json, merges.txt, added_tokens.json, special_tokens_map.json)
  • Processor configs (processor_config.json, preprocessor_config.json, chat_template.json)
  • generation_config.json, config.json

Mirrored via huggingface_hub.snapshot_download.

Usage

from transformers import AutoModel, AutoProcessor, AutoTokenizer
model = AutoModel.from_pretrained("arrow-hf/SmolVLM2-500M-Video-Instruct")
processor = AutoProcessor.from_pretrained("arrow-hf/SmolVLM2-500M-Video-Instruct")
tokenizer = AutoTokenizer.from_pretrained("arrow-hf/SmolVLM2-500M-Video-Instruct")

Related

The tokenizer is used by arrow-hf/smolvla-robotwin-stack-bowls-two-50pct (max_length=48). The SmolVLA policy is fine-tuned on top of this base VLM.

Downloads last month
17
Safetensors
Model size
0.5B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support