---
tags:
  - smolvlm2
  - mirror
library_name: transformers
license: apache-2.0
---

# SmolVLM2-500M-Video-Instruct (full mirror)

Full mirror of [HuggingFaceTB/SmolVLM2-500M-Video-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct).

Includes:
- `model.safetensors` (~1.9 GB PyTorch weights)
- 14 ONNX variants under `onnx/` (fp16, int8, q4, uint8, etc. for decoder / embed_tokens / vision_encoder)
- Tokenizer files (`tokenizer.json`, `vocab.json`, `merges.txt`, `added_tokens.json`, `special_tokens_map.json`)
- Processor configs (`processor_config.json`, `preprocessor_config.json`, `chat_template.json`)
- `generation_config.json`, `config.json`

Mirrored via `huggingface_hub.snapshot_download`.

## Usage

```python
from transformers import AutoModel, AutoProcessor, AutoTokenizer
model = AutoModel.from_pretrained("arrow-hf/SmolVLM2-500M-Video-Instruct")
processor = AutoProcessor.from_pretrained("arrow-hf/SmolVLM2-500M-Video-Instruct")
tokenizer = AutoTokenizer.from_pretrained("arrow-hf/SmolVLM2-500M-Video-Instruct")
```

## Related

The tokenizer is used by [arrow-hf/smolvla-robotwin-stack-bowls-two-50pct](https://huggingface.co/arrow-hf/smolvla-robotwin-stack-bowls-two-50pct) (max_length=48). The SmolVLA policy is fine-tuned on top of this base VLM.