Instructions for using HuggingFaceTB/SmolVLM-Instruct with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use HuggingFaceTB/SmolVLM-Instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="HuggingFaceTB/SmolVLM-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
model = AutoModelForImageTextToText.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use HuggingFaceTB/SmolVLM-Instruct with vLLM:
Install from pip and serve the model:
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "HuggingFaceTB/SmolVLM-Instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceTB/SmolVLM-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Use Docker:

```bash
docker model run hf.co/HuggingFaceTB/SmolVLM-Instruct
```
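Since the server speaks the OpenAI API, you can also call it from Python. A minimal sketch using the openai client, assuming the vLLM server above is running on localhost:8000 (the api_key value is just a placeholder; the same pattern works for the SGLang server below on port 30000):

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server; api_key is a placeholder
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="HuggingFaceTB/SmolVLM-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```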
- SGLang
How to use HuggingFaceTB/SmolVLM-Instruct with SGLang:
Install from pip and serve the model:
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "HuggingFaceTB/SmolVLM-Instruct" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceTB/SmolVLM-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Use Docker images:

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "HuggingFaceTB/SmolVLM-Instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceTB/SmolVLM-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use HuggingFaceTB/SmolVLM-Instruct with Docker Model Runner:
```bash
docker model run hf.co/HuggingFaceTB/SmolVLM-Instruct
```
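Docker Model Runner also exposes an OpenAI-compatible endpoint, so the same Python client pattern from the vLLM section applies; only the base URL changes. A rough sketch, assuming host-side TCP access is enabled on Docker Model Runner's default port 12434 (the port and path depend on your Docker configuration):

```python
from openai import OpenAI

# Assumption: Docker Model Runner's host TCP endpoint on its default port;
# adjust the base_url to your Docker Desktop/Engine settings.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="hf.co/HuggingFaceTB/SmolVLM-Instruct",
    messages=[{"role": "user", "content": "Describe the Statue of Liberty in one sentence."}],
)
print(response.choices[0].message.content)
```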
loading images locally?
I can't seem to get the model to recognize any local images. I've tried loading them with PIL and Image.open("./test/test.jpg"), for example, but no luck. Any ideas?
Have you tried:

```python
from transformers.image_utils import load_image

image1 = load_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")
```

?
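load_image also accepts a local file path, so the same helper should work on a file from disk; a quick sketch (the path is illustrative):

```python
from transformers.image_utils import load_image

# load_image resolves URLs and local file paths alike (this path is illustrative)
image2 = load_image("./test/test.jpg")
```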
I have. That works fine. But if I include a local directory like ./codespace/image1.jpg, the model does not see the image.
Anyone who got it running with local images, please post. Also, I found it handles only .jpg and could not process .png; can anyone confirm this?
Here ya go...it runs a little differently when processing a local file. Also, please note...
- I opted to use the native prompt format because I like seeing it spelled out for some reason and don't like using "apply_chat_template".
- I use a custom "set_cuda_paths" function at the top because I like pip installing these libraries rather than relying on a system-wide installation. If you use a system-wide installation (like most people do), simply remove this function.
- I rely on a hardcoded path to the folder containing the model files rather than simply specifying the Hugging Face repo id, because I like downloading the files first using snapshot_download so I can actually see the files rather than having them hidden in my cache (a sketch of that step is below)...adjust accordingly.
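For reference, that download step might look like this (a sketch; the local_dir here is illustrative and mirrors the model_dir used in the script below):

```python
from huggingface_hub import snapshot_download

# Download the repo into a visible folder instead of the opaque HF cache
# (this local_dir is illustrative; match it to model_dir below)
snapshot_download(
    repo_id="HuggingFaceTB/SmolVLM-Instruct",
    local_dir=r"D:\Scripts\bench_vision\HuggingFaceTB--SmolVLM-Instruct",
)
```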
```python
import sys
import os
from pathlib import Path

def set_cuda_paths():
    # Point CUDA_PATH/PATH at the pip-installed NVIDIA libraries inside the venv
    venv_base = Path(sys.executable).parent.parent
    nvidia_base_path = venv_base / 'Lib' / 'site-packages' / 'nvidia'
    cuda_path = nvidia_base_path / 'cuda_runtime' / 'bin'
    cublas_path = nvidia_base_path / 'cublas' / 'bin'
    cudnn_path = nvidia_base_path / 'cudnn' / 'bin'
    nvrtc_path = nvidia_base_path / 'cuda_nvrtc' / 'bin'
    paths_to_add = [
        str(cuda_path),
        str(cublas_path),
        str(cudnn_path),
        str(nvrtc_path),
    ]
    env_vars = ['CUDA_PATH', 'PATH']
    for env_var in env_vars:
        current_value = os.environ.get(env_var, '')
        new_value = os.pathsep.join(paths_to_add + [current_value] if current_value else paths_to_add)
        os.environ[env_var] = new_value

set_cuda_paths()

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Open the local image with PIL and hand it to the processor directly
image_path = r"D:\Scripts\bench_vision\IMG_140531.JPG"
image = Image.open(image_path)
width = image.width
height = image.height

# Hardcoded path to the folder holding the model files (downloaded via snapshot_download)
model_dir = r"D:\Scripts\bench_vision\HuggingFaceTB--SmolVLM-Instruct"
processor = AutoProcessor.from_pretrained(model_dir)
model = AutoModelForVision2Seq.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2" if DEVICE == "cuda" else "eager",
    low_cpu_mem_usage=True,
)
model.to(DEVICE)

# Native prompt format (instead of apply_chat_template); <image> marks where the image goes
prompt = """<|im_start|>User:<image>Can you describe this image in detail but be succinct and do not repeat yourself?<end_of_utterance>
Assistant:"""

inputs = processor(text=prompt, images=[image], return_tensors="pt")
inputs = inputs.to(DEVICE)

generated_ids = model.generate(**inputs, max_new_tokens=500)
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = generated_ids[:, inputs['input_ids'].shape[1]:]
generated_texts = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
)
print(generated_texts[0])
```
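If you ever do want the chat-template route, a rough equivalent of the prompt above (reusing processor, model, image, and DEVICE from the script) would be:

```python
# Sketch: same generation, but letting the processor build the prompt string
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Can you describe this image in detail but be succinct and do not repeat yourself?"},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(DEVICE)
generated_ids = model.generate(**inputs, max_new_tokens=500)
print(processor.batch_decode(generated_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```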
> I have. That works fine. But if I include a local directory like ./codespace/image1.jpg, the model does not see the image.

Did it work? Always curious whether something works on another platform.