Instructions to use docling-project/SmolDocling-256M-preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use docling-project/SmolDocling-256M-preview with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="docling-project/SmolDocling-256M-preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("docling-project/SmolDocling-256M-preview")
model = AutoModelForImageTextToText.from_pretrained("docling-project/SmolDocling-256M-preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use docling-project/SmolDocling-256M-preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "docling-project/SmolDocling-256M-preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "docling-project/SmolDocling-256M-preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/docling-project/SmolDocling-256M-preview

SGLang

How to use docling-project/SmolDocling-256M-preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "docling-project/SmolDocling-256M-preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "docling-project/SmolDocling-256M-preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "docling-project/SmolDocling-256M-preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "docling-project/SmolDocling-256M-preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use docling-project/SmolDocling-256M-preview with Docker Model Runner:
```
docker model run hf.co/docling-project/SmolDocling-256M-preview
```

Poor performance with simple table extraction task

#34

by hanshupe - opened Mar 30, 2025

Discussion

hanshupe

Mar 30, 2025

There is a lot of hype around multimodal models, such SmolDocling, I would like to know if others made a similar experience in practice: While they can do impressive things, they still struggle with table extraction, in cases which are straight-forward for humans.

Attached is a simple example, all I need is a reconstruction of the table as a flat CSV, preserving empty all empty cells correctly. Which open source model is able to do that?

MaxMnemonic

Docling org Mar 30, 2025

•

edited Mar 30, 2025

Thanks @hanshupe Try Docling (docling.io) and see if your table is properly extracted with it, we use dedicated table reconstruction model there.
However we are working on improving weights for SmolDocling, and results are promising, it should catch up on quality of table structure extraction with dedicated model.

hanshupe

Mar 31, 2025

Thanks, yes indeed docling performs much better here. Is there somewhere documented which table reconstruction model is used and OCR mode, is docTR support planned?

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment