Instructions to use nanonets/Nanonets-OCR-s with libraries, inference providers, notebooks, and local apps.

Libraries

How to use nanonets/Nanonets-OCR-s with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="nanonets/Nanonets-OCR-s")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("nanonets/Nanonets-OCR-s")
model = AutoModelForImageTextToText.from_pretrained("nanonets/Nanonets-OCR-s")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use nanonets/Nanonets-OCR-s with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nanonets/Nanonets-OCR-s"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nanonets/Nanonets-OCR-s",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
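
The same request can be sent from Python without extra dependencies. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send` are hypothetical, and it assumes the server started above is listening on `http://localhost:8000`.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Assumption: the vLLM server from the commands above is running locally.
    request = build_chat_request(
        "http://localhost:8000",
        "nanonets/Nanonets-OCR-s",
        "Describe this image in one sentence.",
        "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    )
    print(send(request))
```

Pointing `base_url` at port 30000 instead should work the same way for the SGLang server, since both expose the same OpenAI-compatible route.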

Use Docker

docker model run hf.co/nanonets/Nanonets-OCR-s

SGLang

How to use nanonets/Nanonets-OCR-s with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nanonets/Nanonets-OCR-s" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nanonets/Nanonets-OCR-s",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nanonets/Nanonets-OCR-s" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nanonets/Nanonets-OCR-s",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use nanonets/Nanonets-OCR-s with Docker Model Runner:
```
docker model run hf.co/nanonets/Nanonets-OCR-s
```

vLLM compatibility issue with nanonets/Nanonets-OCR-s: Processor initialization conflict

#18

by WpythonW - opened Jun 30, 2025

Discussion

WpythonW

Jun 30, 2025

Issue Description

The nanonets/Nanonets-OCR-s model fails to load in vLLM (v0.9.1) due to a processor configuration conflict, while it works fine with transformers directly.

Error Details

Main Error:

TypeError: Qwen2_5_VLProcessor.__init__() got multiple values for argument 'image_processor'

Additional Issues:

UTF-8 decoding error in processor config files:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 8: invalid start byte
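
The report does not say which file triggers this decode error. One way to narrow it down is to scan a local copy of the model repository for files that are not valid UTF-8. The sketch below is only an illustration: the helper names and the default file patterns are assumptions, not something taken from vLLM or transformers.

```python
from pathlib import Path
from typing import Dict, Optional


def find_invalid_utf8(path: Path) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if the file decodes cleanly."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as error:
        return error.start
    return None


def scan_directory(directory: Path, patterns=("*.json", "*.txt", "*.jinja")) -> Dict[str, int]:
    """Map each matching file that fails to decode to the offset of its first bad byte.

    The patterns are a guess at which files are read as text; adjust as needed.
    """
    problems = {}
    for pattern in patterns:
        for path in sorted(Path(directory).glob(pattern)):
            offset = find_invalid_utf8(path)
            if offset is not None:
                problems[str(path)] = offset
    return problems
```

Running `scan_directory` on the downloaded snapshot directory returns an empty dict when every matching file is valid UTF-8. A reported offset of 8 would match the position in the error above.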

The processor configuration appears to have duplicate or conflicting image_processor arguments that cause vLLM to fail during initialization.
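
For readers unfamiliar with this kind of TypeError: Python raises it whenever the same parameter receives a value both positionally and by keyword. The toy class below is not the real Qwen2_5_VLProcessor and does not claim to reproduce the actual code path in vLLM or transformers. It only shows the general mechanism, with all names hypothetical.

```python
class ToyProcessor:
    """Stand-in for a processor class; NOT the real Qwen2_5_VLProcessor."""

    def __init__(self, image_processor=None, tokenizer=None, **kwargs):
        self.image_processor = image_processor
        self.tokenizer = tokenizer


def build_with_conflict():
    # The caller passes image_processor positionally while a config-derived
    # dict also carries the same key, so Python sees two values for it.
    kwargs_from_config = {"image_processor": "from-config"}
    return ToyProcessor("positional", **kwargs_from_config)


def build_without_conflict():
    # Dropping the duplicate key before forwarding the kwargs avoids the clash.
    kwargs_from_config = {"image_processor": "from-config", "tokenizer": "tok"}
    kwargs_from_config.pop("image_processor", None)
    return ToyProcessor("positional", **kwargs_from_config)


if __name__ == "__main__":
    try:
        build_with_conflict()
    except TypeError as error:
        print(error)
```

A clash like this can come from either side: a config that carries an unexpected key, or library code that starts passing the argument in a second way after an upgrade.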

Environment

vLLM version: 0.9.1
transformers version: Latest
Model: nanonets/Nanonets-OCR-s
Hardware: NVIDIA A100-PCIE-40GB

Commands Attempted

# All of these fail with the same error
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --limit-mm-per-prompt image=3
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --enforce-eager
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --tokenizer-mode slow

Expected Behavior

The model should load successfully in vLLM, similar to how base Qwen/Qwen2.5-VL-3B-Instruct works.

Actual Behavior

vLLM fails during processor initialization with the "got multiple values for argument 'image_processor'" error shown above.

Working Alternative

The model works perfectly with transformers directly:

from transformers import AutoModelForImageTextToText, AutoProcessor
model = AutoModelForImageTextToText.from_pretrained("nanonets/Nanonets-OCR-s", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("nanonets/Nanonets-OCR-s", trust_remote_code=True)

Request

Could you please:

Fix the processor configuration to be compatible with vLLM
Ensure all config files use proper UTF-8 encoding
Test compatibility with vLLM during model releases

This model is very useful for OCR tasks, and vLLM compatibility would be greatly appreciated by the community.

Full Error Log

Click to expand full error traceback

ERROR: Qwen2_5_VLProcessor.__init__() got multiple values for argument 'image_processor'
[Full traceback from your error logs...]

rawwerks

Jun 30, 2025

If anyone has this working on vLLM, I would appreciate tips!

amazingvince

Jul 1, 2025

I got this working by installing an older transformers release after installing vLLM: pip install "transformers<4.53.0".
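
To confirm that the pin took effect in the active environment, a quick check like the one below can help. This is a small sketch using only the standard library; the function names are hypothetical, and the version parsing is deliberately simplified (it is not a full PEP 440 implementation).

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '4.52.4' -> (4, 52, 4).

    Simplification: suffixes such as 'rc1' or '.dev0' are ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break
    return tuple(parts)


def is_below(version: str, limit: Tuple[int, ...] = (4, 53, 0)) -> bool:
    """Return True if the version's release tuple is strictly below the limit."""
    release = parse_release(version)
    # Pad with zeros so that '4.53' compares equal to (4, 53, 0).
    release = release + (0,) * (len(limit) - len(release))
    return release < limit


def installed_transformers_is_pinned() -> Optional[bool]:
    """Check the installed transformers version; None if it is not installed."""
    try:
        return is_below(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_transformers_is_pinned())
```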

aziontech

Jul 1, 2025

This comment has been hidden (marked as Off-Topic)

alecauduro

Jul 1, 2025 (edited Jul 1, 2025)

It worked for me with Docker, but only with JPG input, not with PDFs directly.

# HF_HOME and HUGGING_FACE_HUB_TOKEN must already be set in the environment
export MODEL_PORT=8000
export MODEL_ID=nanonets/Nanonets-OCR-s

docker run \
    --runtime nvidia \
    -e VLLM_USE_V1=1 \
    --gpus all \
    --ipc=host \
    -p "${MODEL_PORT}:8000" \
    --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
    -v "${HF_HOME}:/root/.cache/huggingface" \
    vllm/vllm-openai:latest \
    --model "${MODEL_ID}"
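
For a local JPG, one common approach with OpenAI-compatible servers is to embed the image as a base64 data URL in the `image_url` field. The sketch below builds such a payload. The helper names are hypothetical, and it assumes the server accepts data URLs. PDF pages would first need to be rendered to images with a tool of your choice, which is not shown here.

```python
import base64
import json
from pathlib import Path


def image_bytes_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL suitable for an image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_ocr_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a chat-completions payload carrying one inline JPG and one text prompt."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_bytes_to_data_url(image_bytes)}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Assumption: page.jpg is a local image, e.g. one rendered PDF page.
    payload = build_ocr_payload(
        "nanonets/Nanonets-OCR-s",
        "Extract the text from this document.",
        Path("page.jpg").read_bytes(),
    )
    # POST this JSON to http://localhost:8000/v1/chat/completions
    print(json.dumps(payload)[:200])
```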

WpythonW

Jul 2, 2025

Quoting amazingvince: "I got this working by installing older transformers after vllm install: pip install "transformers<4.53.0"."

Thank you! It worked for me!
