vLLM compatibility issue with nanonets/Nanonets-OCR-s: Processor initialization conflict
Issue Description
The nanonets/Nanonets-OCR-s model fails to load in vLLM (v0.9.1) due to a processor configuration conflict, while it works fine with transformers directly.
Error Details
Main Error:
TypeError: Qwen2_5_VLProcessor.__init__() got multiple values for argument 'image_processor'
Additional Issues:
UTF-8 decoding error in processor config files:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 8: invalid start byte
The processor configuration appears to have duplicate or conflicting image_processor arguments that cause vLLM to fail during initialization.
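For readers unfamiliar with this class of TypeError, the sketch below reproduces the same message shape in plain Python. It only illustrates how the error arises in general: `ToyProcessor` and the argument values are hypothetical, and it is an assumption, not something confirmed in this thread, that the vLLM loading path ends up passing `image_processor` twice in this way.

```python
class ToyProcessor:
    """Hypothetical stand-in for a processor class; not the real Qwen2_5_VLProcessor."""

    def __init__(self, image_processor=None, tokenizer=None):
        self.image_processor = image_processor
        self.tokenizer = tokenizer


def build_with_conflict():
    """Supply image_processor both positionally and by keyword, which Python rejects."""
    positional_args = ("image-processor-from-loader",)  # fills the first parameter
    keyword_args = {"image_processor": "image-processor-from-config"}  # same parameter again
    return ToyProcessor(*positional_args, **keyword_args)


error_message = None
try:
    build_with_conflict()
except TypeError as exc:
    # The message ends with: got multiple values for argument 'image_processor'
    error_message = str(exc)
    print(error_message)
```

If the real failure follows this pattern, the duplicate would come from a value in the processor config colliding with one the loader already supplies, which is consistent with the description above but not verified here.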
Environment
- vLLM version: 0.9.1
- transformers version: Latest
- Model: nanonets/Nanonets-OCR-s
- Hardware: NVIDIA A100-PCIE-40GB
Commands Attempted
# All of these fail with the same error
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --limit-mm-per-prompt image=3
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --enforce-eager
vllm serve nanonets/Nanonets-OCR-s --trust-remote-code --tokenizer-mode slow
Expected Behavior
The model should load successfully in vLLM, similar to how base Qwen/Qwen2.5-VL-3B-Instruct works.
Actual Behavior
vLLM fails during processor initialization with the multiple values for argument 'image_processor' error.
Working Alternative
The model works perfectly with transformers directly:
from transformers import AutoModelForImageTextToText, AutoProcessor
model = AutoModelForImageTextToText.from_pretrained("nanonets/Nanonets-OCR-s", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("nanonets/Nanonets-OCR-s", trust_remote_code=True)
Request
Could you please:
- Fix the processor configuration to be compatible with vLLM
- Ensure all config files use proper UTF-8 encoding
- Test compatibility with vLLM during model releases
This model is very useful for OCR tasks, and vLLM compatibility would be greatly appreciated by the community.
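The UTF-8 request can be checked locally before filing or closing an issue like this. The following is a minimal sketch that scans a directory of config files and reports the byte offset of the first invalid sequence in each; the function name `find_non_utf8_files` and the `*.json` pattern are assumptions made for illustration, and the directory you pass would be wherever the model snapshot is stored on your machine.

```python
from pathlib import Path


def find_non_utf8_files(directory, pattern="*.json"):
    """Return (file_name, byte_offset) pairs for files that are not valid UTF-8."""
    problems = []
    for path in sorted(Path(directory).glob(pattern)):
        try:
            # Decode the raw bytes strictly; any invalid sequence raises.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first undecodable byte.
            problems.append((path.name, exc.start))
    return problems
```

Calling `find_non_utf8_files("/path/to/model/snapshot")` returns an empty list when every matching file decodes cleanly. A non-empty result points at the file and offset to inspect, comparable to the "position 8" reported in the error above.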
Full Error Log
ERROR: Qwen2_5_VLProcessor.__init__() got multiple values for argument 'image_processor'
[Full traceback from your error logs...]
If anyone has this working on vLLM, I would appreciate tips!
I got this working by installing older transformers after vllm install: pip install "transformers<4.53.0".
It worked for me with Docker, but only for JPG images, not for PDFs directly.
export MODEL_PORT=8000
export MODEL_ID=nanonets/Nanonets-OCR-s
docker run \
--runtime nvidia \
-e VLLM_USE_V1=1 \
--gpus all \
--ipc=host \
-p "${MODEL_PORT}:8000" \
--env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
-v "${HF_HOME}:/root/.cache/huggingface" \
vllm/vllm-openai:latest \
--model ${MODEL_ID}
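Once the container above is running, a JPG can be sent to it through vLLM's OpenAI-compatible chat completions endpoint. The sketch below uses only the Python standard library. The prompt text, the function names, and the `http://localhost:8000` base URL (matching `MODEL_PORT=8000` above) are assumptions for illustration, not settings taken from this thread.

```python
import base64
import json
import urllib.request


def build_ocr_request(jpeg_bytes, model_id="nanonets/Nanonets-OCR-s",
                      prompt="Extract the text from this image."):
    """Build a chat completions payload with the JPG embedded as a base64 data URL."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "model": model_id,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},  # hypothetical prompt
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                    },
                ],
            }
        ],
    }


def send_request(payload, base_url="http://localhost:8000"):
    """POST the payload to the server and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server listening, `send_request(build_ocr_request(open("page.jpg", "rb").read()))` would return the model's text for that image. Since only JPG input worked here, a PDF would presumably need to be rendered to page images first; that step is not shown.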
I got this working by installing older transformers after vllm install: pip install "transformers<4.53.0".
Thank you! It worked for me!
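To confirm that the `transformers<4.53.0` pin actually took effect in the environment vLLM runs in, a quick version check can help. This is a rough sketch using only the standard library: the helper names are made up for illustration, and the comparison looks only at the leading numeric release components, ignoring pre-release or local suffixes.

```python
import re
from importlib import metadata


def parse_release(version_string):
    """Turn '4.52.4' or '4.53.0.dev0' into a tuple of the leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_below(version_string, bound=(4, 53, 0)):
    """True if the numeric release sorts strictly below the bound."""
    release = parse_release(version_string)
    # Pad short versions so that '4.53' is treated like (4, 53, 0).
    padded = release + (0,) * (len(bound) - len(release))
    return padded[: len(bound)] < bound


def installed_transformers_status():
    """Describe the installed transformers version relative to 4.53.0."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    verdict = "below" if is_below(installed) else "at or above"
    return f"transformers {installed} is {verdict} 4.53.0"


print(installed_transformers_status())
```

Running this inside the same virtual environment or container as the vLLM server shows whether a later install step silently upgraded transformers again.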