Instructions for using reducto/RolmOCR with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use reducto/RolmOCR with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="reducto/RolmOCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("reducto/RolmOCR")
model = AutoModelForImageTextToText.from_pretrained("reducto/RolmOCR")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use reducto/RolmOCR with vLLM:
Install from pip and serve the model:
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "reducto/RolmOCR"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "reducto/RolmOCR",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
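The same request can be made from Python; a minimal sketch using the OpenAI client (`pip install openai`), assuming the server above is running on its default port:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server (OpenAI-compatible API).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="reducto/RolmOCR",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```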
Use Docker
```sh
docker model run hf.co/reducto/RolmOCR
```
- SGLang
How to use reducto/RolmOCR with SGLang:
Install from pip and serve the model:
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "reducto/RolmOCR" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "reducto/RolmOCR",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
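For OCR you will usually be sending local page images rather than URLs; a minimal sketch of that against the same OpenAI-compatible endpoint, using a base64 data URL (the file name is illustrative, and the prompt string should be verified against the RolmOCR model card):

```python
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Encode a local page image as a base64 data URL (file name is illustrative).
with open("page_1.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="reducto/RolmOCR",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                # Prompt believed to match the RolmOCR model card; verify before relying on it.
                {"type": "text", "text": "Return the plain text representation of this document as if you were reading it naturally."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```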
Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "reducto/RolmOCR" \
    --host 0.0.0.0 \
    --port 30000

# Then call the server with the same curl request shown above.
```
- Docker Model Runner
How to use reducto/RolmOCR with Docker Model Runner:
```sh
docker model run hf.co/reducto/RolmOCR
```
Compatibility with olmOCR repo
Great work! Since you mention this is a "drop-in replacement", can I "drop it in" to https://github.com/allenai/olmocr with the `--model` arg for `python -m olmocr.pipeline`? Figured I'd ask before trying, since you mention changes to the amount of metadata it wants to see, etc.
edit: I know you provide an example with vLLM, but this would require rebuilding `olmocr.pipeline` to have a CLI script I can point at a directory of PDF files.
Hi @pszemraj, the model should mostly be compatible with the olmocr pipeline, but with some tweaks: the prompt is different (you might want to modify this: https://github.com/allenai/olmocr/blob/main/olmocr/prompts/prompts.py), and the model architecture is now Qwen2.5-VL instead of Qwen2-VL. The rest should be the same.
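For concreteness, a minimal sketch of the prompt swap being suggested, assuming RolmOCR's plain instruction (the function name is illustrative, and the exact prompt string should be verified against the model card):

```python
# Hypothetical replacement for olmOCR's prompt builder in
# olmocr/prompts/prompts.py; the function name is illustrative.
def build_rolmocr_prompt(base_text: str = "") -> str:
    # RolmOCR does not expect the document-metadata/anchor text that the
    # original olmOCR prompt interpolates, so base_text is ignored here.
    return (
        "Return the plain text representation of this document "
        "as if you were reading it naturally.\n"
    )
```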
thanks @yifei-reducto! In the meantime I tried using the model with the original pipeline.py with some updates, such as manually forcing the prompts to match the ones you specify. Even after inference 'worked', I ran into some strange issues like wild hallucinations/repeats, so I abandoned the original pipeline code/SGLang and opted for your vLLM approach.
I workshopped `async_pipeline.py` in this gist with gemini-2.5 and it seems to work pretty well for batch inference.
- Don't quote me on this, but maybe even an order of magnitude faster than what I saw with the original (olmOCR) inference code.
Quick overview of the process:
- ensure you have vllm, flash-attn, and other deps installed as needed (see script). flashinfer is nice to have, but getting it to install is out of scope here lol
- serve the model locally in a separate tmux/screen/terminal with `vllm serve reducto/RolmOCR`
- after the endpoint is ready, run `python async_pipeline.py --input_dir ./directory-of-pdfs` (output dir inferred/named based on input dir, or pass `--output_dir ./out`)
PDFs are converted to images, which are fired off async in batches of `--concurrency_limit` for fast vLLM inference. Can't claim the code is fully optimal, but it works well enough based on my tests - hope this helps anyone reading! A minimal sketch of the core idea follows.
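Roughly, the core of the script looks like this (a minimal sketch, not the gist itself; assumes `pip install openai pdf2image` plus poppler for pdf2image, the `vllm serve` endpoint above on its default port, and a prompt string you've checked against the model card):

```python
import asyncio
import base64
import io
from pathlib import Path

from openai import AsyncOpenAI
from pdf2image import convert_from_path

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# Caps in-flight requests, analogous to the gist's --concurrency_limit.
semaphore = asyncio.Semaphore(8)

async def ocr_page(image) -> str:
    # Encode one rendered page as a base64 PNG data URL and send it to vLLM.
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    async with semaphore:
        response = await client.chat.completions.create(
            model="reducto/RolmOCR",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                    # Prompt believed to match the RolmOCR model card (verify).
                    {"type": "text",
                     "text": "Return the plain text representation of this "
                             "document as if you were reading it naturally.\n"},
                ],
            }],
            max_tokens=4096,
        )
    return response.choices[0].message.content

async def main(input_dir: str) -> None:
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        pages = convert_from_path(pdf)  # render each PDF page to a PIL image
        texts = await asyncio.gather(*(ocr_page(p) for p in pages))
        # Write one .txt per PDF into the current directory.
        Path(pdf.stem + ".txt").write_text("\n\n".join(texts))

asyncio.run(main("./directory-of-pdfs"))
```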