OneOCR-20260515: OCR Checkpoint
Model Description
OneOCR-20260515 is a dated OCR checkpoint designed to extract readable text from document images, receipts, forms, scanned pages, and handwritten lines.
This checkpoint is part of the OneOCR training line from OneScaling, a research lab founded in Bavaria, Germany, focused on practical OCR and document understanding models.
- Developed by: OneScaling
- Model name: OneOCR-20260515
- Model type: Vision-language OCR checkpoint
- Primary task: Image-to-text OCR and document text extraction
- License: Apache 2.0
Checkpoint Overview
OneOCR-20260515 is a single OCR model checkpoint trained for document transcription and receipt-style text extraction. It is intended as a dated checkpoint rather than the final OneOCR release.
The checkpoint focuses on practical OCR behavior:
- Document OCR - Extracts visible text from scanned pages and document images.
- Receipt OCR - Reads item names, prices, totals, dates, and payment fields from receipt-like layouts.
- Handwritten Line OCR - Attempts transcription of handwritten English line images.
- Layout-Preserving Output - Returns text in a clean Markdown-like format when possible.
- Image-to-Text Processing - Accepts image inputs and produces OCR text directly.
- Checkpoint Usability - Provides a dated model snapshot for testing, comparison, and continued OCR training.
Core Capabilities
OCR
OneOCR-20260515 can extract text from images containing printed documents, receipts, product lists, totals, and short handwritten lines.
Receipt Understanding
The checkpoint is trained to preserve receipt-style information such as item names, quantities, totals, dates, tax values, and payment lines. It can be useful for experiments in receipt parsing and document AI workflows.
Markdown-Style Transcription
The model is prompted to return clean text. For tabular or structured documents, it may use line breaks, headings, labels, and simple Markdown-style formatting.
Model Details
| Property | OneOCR-20260515 |
|---|---|
| Model Type | Vision-language OCR checkpoint |
| Primary Modality | Image + text prompt to text output |
| Primary Task | OCR / document transcription |
| Training Focus | Receipts, documents, handwritten lines |
| Output Format | Plain text / Markdown-style OCR |
| License | Apache 2.0 |
Getting Started
Install the required dependencies:
```shell
pip install -U transformers torch accelerate pillow peft
```
Load the model with Transformers and PEFT:
```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "OneScaling/OneOCR-20260515"

# Load the processor and model; trust_remote_code runs code shipped in the model repository.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

image = Image.open("document.png").convert("RGB")

# Single-turn chat message: one image slot followed by the OCR instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {
                "type": "text",
                "text": "Extract all readable text from this document image. Return only the OCR result.",
            },
        ],
    }
]

# Render the chat template to a prompt string, ending with the generation prefix.
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = processor(text=[prompt], images=[[image]], return_tensors="pt").to(model.device)

# Greedy decoding with repetition controls.
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.12,
        no_repeat_ngram_size=5,
    )

# Drop the prompt tokens and decode only the newly generated text.
generated = outputs[0][inputs["input_ids"].shape[-1]:]
text = processor.decode(generated, skip_special_tokens=True)
print(text.strip())
```
Best Practices
1. Use Clear OCR Prompts
Use direct prompts such as:
Extract all readable text from this document image. Return only the OCR result.
For receipts:
Extract all readable receipt text. Preserve item names, prices, totals, dates, and payment fields.
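The two prompts above can be slotted into the chat-message structure used in Getting Started. The sketch below is illustrative only: `OCR_PROMPTS`, `build_ocr_messages`, and the `kind` labels are hypothetical names, not part of the model's API.

```python
# Prompt strings copied from this section; the dictionary keys are illustrative labels.
OCR_PROMPTS = {
    "document": "Extract all readable text from this document image. Return only the OCR result.",
    "receipt": (
        "Extract all readable receipt text. "
        "Preserve item names, prices, totals, dates, and payment fields."
    ),
}


def build_ocr_messages(kind: str = "document") -> list:
    """Return a single-turn message list with one image slot and one OCR prompt."""
    if kind not in OCR_PROMPTS:
        raise ValueError(f"unknown prompt kind: {kind!r}")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": OCR_PROMPTS[kind]},
            ],
        }
    ]


messages = build_ocr_messages("receipt")
```

The resulting `messages` list can be passed to `processor.apply_chat_template` in the same way as in the Getting Started example.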
2. Prefer Deterministic Decoding
For OCR, sampling usually hurts accuracy. Recommended settings:
- `do_sample=False`
- `temperature` not needed
- `max_new_tokens=512` for short receipts or line images
- `max_new_tokens=1024` for longer documents
- `repetition_penalty=1.08` to `1.12`
- `no_repeat_ngram_size=5`
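As a minimal sketch, the recommended settings can be collected into one helper whose result is unpacked into `model.generate(**inputs, **kwargs)`. The function name `ocr_generation_kwargs` and its `long_document` flag are assumptions made for illustration; only the values come from the recommendations above.

```python
def ocr_generation_kwargs(long_document: bool = False, repetition_penalty: float = 1.12) -> dict:
    """Build greedy-decoding keyword arguments following the settings listed above."""
    if not 1.08 <= repetition_penalty <= 1.12:
        raise ValueError("repetition_penalty is outside the recommended 1.08 to 1.12 range")
    return {
        "do_sample": False,  # greedy decoding, so no temperature is passed
        "max_new_tokens": 1024 if long_document else 512,
        "repetition_penalty": repetition_penalty,
        "no_repeat_ngram_size": 5,
    }


receipt_kwargs = ocr_generation_kwargs()
document_kwargs = ocr_generation_kwargs(long_document=True, repetition_penalty=1.08)
```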
3. Use Enough Image Resolution
OCR quality depends heavily on image clarity. Use higher resolution for small text, receipts, and dense documents. Avoid blurry, cropped, low-contrast, or heavily compressed images when possible.
4. Validate Numeric Fields
This checkpoint can alter digits, totals, prices, dates, or IDs. For financial, invoice, or receipt workflows, validate extracted numeric fields with downstream checks.
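One possible downstream check is to confirm that the extracted item prices add up to the extracted total. The sketch below assumes the prices and total have already been pulled out of the OCR text as strings; `parse_amount` and `total_matches_items` are hypothetical helpers, and the check ignores tax, discounts, and currency formats other than a leading `$` with comma thousands separators.

```python
from decimal import Decimal, InvalidOperation
from typing import List, Optional


def parse_amount(text: str) -> Optional[Decimal]:
    """Parse a price string such as '12.50' or '$1,204.00'; return None if it is not numeric."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def total_matches_items(item_prices: List[str], total: str) -> bool:
    """Return True only if every amount parses and the items sum exactly to the total."""
    amounts = [parse_amount(price) for price in item_prices]
    expected = parse_amount(total)
    if expected is None or any(amount is None for amount in amounts):
        return False
    return sum(amounts, Decimal("0")) == expected
```

A `False` result does not say which field is wrong; it only marks the receipt for human review. `Decimal` is used instead of `float` so that sums of prices are compared exactly.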
Model Data
Training Data
OneOCR-20260515 was trained on real OCR-oriented datasets and document-style examples, including receipt and line-level OCR data. The training focus includes:
- Printed receipt OCR
- Structured receipt text extraction
- Handwritten English line transcription
- Document-style text extraction
- OCR prompts for Markdown-like output
Data Processing
Training examples were formatted as image-to-text OCR tasks. Long targets were capped during training to reduce runaway generations and keep the checkpoint focused on readable transcription.
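The card does not state how targets were capped or what limit was used. As an illustration only, one common approach is to truncate the tokenized target and keep a final end-of-sequence token; `cap_target`, `max_target_tokens`, and `eos_token_id` below are hypothetical names and do not describe the actual training pipeline.

```python
from typing import List


def cap_target(token_ids: List[int], max_target_tokens: int, eos_token_id: int) -> List[int]:
    """Truncate a target sequence to at most max_target_tokens, ending with EOS if cut."""
    if max_target_tokens < 1:
        raise ValueError("max_target_tokens must be at least 1")
    if len(token_ids) <= max_target_tokens:
        return list(token_ids)
    # Reserve the last position for EOS so the model still learns to stop.
    return token_ids[: max_target_tokens - 1] + [eos_token_id]
```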
Usage and Limitations
Intended Usage
OneOCR-20260515 is intended for:
- OCR research
- Document AI experiments
- Receipt OCR experiments
- OCR prompt testing
- Fine-tuning continuation
- Comparing checkpoint progress over time
Limitations
- This is a dated checkpoint, not the final OneOCR release.
- The model may omit lines, especially on long receipts or dense documents.
- The model may alter digits, prices, totals, dates, or IDs.
- The model may hallucinate receipt fields or repeat layout patterns.
- Handwriting performance is inconsistent.
- It should not be used as the only OCR system for financial, legal, medical, identity, or safety-critical documents.
- Human review or downstream validation is recommended for important outputs.
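As one example of downstream validation, repeated layout patterns can be flagged by looking for runs of identical consecutive lines in the OCR output. This is a heuristic sketch, not part of the model; `find_repeated_runs` and its `min_repeats` threshold are hypothetical, and a genuine receipt can legitimately repeat a line, so a hit should trigger review rather than rejection.

```python
from typing import List, Tuple


def find_repeated_runs(text: str, min_repeats: int = 3) -> List[Tuple[str, int]]:
    """Return (line, count) for each run of identical consecutive non-empty lines."""
    runs = []
    previous, count = None, 0
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for line in lines:
        if line == previous:
            count += 1
            continue
        # The run of `previous` has ended; record it if it was long enough.
        if previous is not None and count >= min_repeats:
            runs.append((previous, count))
        previous, count = line, 1
    if previous is not None and count >= min_repeats:
        runs.append((previous, count))
    return runs
```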
Ethics and Safety
OCR models can extract sensitive information from documents, receipts, IDs, forms, and private records. Users should apply this model responsibly and follow relevant privacy, security, and data-protection requirements.
Do not use this checkpoint to collect, expose, or process personal data without permission. For production systems, combine OCR with access controls, logging policies, privacy review, and human oversight where appropriate.
Citation
```bibtex
@misc{onescaling2026oneocr20260515,
  title={OneOCR-20260515 -- OCR Checkpoint},
  author={OneScaling},
  year={2026},
  url={https://huggingface.co/OneScaling/OneOCR-20260515},
}
```