
Libraries

How to use Onescaling/OneOCR-20260515 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Onescaling/OneOCR-20260515")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Onescaling/OneOCR-20260515", dtype="auto")
```

Local Apps

vLLM

How to use Onescaling/OneOCR-20260515 with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (it keeps running in the foreground):
vllm serve "Onescaling/OneOCR-20260515"
# Call the server using curl from a second terminal (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Onescaling/OneOCR-20260515",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
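The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000`. The helper names `build_chat_payload` and `post_chat` and the placeholder image URL are illustrative and are not part of vLLM or of this model's tooling.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to /v1/chat/completions and return the first reply's text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example usage (requires a running server; the image URL is a placeholder):
# payload = build_chat_payload(
#     "Onescaling/OneOCR-20260515",
#     "Extract all readable text from this document image. Return only the OCR result.",
#     "https://example.com/receipt.png",
# )
# print(post_chat("http://localhost:8000", payload))
```

For the SGLang server described later, the same sketch should apply with the base URL changed to `http://localhost:30000`.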

Use Docker

```
docker model run hf.co/Onescaling/OneOCR-20260515
```

SGLang

How to use Onescaling/OneOCR-20260515 with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (it keeps running in the foreground):
python3 -m sglang.launch_server \
    --model-path "Onescaling/OneOCR-20260515" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl from a second terminal (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Onescaling/OneOCR-20260515",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```bash
# Start the SGLang server in a container (replace <secret> with your Hugging Face token):
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Onescaling/OneOCR-20260515" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl from a second terminal (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Onescaling/OneOCR-20260515",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
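The requests above pass a public image URL, but OCR inputs are often local scans. One common approach with OpenAI-compatible servers is to place a base64 data URL in the `image_url` field. The sketch below shows that encoding step. The helper name `image_file_to_data_url` is hypothetical, and whether a given server version accepts data URLs is an assumption worth checking against that server's documentation.

```python
import base64
import mimetypes
from pathlib import Path


def image_file_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (data:<mime>;base64,<payload>)."""
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string would replace the `"url"` value in the request body, for example `{"type": "image_url", "image_url": {"url": image_file_to_data_url("document.png")}}`.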

Docker Model Runner
How to use Onescaling/OneOCR-20260515 with Docker Model Runner:
```
docker model run hf.co/Onescaling/OneOCR-20260515
```

OneOCR-20260515: OCR Checkpoint

Model Description

OneOCR-20260515 is a dated OCR checkpoint designed to extract readable text from document images, receipts, forms, scanned pages, and handwritten lines.

This checkpoint is part of the OneOCR training line from OneScaling, a research lab founded in Bavaria, Germany, focused on practical OCR and document understanding models.

Developed by: OneScaling
Model name: OneOCR-20260515
Model type: Vision-language OCR checkpoint
Primary task: Image-to-text OCR and document text extraction
License: Apache 2.0

Checkpoint Overview

OneOCR-20260515 is a single OCR model checkpoint trained for document transcription and receipt-style text extraction. It is intended as a dated checkpoint rather than the final OneOCR release.

The checkpoint focuses on practical OCR behavior:

Document OCR - Extracts visible text from scanned pages and document images.
Receipt OCR - Reads item names, prices, totals, dates, and payment fields from receipt-like layouts.
Handwritten Line OCR - Attempts transcription of handwritten English line images.
Layout-Preserving Output - Returns text in a clean Markdown-like format when possible.
Image-to-Text Processing - Accepts image inputs and produces OCR text directly.
Checkpoint Usability - Provides a dated model snapshot for testing, comparison, and continued OCR training.

Core Capabilities

OCR

OneOCR-20260515 can extract text from images containing printed documents, receipts, product lists, totals, and short handwritten lines.

Receipt Understanding

The checkpoint is trained to preserve receipt-style information such as item names, quantities, totals, dates, tax values, and payment lines. It can be useful for experiments in receipt parsing and document AI workflows.

Markdown-Style Transcription

The model is prompted to return clean text. For tabular or structured documents, it may use line breaks, headings, labels, and simple Markdown-style formatting.

Model Details

| Property | OneOCR-20260515 |
| --- | --- |
| Model Type | Vision-language OCR checkpoint |
| Primary Modality | Image + text prompt to text output |
| Primary Task | OCR / document transcription |
| Training Focus | Receipts, documents, handwritten lines |
| Output Format | Plain text / Markdown-style OCR |
| License | Apache 2.0 |

Getting Started

Install the required dependencies:

```bash
pip install -U transformers torch accelerate pillow peft
```

Load the model with Transformers and run OCR on a single image. PEFT is installed above but is not imported directly in this example:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "OneScaling/OneOCR-20260515"

# Load the processor and model. trust_remote_code=True allows custom code
# from the model repository to run, so review the repository first.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

# Load the input image and normalize it to RGB.
image = Image.open("document.png").convert("RGB")

# One user turn: an image placeholder followed by the OCR instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {
                "type": "text",
                "text": "Extract all readable text from this document image. Return only the OCR result.",
            },
        ],
    }
]

# Render the chat template to a prompt string; tokenization happens in the processor call below.
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = processor(text=[prompt], images=[[image]], return_tensors="pt").to(model.device)

# Greedy decoding with repetition controls.
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.12,
        no_repeat_ngram_size=5,
    )

# Drop the prompt tokens and decode only the newly generated text.
generated = outputs[0][inputs["input_ids"].shape[-1]:]
text = processor.decode(generated, skip_special_tokens=True)
print(text.strip())
```

Best Practices

1. Use Clear OCR Prompts

Use direct prompts such as:

Extract all readable text from this document image. Return only the OCR result.

For receipts:

Extract all readable receipt text. Preserve item names, prices, totals, dates, and payment fields.
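The two prompts above can be kept in one place and turned into chat messages on demand. This is a small sketch, not part of the model's own tooling: the names `OCR_PROMPTS` and `build_ocr_messages` are hypothetical, and the message layout copies the one used in Getting Started.

```python
# Prompt texts copied from the recommendations above.
OCR_PROMPTS = {
    "document": (
        "Extract all readable text from this document image. "
        "Return only the OCR result."
    ),
    "receipt": (
        "Extract all readable receipt text. "
        "Preserve item names, prices, totals, dates, and payment fields."
    ),
}


def build_ocr_messages(kind: str = "document") -> list:
    """Return a single-turn chat message list: image placeholder, then the OCR prompt."""
    if kind not in OCR_PROMPTS:
        raise ValueError(f"Unknown prompt kind: {kind!r}; expected one of {sorted(OCR_PROMPTS)}")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": OCR_PROMPTS[kind]},
            ],
        }
    ]
```

The result can be passed to `processor.apply_chat_template(build_ocr_messages("receipt"), tokenize=False, add_generation_prompt=True)` in place of the hand-written `messages` list.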

2. Prefer Deterministic Decoding

For OCR, sampling usually hurts accuracy. Recommended settings:

do_sample=False
temperature: not needed, because it only affects sampling and is unused when do_sample=False
max_new_tokens=512 for short receipts or line images
max_new_tokens=1024 for longer documents
repetition_penalty=1.08 to 1.12
no_repeat_ngram_size=5
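The settings above can be collected into one helper so that every call uses the same decoding configuration. This is a sketch with a hypothetical function name, `ocr_generation_kwargs`. The values come from the list above, and the default `repetition_penalty` of 1.12 is simply the upper end of the recommended range.

```python
def ocr_generation_kwargs(long_document: bool = False, repetition_penalty: float = 1.12) -> dict:
    """Return keyword arguments for model.generate using the recommended OCR settings."""
    return {
        # Greedy decoding: no sampling, so temperature is not set.
        "do_sample": False,
        # 512 tokens for short receipts or line images, 1024 for longer documents.
        "max_new_tokens": 1024 if long_document else 512,
        # Recommended range is 1.08 to 1.12.
        "repetition_penalty": repetition_penalty,
        "no_repeat_ngram_size": 5,
    }
```

With the objects from Getting Started, a call would look like `model.generate(**inputs, **ocr_generation_kwargs(long_document=True))`.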

3. Use Enough Image Resolution

OCR quality depends heavily on image clarity. Use higher resolution for small text, receipts, and dense documents. Avoid blurry, cropped, low-contrast, or heavily compressed images when possible.
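A quick pre-flight check can flag images that are probably too small before they reach the model. The sketch below is illustrative only: the function names are hypothetical, and the 1000-pixel threshold is an assumed starting point, not a documented requirement of this checkpoint. Line-level handwriting crops are naturally short, so they would need a lower threshold.

```python
def resolution_issues(width: int, height: int, min_short_side: int = 1000) -> list[str]:
    """Return human-readable warnings about image size; an empty list means no warnings."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    issues = []
    short_side = min(width, height)
    if short_side < min_short_side:
        issues.append(
            f"short side is {short_side}px, below the assumed minimum of {min_short_side}px"
        )
    return issues


def check_image_file(path: str, min_short_side: int = 1000) -> list[str]:
    """Open an image with Pillow and run the size check on its dimensions."""
    # Imported here so the pure size check above works without Pillow installed.
    from PIL import Image

    with Image.open(path) as image:
        return resolution_issues(image.width, image.height, min_short_side)
```

A non-empty result is a hint to rescan or photograph the document at higher resolution. Upscaling a small image afterwards does not restore detail that was never captured.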

4. Validate Numeric Fields

This checkpoint can alter digits, totals, prices, dates, or IDs. For financial, invoice, or receipt workflows, validate extracted numeric fields with downstream checks.
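One simple downstream check is to confirm that the item prices in the OCR output add up to the stated total. The sketch below assumes a deliberately simplified receipt layout: each relevant line ends in a price with two decimals, and the total line starts with the word "Total". Real receipts with tax, subtotal, or discount lines need a richer parser. All names here are hypothetical.

```python
import re
from decimal import Decimal

# A price with exactly two decimals at the end of a line, e.g. "Coffee 3.50".
PRICE_AT_LINE_END = re.compile(r"(\d+\.\d{2})\s*$")


def parse_receipt_lines(ocr_text: str) -> tuple:
    """Split OCR text into (item_prices, stated_total) under the simplified layout."""
    item_prices = []
    stated_total = None
    for line in ocr_text.splitlines():
        match = PRICE_AT_LINE_END.search(line)
        if match is None:
            continue  # Lines without a trailing price are ignored.
        amount = Decimal(match.group(1))
        if line.strip().lower().startswith("total"):
            stated_total = amount
        else:
            item_prices.append(amount)
    return item_prices, stated_total


def total_is_consistent(ocr_text: str) -> bool:
    """Return True only if a total was found and it equals the sum of item prices."""
    item_prices, stated_total = parse_receipt_lines(ocr_text)
    if stated_total is None:
        return False
    return sum(item_prices, Decimal("0")) == stated_total
```

`Decimal` is used instead of `float` so that sums of prices compare exactly. A `False` result does not say which digit is wrong, only that the receipt should go to manual review.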

Model Data

Training Data

OneOCR-20260515 was trained on real OCR-oriented datasets and document-style examples, including receipt and line-level OCR data. The training focus includes:

Printed receipt OCR
Structured receipt text extraction
Handwritten English line transcription
Document-style text extraction
OCR prompts for Markdown-like output

Data Processing

Training examples were formatted as image-to-text OCR tasks. Long targets were capped during training to reduce runaway generations and keep the checkpoint focused on readable transcription.

Usage and Limitations

Intended Usage

OneOCR-20260515 is intended for:

OCR research
Document AI experiments
Receipt OCR experiments
OCR prompt testing
Fine-tuning continuation
Comparing checkpoint progress over time

Limitations

This is a dated checkpoint, not the final OneOCR release.
The model may omit lines, especially on long receipts or dense documents.
The model may alter digits, prices, totals, dates, or IDs.
The model may hallucinate receipt fields or repeat layout patterns.
Handwriting performance is inconsistent.
It should not be used as the only OCR system for financial, legal, medical, identity, or safety-critical documents.
Human review or downstream validation is recommended for important outputs.
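As one example of downstream validation, repeated layout patterns often show up as the same line emitted several times in a row. The sketch below flags such runs for review. The function name and the threshold of three repeats are assumptions for illustration. A genuine receipt can list the same item several times, so a hit is a prompt for human review and not proof of an error.

```python
from itertools import groupby


def find_repeated_runs(ocr_text: str, min_repeats: int = 3) -> list[tuple[str, int]]:
    """Return (line, count) for each run of identical consecutive non-empty lines."""
    lines = [line.strip() for line in ocr_text.splitlines()]
    runs = []
    # groupby groups adjacent equal lines, which is exactly a "run".
    for text, group in groupby(lines):
        count = sum(1 for _ in group)
        if text and count >= min_repeats:
            runs.append((text, count))
    return runs
```

Outputs with at least one flagged run can be routed to manual checking or re-run with a different `repetition_penalty`.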

Ethics and Safety

OCR models can extract sensitive information from documents, receipts, IDs, forms, and private records. Users should apply this model responsibly and follow relevant privacy, security, and data-protection requirements.

Do not use this checkpoint to collect, expose, or process personal data without permission. For production systems, combine OCR with access controls, logging policies, privacy review, and human oversight where appropriate.

Citation

```bibtex
@misc{onescaling2026oneocr20260515,
      title={OneOCR-20260515 -- OCR Checkpoint},
      author={OneScaling},
      year={2026},
      url={https://huggingface.co/OneScaling/OneOCR-20260515},
}
```
