OneOCR-20260515: OCR Checkpoint

Model Description

OneOCR-20260515 is a dated OCR checkpoint designed to extract readable text from document images, receipts, forms, scanned pages, and handwritten lines.

This checkpoint is part of the OneOCR training line from OneScaling, a research lab founded in Bavaria, Germany, focused on practical OCR and document understanding models.

  • Developed by: OneScaling
  • Model name: OneOCR-20260515
  • Model type: Vision-language OCR checkpoint
  • Primary task: Image-to-text OCR and document text extraction
  • License: Apache 2.0

Checkpoint Overview

OneOCR-20260515 is a single OCR model checkpoint trained for document transcription and receipt-style text extraction. It is intended as a dated checkpoint rather than the final OneOCR release.

The checkpoint focuses on practical OCR behavior:

  • Document OCR - Extracts visible text from scanned pages and document images.
  • Receipt OCR - Reads item names, prices, totals, dates, and payment fields from receipt-like layouts.
  • Handwritten Line OCR - Attempts transcription of handwritten English line images.
  • Layout-Preserving Output - Returns text in a clean Markdown-like format when possible.
  • Image-to-Text Processing - Accepts image inputs and produces OCR text directly.
  • Checkpoint Usability - Provides a dated model snapshot for testing, comparison, and continued OCR training.

Core Capabilities

OCR

OneOCR-20260515 can extract text from images containing printed documents, receipts, product lists, totals, and short handwritten lines.

Receipt Understanding

The checkpoint is trained to preserve receipt-style information such as item names, quantities, totals, dates, tax values, and payment lines. It can be useful for experiments in receipt parsing and document AI workflows.
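Receipt output is still plain text, so downstream code has to recover the fields itself. The sketch below shows one way to do that. It assumes each item appears on its own line and ends in a dot-decimal price such as `1.29`. The `parse_receipt_items` helper, the line format, and the label list are hypothetical and are not part of the model's API.

```python
import re
from decimal import Decimal

# Hypothetical line format: "<item name> <price>", e.g. "Milk 1L 1.29".
# The price must be the last token, written with a dot and two decimals.
ITEM_LINE = re.compile(r"^(?P<name>.+?)\s+(?P<price>-?\d+\.\d{2})$")

# Lines whose first word is one of these labels are summary or payment fields.
SUMMARY_LABELS = frozenset({"total", "subtotal", "tax", "vat", "change", "cash", "card"})


def parse_receipt_items(ocr_text: str) -> list[tuple[str, Decimal]]:
    """Return (item name, price) pairs found in OCR'd receipt text."""
    items = []
    for raw_line in ocr_text.splitlines():
        match = ITEM_LINE.match(raw_line.strip())
        if not match:
            continue  # headers, dates, blank lines, and other non-item lines
        name = match.group("name").strip()
        if name.split()[0].lower().rstrip(":") in SUMMARY_LABELS:
            continue  # TOTAL / TAX / payment lines are not purchased items
        items.append((name, Decimal(match.group("price"))))
    return items


receipt_sample = """CORNER SHOP
2026-05-15
Milk 1L 1.29
Bread 2.50
TOTAL 3.79"""
print(parse_receipt_items(receipt_sample))
```

The helper deliberately skips anything it cannot match. It does not guess, so dropped or merged lines still need separate review.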

Markdown-Style Transcription

The model is prompted to return clean text. For tabular or structured documents, it may use line breaks, headings, labels, and simple Markdown-style formatting.
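Some downstream systems expect plain text, and for those the Markdown-style markers can be stripped after decoding. The sketch below assumes the output uses only simple markers: `#` headings, `**bold**`, `-`/`*` bullets, and `|` table pipes. The `markdown_to_plain` helper is hypothetical and does not handle full Markdown.

```python
import re


def markdown_to_plain(text: str) -> str:
    """Strip simple Markdown-style markers from OCR output, preserving line breaks."""
    plain_lines = []
    for line in text.splitlines():
        line = re.sub(r"^\s{0,3}#{1,6}\s+", "", line)  # "## Receipt" -> "Receipt"
        line = re.sub(r"^\s*[-*]\s+", "", line)  # "- Milk" -> "Milk"
        line = re.sub(r"\*\*(.+?)\*\*", r"\1", line)  # "**Total:**" -> "Total:"
        if "-" in line and re.fullmatch(r"\s*\|?[\s:|-]+\|?\s*", line):
            continue  # drop table separator rows such as "|---|---|"
        # Turn table rows like "| Milk | 1.29 |" into "Milk 1.29".
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        plain_lines.append(" ".join(cell for cell in cells if cell))
    return "\n".join(plain_lines)


markdown_sample = "# Receipt\n| Item | Price |\n|---|---|\n| Milk | 1.29 |\n- **Total:** 1.29"
print(markdown_to_plain(markdown_sample))
```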

Model Details

| Property         | OneOCR-20260515                        |
|------------------|----------------------------------------|
| Model Type       | Vision-language OCR checkpoint         |
| Primary Modality | Image + text prompt to text output     |
| Primary Task     | OCR / document transcription           |
| Training Focus   | Receipts, documents, handwritten lines |
| Output Format    | Plain text / Markdown-style OCR        |
| License          | Apache 2.0                             |

Getting Started

Install the required dependencies:

pip install -U transformers torch accelerate pillow peft

Load the model with Transformers and PEFT:

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "OneScaling/OneOCR-20260515"

# trust_remote_code=True runs Python code from the model repository; review it before enabling.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

# Load the input image and convert it to RGB (drops alpha and palette modes).
image = Image.open("document.png").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {
                "type": "text",
                "text": "Extract all readable text from this document image. Return only the OCR result.",
            },
        ],
    }
]

prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Tokenize the prompt, preprocess the image, and move the tensors to the model's device.
inputs = processor(text=[prompt], images=[[image]], return_tensors="pt").to(model.device)

# Greedy (deterministic) decoding with mild repetition controls.
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.12,
        no_repeat_ngram_size=5,
    )

# Keep only the newly generated tokens (drop the prompt tokens), then decode them to text.
generated = outputs[0][inputs["input_ids"].shape[-1]:]
text = processor.decode(generated, skip_special_tokens=True)
print(text.strip())

Best Practices

1. Use Clear OCR Prompts

Use direct prompts such as:

Extract all readable text from this document image. Return only the OCR result.

For receipts:

Extract all readable receipt text. Preserve item names, prices, totals, dates, and payment fields.
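The two prompts above can be kept in one place and reused in the same chat-message structure as the Getting Started example. The prompt strings below are the ones recommended in this card. The `OCR_PROMPTS` keys and the `build_ocr_messages` helper are hypothetical names.

```python
# Prompt strings recommended in this card, keyed by a hypothetical task name.
OCR_PROMPTS = {
    "document": "Extract all readable text from this document image. Return only the OCR result.",
    "receipt": (
        "Extract all readable receipt text. Preserve item names, prices, totals, "
        "dates, and payment fields."
    ),
}


def build_ocr_messages(task: str = "document") -> list[dict]:
    """Build a single-image chat message for processor.apply_chat_template."""
    if task not in OCR_PROMPTS:
        raise ValueError(f"unknown OCR task {task!r}; expected one of {sorted(OCR_PROMPTS)}")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # placeholder; the image itself is passed to the processor
                {"type": "text", "text": OCR_PROMPTS[task]},
            ],
        }
    ]


messages = build_ocr_messages("receipt")
```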

2. Prefer Deterministic Decoding

For OCR, sampling usually hurts accuracy. Recommended settings:

  • do_sample=False
  • temperature: leave unset (it has no effect when do_sample=False)
  • max_new_tokens=512 for short receipts or line images
  • max_new_tokens=1024 for longer documents
  • repetition_penalty=1.08 to 1.12
  • no_repeat_ngram_size=5
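These settings can be bundled into keyword arguments for `model.generate`. The values below restate the recommendations above. The `ocr_generation_kwargs` helper and its `long_document` switch are hypothetical.

```python
def ocr_generation_kwargs(long_document: bool = False, repetition_penalty: float = 1.12) -> dict:
    """Return deterministic decoding settings for model.generate(**inputs, **kwargs).

    repetition_penalty defaults to the top of the recommended 1.08-1.12 range.
    """
    return {
        "do_sample": False,  # greedy decoding; temperature is intentionally omitted
        "max_new_tokens": 1024 if long_document else 512,
        "repetition_penalty": repetition_penalty,
        "no_repeat_ngram_size": 5,
    }
```

With the Getting Started objects in scope, a longer page would then use `model.generate(**inputs, **ocr_generation_kwargs(long_document=True))`.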

3. Use Enough Image Resolution

OCR quality depends heavily on image clarity. Use higher resolution for small text, receipts, and dense documents. Avoid blurry, cropped, low-contrast, or heavily compressed images when possible.
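One way to apply this advice is to upscale small images before they reach the processor. The sketch below assumes a hypothetical minimum of 1024 pixels on the shorter side. This is not a documented requirement of this checkpoint, and the processor may also resize images on its own.

```python
def upscaled_size(width: int, height: int, min_side: int = 1024) -> tuple[int, int]:
    """Return (width, height) scaled up so the shorter side is at least min_side.

    Aspect ratio is preserved. Images that are already large enough are returned
    unchanged, so this never downscales.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    shorter = min(width, height)
    if shorter >= min_side:
        return width, height
    return round(width * min_side / shorter), round(height * min_side / shorter)
```

With Pillow, you can apply the new size with `image = image.resize(upscaled_size(*image.size), Image.Resampling.LANCZOS)` before building `inputs`. Upscaling cannot restore detail that blur or compression has already removed.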

4. Validate Numeric Fields

This checkpoint can alter digits, totals, prices, dates, or IDs. For financial, invoice, or receipt workflows, validate extracted numeric fields with downstream checks.
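A simple downstream check is to confirm that the extracted item prices add up to the extracted total. The sketch below assumes item prices and the total are available as strings with a single decimal separator and no thousands separators. The `total_matches` helper and its one-cent tolerance are hypothetical. Receipts with discounts, deposits, or tax added on top of the item lines need their own rules.

```python
from decimal import Decimal, InvalidOperation


def to_amount(text: str) -> Decimal:
    """Parse an OCR'd amount such as '3.79' or '3,79' into a Decimal."""
    cleaned = text.strip().replace(",", ".")  # assumes "," is a decimal comma, not a thousands separator
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a valid amount: {text!r}") from exc


def total_matches(item_prices: list[str], stated_total: str, tolerance: str = "0.01") -> bool:
    """Return True if the item prices sum to the stated total within the tolerance."""
    computed = sum((to_amount(p) for p in item_prices), Decimal("0"))
    return abs(computed - to_amount(stated_total)) <= Decimal(tolerance)
```

A mismatch does not say which field is wrong, only that the receipt should be reviewed. The model may have altered an item price, the total, or both.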


Model Data

Training Data

OneOCR-20260515 was trained on real OCR-oriented datasets and document-style examples, including receipt and line-level OCR data. The training focus includes:

  • Printed receipt OCR
  • Structured receipt text extraction
  • Handwritten English line transcription
  • Document-style text extraction
  • OCR prompts for Markdown-like output

Data Processing

Training examples were formatted as image-to-text OCR tasks. Long targets were capped during training to reduce runaway generations and keep the checkpoint focused on readable transcription.
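The exact cap is not given. The sketch below illustrates the general idea. It assumes a hypothetical record layout (`image`, `prompt`, `target`) and uses a whitespace-word cap as a stand-in for a tokenizer-based one. `make_ocr_example` and `max_target_words=400` are illustrative only and are not the values used to train this checkpoint.

```python
DOCUMENT_PROMPT = "Extract all readable text from this document image. Return only the OCR result."


def make_ocr_example(image_path: str, target_text: str, max_target_words: int = 400) -> dict:
    """Format one image-to-text OCR example, capping the target transcription length.

    Words are counted by whitespace splitting (a stand-in for tokenizer tokens),
    and line breaks in the kept portion of the target are preserved.
    """
    kept_lines, budget = [], max_target_words
    for line in target_text.splitlines():
        words = line.split()
        if len(words) > budget:
            if budget > 0:
                kept_lines.append(" ".join(words[:budget]))  # partial final line
            break
        kept_lines.append(line)
        budget -= len(words)
    return {"image": image_path, "prompt": DOCUMENT_PROMPT, "target": "\n".join(kept_lines).rstrip()}
```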


Usage and Limitations

Intended Usage

OneOCR-20260515 is intended for:

  • OCR research
  • Document AI experiments
  • Receipt OCR experiments
  • OCR prompt testing
  • Fine-tuning continuation
  • Comparing checkpoint progress over time

Limitations

  • This is a dated checkpoint, not the final OneOCR release.
  • The model may omit lines, especially on long receipts or dense documents.
  • The model may alter digits, prices, totals, dates, or IDs.
  • The model may hallucinate receipt fields or repeat layout patterns.
  • Handwriting performance is inconsistent.
  • It should not be used as the only OCR system for financial, legal, medical, identity, or safety-critical documents.
  • Human review or downstream validation is recommended for important outputs.
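Repeated layout patterns and runaway generations are known failure modes. A cheap post-check can route suspicious outputs to human review. The `needs_review` helper and its thresholds are hypothetical and would need tuning on your own documents. Real receipts can legitimately repeat a line, for example two identical items.

```python
from collections import Counter


def needs_review(ocr_text: str, max_repeats: int = 3, max_duplicate_ratio: float = 0.5) -> bool:
    """Flag OCR output that is empty or looks like a repeated layout pattern."""
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]
    if not lines:
        return True  # empty output always goes to review
    counts = Counter(lines)
    most_common_count = counts.most_common(1)[0][1]
    duplicate_ratio = 1 - len(counts) / len(lines)  # share of lines that are repeats
    return most_common_count > max_repeats or duplicate_ratio > max_duplicate_ratio
```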

Ethics and Safety

OCR models can extract sensitive information from documents, receipts, IDs, forms, and private records. Users should apply this model responsibly and follow relevant privacy, security, and data-protection requirements.

Do not use this checkpoint to collect, expose, or process personal data without permission. For production systems, combine OCR with access controls, logging policies, privacy review, and human oversight where appropriate.
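Logging policies can also be enforced in code. One option is to mask long digit runs, such as payment card numbers, before OCR output is written to logs. The pattern below is a hypothetical heuristic, not a complete personal-data detector. It can miss shorter identifiers and will also mask other long numbers, such as order IDs.

```python
import re

# Heuristic: runs of 12 or more digits, optionally separated by single spaces or
# hyphens (e.g. payment card numbers). Shorter numbers such as dates are left alone.
LONG_NUMBER = re.compile(r"\d(?:[ -]?\d){11,}")


def redact_for_logs(ocr_text: str) -> str:
    """Mask long digit runs in OCR output, keeping only the last four digits."""

    def mask(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "*" * (len(digits) - 4) + digits[-4:]

    return LONG_NUMBER.sub(mask, ocr_text)
```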


Citation

@misc{onescaling2026oneocr20260515,
      title={OneOCR-20260515 -- OCR Checkpoint},
      author={OneScaling},
      year={2026},
      url={https://huggingface.co/OneScaling/OneOCR-20260515},
}