---
title: dots.ocr - Multilingual Document OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Multilingual document layout parsing with OCR
models:
  - rednote-hilab/DotsOCR
tags:
  - ocr
  - document-parsing
  - multilingual
  - layout-detection
  - vision-language
---

πŸ” dots.ocr - Multilingual Document Layout Parsing

This Hugging Face Space provides an easy-to-use interface for the dots.ocr model, a powerful multilingual document parser that unifies layout detection and content recognition.

## Features

- **Multilingual Support**: Robust parsing for many languages, including low-resource languages
- **Layout Detection**: Detects document elements (Text, Title, Table, Formula, Caption, etc.)
- **Reading Order Preservation**: Maintains proper reading order in complex layouts
- **Formula Recognition**: Extracts mathematical formulas in LaTeX format
- **Table Extraction**: Converts tables to HTML
- **Markdown Output**: Formats regular text as Markdown

## Supported Layout Categories

- Caption
- Footnote
- Formula (LaTeX output)
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table (HTML output)
- Text
- Title
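
As one way to consume these categories downstream, the sketch below groups parsed elements by their category label. The element shape used here (`category`, `bbox`, `text` keys) is an assumption for illustration, not the model's documented schema.

```python
from collections import defaultdict

def group_by_category(elements):
    """Group parsed layout elements by their category label."""
    groups = defaultdict(list)
    for el in elements:
        groups[el["category"]].append(el)
    return dict(groups)

# Hypothetical parsed output; the real field names may differ.
elements = [
    {"category": "Title", "bbox": [40, 30, 560, 90], "text": "Report"},
    {"category": "Text", "bbox": [40, 110, 560, 300], "text": "Body..."},
    {"category": "Text", "bbox": [40, 320, 560, 480], "text": "More body..."},
]

for category, items in group_by_category(elements).items():
    print(f"{category}: {len(items)} element(s)")
```

From such groups you can, for example, route `Table` elements to an HTML renderer and `Formula` elements to a LaTeX renderer.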

## ⚠️ Important: Hugging Face Token Required

The dots.ocr model is gated and requires authentication. To use this Space:

1. **Get a Hugging Face Token**: Create a token with read access in your Hugging Face account settings.
2. **Request Access to the Model**: Visit the [rednote-hilab/DotsOCR](https://huggingface.co/rednote-hilab/DotsOCR) model page and request access.
3. **Add the Token to the Space**:
   - Go to your Space → Settings
   - Add a new Secret:
     - Name: `HF_TOKEN`
     - Value: your HF token
   - Rebuild the Space

📖 Full guide: See `HF_TOKEN_SETUP.md` for detailed instructions.
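
Inside the Space's code, a Secret is exposed as an environment variable. A minimal sketch of reading it (the helper name is hypothetical; the actual `app.py` may be organized differently):

```python
import os

def get_hf_token():
    """Read the HF_TOKEN secret configured in the Space settings."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a Secret in the Space settings "
            "and rebuild the Space."
        )
    return token
```

The returned token can then be passed to `huggingface_hub.login()` or to whatever call downloads the gated weights.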

## Usage

1. **Upload an Image**: Upload a document image (photo, scan, or screenshot)
2. **Select Prompt Type**: Choose a predefined prompt or write a custom one
   - **Full Layout + OCR**: Complete analysis with bounding boxes, categories, and text
   - **OCR Only**: Simple text extraction
   - **Layout Detection Only**: Just bounding boxes and categories
   - **Custom**: Write your own prompt
3. **Process**: Click the "Process Document" button
4. **Results**: Get structured JSON output with all detected elements

## API Usage

You can also call this Space programmatically:

```python
from gradio_client import Client

client = Client("YOUR_SPACE_URL")
result = client.predict(
    image="path/to/your/image.jpg",
    prompt_type="Full Layout + OCR (English)",
    custom_prompt="",
    api_name="/predict",
)
print(result)
```
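
The README does not specify the exact shape of `result`; assuming the Space returns its structured output as a JSON string (an assumption, not confirmed here), it can be decoded with the standard library:

```python
import json

# Hypothetical payload; the real output schema may differ.
sample_result = '[{"category": "Title", "bbox": [40, 32, 560, 88], "text": "Annual Report"}]'

elements = json.loads(sample_result)
for el in elements:
    print(el["category"], el["bbox"], el["text"])
```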

## Model Information

- **Model**: [rednote-hilab/DotsOCR](https://huggingface.co/rednote-hilab/DotsOCR)
- **Parameters**: 1.7B (LLM foundation)
- **Performance**: State-of-the-art on OmniDocBench for text, tables, and reading order
- **Base Model**: Qwen2.5-VL architecture

## Limitations

- Best performance on images with a resolution under 11,289,600 pixels
- May struggle with extremely high character-to-pixel ratios (very dense text)
- Complex tables and formulas may not be parsed perfectly
- Long runs of special characters (ellipses, underscores) may cause issues
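
Given the pixel-count limit above, oversized images can be downscaled before upload. A small sketch (the limit value comes from this README; the helper itself is not part of the Space):

```python
MAX_PIXELS = 11_289_600  # resolution limit noted above

def fit_within_limit(width, height, max_pixels=MAX_PIXELS):
    """Return (width, height) scaled so that width * height <= max_pixels,
    preserving aspect ratio. Sizes already within the limit pass through."""
    if width * height <= max_pixels:
        return width, height
    scale = (max_pixels / (width * height)) ** 0.5
    return int(width * scale), int(height * scale)

print(fit_within_limit(3000, 2000))  # under the limit: unchanged
print(fit_within_limit(8000, 6000))  # over the limit: scaled down
```

The resulting size can be applied with any image library (e.g. Pillow's `Image.resize`) before sending the file to the Space.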

## Citation

If you use this model, please cite:

```bibtex
@misc{dots.ocr,
  title={dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model},
  author={Rednote HiLab Team},
  year={2025},
  url={https://github.com/rednote-hilab/dots.ocr}
}
```

## Links

- [dots.ocr on GitHub](https://github.com/rednote-hilab/dots.ocr)
- [DotsOCR model on Hugging Face](https://huggingface.co/rednote-hilab/DotsOCR)

## License

MIT License - see the [dots.ocr repository](https://github.com/rednote-hilab/dots.ocr) for details.

## Acknowledgments

Built with [Gradio](https://www.gradio.app/) and the [dots.ocr](https://github.com/rednote-hilab/dots.ocr) model from the Rednote HiLab team.