---
title: dots.ocr - Multilingual Document OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Multilingual document layout parsing with OCR
models:
  - rednote-hilab/DotsOCR
tags:
  - ocr
  - document-parsing
  - multilingual
  - layout-detection
  - vision-language
---

πŸ” dots.ocr - Multilingual Document Layout Parsing

This Hugging Face Space provides an easy-to-use interface for the dots.ocr model, a powerful multilingual document parser that unifies layout detection and content recognition.

## Features

- **Multilingual Support**: Robust parsing for many languages, including low-resource languages
- **Layout Detection**: Detects document elements (Text, Title, Table, Formula, Caption, etc.)
- **Reading Order Preservation**: Maintains proper reading order in complex layouts
- **Formula Recognition**: Extracts mathematical formulas in LaTeX format
- **Table Extraction**: Converts tables to HTML
- **Markdown Output**: Formats regular text as Markdown

## Supported Layout Categories

- Caption
- Footnote
- Formula (LaTeX output)
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table (HTML output)
- Text
- Title
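
As one way to consume these categories downstream, the sketch below groups parsed elements by their category label. The element shape used here (`category`, `bbox`, `text` keys) is an assumption for illustration, not the model's documented schema.

```python
from collections import defaultdict

def group_by_category(elements):
    """Group parsed layout elements by their category label."""
    groups = defaultdict(list)
    for el in elements:
        groups[el["category"]].append(el)
    return dict(groups)

# Hypothetical parsed output; the real field names may differ.
elements = [
    {"category": "Title", "bbox": [40, 30, 560, 90], "text": "Report"},
    {"category": "Text", "bbox": [40, 110, 560, 300], "text": "Body..."},
    {"category": "Text", "bbox": [40, 320, 560, 480], "text": "More body..."},
]

for category, items in group_by_category(elements).items():
    print(f"{category}: {len(items)} element(s)")
```

From such groups you can, for example, route `Table` elements to an HTML renderer and `Formula` elements to a LaTeX renderer.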

## ⚠️ Important: Hugging Face Token Required

The dots.ocr model is gated and requires authentication. To use this Space:

1. **Get a Hugging Face Token**: Create a token with read access in your Hugging Face account settings.
2. **Request Access to the Model**: Visit the [rednote-hilab/DotsOCR](https://huggingface.co/rednote-hilab/DotsOCR) model page and request access.
3. **Add the Token to the Space**:
   - Go to your Space → Settings
   - Add a new Secret:
     - Name: `HF_TOKEN`
     - Value: your HF token
   - Rebuild the Space

📖 Full guide: See `HF_TOKEN_SETUP.md` for detailed instructions.
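
Inside the Space's code, a Secret is exposed as an environment variable. A minimal sketch of reading it (the helper name is hypothetical; the actual `app.py` may be organized differently):

```python
import os

def get_hf_token():
    """Read the HF_TOKEN secret configured in the Space settings."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a Secret in the Space settings "
            "and rebuild the Space."
        )
    return token
```

The returned token can then be passed to `huggingface_hub.login()` or to whatever call downloads the gated weights.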

## Usage

1. **Upload an Image**: Upload a document image (photo, scan, or screenshot)
2. **Select Prompt Type**: Choose a predefined prompt or write a custom one
   - **Full Layout + OCR**: Complete analysis with bounding boxes, categories, and text
   - **OCR Only**: Simple text extraction
   - **Layout Detection Only**: Just bounding boxes and categories
   - **Custom**: Write your own prompt
3. **Process**: Click the "Process Document" button
4. **Results**: Get structured JSON output with all detected elements

## API Usage

You can also call this Space programmatically:

```python
from gradio_client import Client

client = Client("YOUR_SPACE_URL")
result = client.predict(
    image="path/to/your/image.jpg",
    prompt_type="Full Layout + OCR (English)",
    custom_prompt="",
    api_name="/predict",
)
print(result)
```
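
The README does not specify the exact shape of `result`; assuming the Space returns its structured output as a JSON string (an assumption, not confirmed here), it can be decoded with the standard library:

```python
import json

# Hypothetical payload; the real output schema may differ.
sample_result = '[{"category": "Title", "bbox": [40, 32, 560, 88], "text": "Annual Report"}]'

elements = json.loads(sample_result)
for el in elements:
    print(el["category"], el["bbox"], el["text"])
```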

## Model Information

- **Model**: [rednote-hilab/DotsOCR](https://huggingface.co/rednote-hilab/DotsOCR)
- **Parameters**: 1.7B (LLM foundation)
- **Performance**: State-of-the-art on OmniDocBench for text, tables, and reading order
- **Base Model**: Qwen2.5-VL architecture

## Limitations

- Best performance on images with a resolution under 11,289,600 pixels
- May struggle with extremely high character-to-pixel ratios (very dense text)
- Complex tables and formulas may not be parsed perfectly
- Long runs of special characters (ellipses, underscores) may cause issues
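
Given the pixel-count limit above, oversized images can be downscaled before upload. A small sketch (the limit value comes from this README; the helper itself is not part of the Space):

```python
MAX_PIXELS = 11_289_600  # resolution limit noted above

def fit_within_limit(width, height, max_pixels=MAX_PIXELS):
    """Return (width, height) scaled so that width * height <= max_pixels,
    preserving aspect ratio. Sizes already within the limit pass through."""
    if width * height <= max_pixels:
        return width, height
    scale = (max_pixels / (width * height)) ** 0.5
    return int(width * scale), int(height * scale)

print(fit_within_limit(3000, 2000))  # under the limit: unchanged
print(fit_within_limit(8000, 6000))  # over the limit: scaled down
```

The resulting size can be applied with any image library (e.g. Pillow's `Image.resize`) before sending the file to the Space.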

## Citation

If you use this model, please cite:

```bibtex
@misc{dots.ocr,
  title={dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model},
  author={Rednote HiLab Team},
  year={2025},
  url={https://github.com/rednote-hilab/dots.ocr}
}
```

## Links

- [dots.ocr on GitHub](https://github.com/rednote-hilab/dots.ocr)
- [DotsOCR model on Hugging Face](https://huggingface.co/rednote-hilab/DotsOCR)

## License

MIT License - see the [dots.ocr repository](https://github.com/rednote-hilab/dots.ocr) for details.

## Acknowledgments

Built with [Gradio](https://www.gradio.app/) and the [dots.ocr](https://github.com/rednote-hilab/dots.ocr) model from the Rednote HiLab team.