metadata
title: dots.ocr - Multilingual Document OCR
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Multilingual document layout parsing with OCR
models:
- rednote-hilab/DotsOCR
tags:
- ocr
- document-parsing
- multilingual
- layout-detection
- vision-language
dots.ocr - Multilingual Document Layout Parsing
This Hugging Face Space provides an easy-to-use interface for the dots.ocr model, a powerful multilingual document parser that unifies layout detection and content recognition.
Features
- Multilingual Support: Robust parsing capabilities for multiple languages including low-resource languages
- Layout Detection: Detects various document elements (Text, Title, Table, Formula, Caption, etc.)
- Reading Order Preservation: Maintains proper reading order in complex layouts
- Formula Recognition: Extracts mathematical formulas in LaTeX format
- Table Extraction: Converts tables to HTML format
- Markdown Output: Formats regular text as Markdown
Supported Layout Categories
- Caption
- Footnote
- Formula (LaTeX output)
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table (HTML output)
- Text
- Title
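As a rough illustration of how these categories relate to the output formats described above, the sketch below maps each category to a format name. `LAYOUT_CATEGORIES` and `output_format` are hypothetical helpers, not part of the model or this Space, and treating every remaining category as Markdown is an assumption.

```python
# Categories listed in this README, in the same order.
LAYOUT_CATEGORIES = (
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
)


def output_format(category: str) -> str:
    """Return the text format expected for a layout category."""
    if category not in LAYOUT_CATEGORIES:
        raise ValueError(f"Unknown layout category: {category!r}")
    if category == "Formula":
        return "latex"  # formulas are extracted as LaTeX
    if category == "Table":
        return "html"  # tables are converted to HTML
    # Assumption: all remaining categories are handled as Markdown.
    return "markdown"
```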
⚠️ Important: Hugging Face Token Required
The dots.ocr model is gated and requires authentication. To use this Space:
Get a Hugging Face Token:
- Go to https://huggingface.co/settings/tokens
- Create a new token with Read access
Request Access to the Model:
- Visit https://huggingface.co/rednote-hilab/DotsOCR
- Click "Request Access" (if gated)
- Wait for approval
Add Token to Space:
- Go to your Space → Settings
- Add a new Secret:
- Name: HF_TOKEN
- Value: Your HF token
- Rebuild the Space
Full guide: See HF_TOKEN_SETUP.md for detailed instructions
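Space secrets are exposed to the running app as environment variables, so the token can be read at startup. The sketch below shows one way to do that and to fail early with a readable message; `read_hf_token` is a hypothetical helper, and how `app.py` actually reads the token is not shown in this README.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HF_TOKEN secret, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a Space secret and rebuild the Space."
        )
    return token
```

The returned value could then be handed to whatever performs authentication, for example `huggingface_hub.login(token=...)`.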
Usage
- Upload an Image: Upload a document image (photo, scan, or screenshot)
- Select Prompt Type: Choose from predefined prompts or use a custom one
- Full Layout + OCR: Complete analysis with bounding boxes, categories, and text
- OCR Only: Simple text extraction
- Layout Detection Only: Just bounding boxes and categories
- Custom: Write your own prompt
- Process: Click the "Process Document" button
- Results: Get structured JSON output with all detected elements
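Once the JSON result is available, it can be post-processed like any other list of records. The sketch below assumes each detected element is an object with `bbox`, `category`, and `text` keys, returned in reading order; check the actual output before relying on this shape. `layout_to_text` is a hypothetical helper and the sample data is invented for illustration.

```python
import json

# Categories left out of the joined text (a choice made for this example).
SKIPPED = {"Page-header", "Page-footer"}


def layout_to_text(result_json: str) -> str:
    """Join the text of detected elements in the order they were returned."""
    elements = json.loads(result_json)
    parts = []
    for element in elements:
        if element.get("category") in SKIPPED:
            continue
        text = (element.get("text") or "").strip()
        if text:  # elements without text (e.g. pictures) are skipped
            parts.append(text)
    return "\n\n".join(parts)


# Invented sample in the assumed output shape.
sample = json.dumps([
    {"bbox": [40, 30, 560, 70], "category": "Title", "text": "# Quarterly Report"},
    {"bbox": [40, 90, 560, 200], "category": "Text", "text": "Revenue grew in Q3."},
    {"bbox": [40, 780, 560, 800], "category": "Page-footer", "text": "Page 1"},
])
print(layout_to_text(sample))
```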
API Usage
You can also use this space via API:
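from gradio_client import Client, handle_file

# Replace with your Space ID ("user/space-name") or its full URL.
client = Client("YOUR_SPACE_URL")
result = client.predict(
    image=handle_file("path/to/your/image.jpg"),  # wrap local file paths for upload
    prompt_type="Full Layout + OCR (English)",
    custom_prompt="",
    api_name="/predict",
)
print(result)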
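To run several images through the API, one option is to wrap the `predict` call in a function and loop over file paths. In the sketch below, `process_images` and `predict_one` are hypothetical names, and the set of accepted file suffixes is an assumption.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Assumption: only these suffixes are treated as document images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def process_images(
    paths: Iterable[str],
    predict_one: Callable[[str], object],
) -> Dict[str, object]:
    """Call predict_one on each image path and collect results by path."""
    results: Dict[str, object] = {}
    for path in paths:
        if Path(path).suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore non-image files
        try:
            results[path] = predict_one(path)
        except Exception as exc:
            # Keep going; store the error so the caller can inspect it.
            results[path] = exc
    return results
```

Here `predict_one` would be a small function that calls `client.predict(...)` for a single path with the same arguments as the example above.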
Model Information
- Model: rednote-hilab/DotsOCR
- Parameters: 1.7B (LLM foundation)
- Performance: State-of-the-art (SOTA) on OmniDocBench for text, tables, and reading order
- Base Model: Qwen2.5-VL architecture
Limitations
- Best performance on images with fewer than 11,289,600 total pixels (width × height)
- May struggle with extremely high character-to-pixel ratios
- Output for complex tables and formulas may contain errors
- Continuous special characters (ellipses, underscores) may cause issues
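For images above the pixel limit, one option is to downscale before uploading. The sketch below computes a target size that keeps the aspect ratio and stays within the budget; `fit_within_pixel_budget` is a hypothetical helper, and treating the limit as inclusive is an assumption.

```python
import math
from typing import Tuple

MAX_PIXELS = 11_289_600  # limit quoted in this README (equal to 3360 x 3360)


def fit_within_pixel_budget(
    width: int, height: int, max_pixels: int = MAX_PIXELS
) -> Tuple[int, int]:
    """Return a (width, height) with the same aspect ratio and at most max_pixels pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height  # already within budget
    # Scale both sides by the same factor so the area shrinks to the budget.
    scale = math.sqrt(max_pixels / (width * height))
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height
```

The resulting size can be passed to an image library's resize function, such as Pillow's `Image.resize`.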
Citation
If you use this model, please cite:
@misc{dots.ocr,
  title={dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model},
  author={Rednote HiLab Team},
  year={2025},
  url={https://github.com/rednote-hilab/dots.ocr}
}
Links
- GitHub Repository: https://github.com/rednote-hilab/dots.ocr
- Model on Hugging Face: https://huggingface.co/rednote-hilab/DotsOCR
- Blog Post
License
MIT License - See the dots.ocr repository for details.
Acknowledgments
Built with: