---
title: dots.ocr - Multilingual Document OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Multilingual document layout parsing with OCR
models:
  - rednote-hilab/DotsOCR
tags:
  - ocr
  - document-parsing
  - multilingual
  - layout-detection
  - vision-language
---

# 🔍 dots.ocr - Multilingual Document Layout Parsing

This Hugging Face Space provides an easy-to-use interface for the [dots.ocr](https://github.com/rednote-hilab/dots.ocr) model, a powerful multilingual document parser that unifies layout detection and content recognition.

## Features

- **Multilingual Support**: Robust parsing capabilities for multiple languages, including low-resource languages
- **Layout Detection**: Detects various document elements (Text, Title, Table, Formula, Caption, etc.)
- **Reading Order Preservation**: Maintains proper reading order in complex layouts
- **Formula Recognition**: Extracts mathematical formulas in LaTeX format
- **Table Extraction**: Converts tables to HTML format
- **Markdown Output**: Formats regular text as Markdown

## Supported Layout Categories

- Caption
- Footnote
- Formula (LaTeX output)
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table (HTML output)
- Text
- Title

## ⚠️ Important: Hugging Face Token Required

The `dots.ocr` model is **gated** and requires authentication. To use this Space:

1. **Get a Hugging Face Token**:
   - Go to https://huggingface.co/settings/tokens
   - Create a new token with **Read** access
2. **Request Access to the Model**:
   - Visit https://huggingface.co/rednote-hilab/DotsOCR
   - Click "Request Access" (if gated)
   - Wait for approval
3. **Add Token to Space**:
   - Go to your Space → Settings
   - Add a new Secret:
     - Name: `HF_TOKEN`
     - Value: your HF token
   - Rebuild the Space

📖 **Full guide**: See `HF_TOKEN_SETUP.md` for detailed instructions.

## Usage

1. **Upload an Image**: Upload a document image (photo, scan, or screenshot)
2.
**Select Prompt Type**: Choose from predefined prompts or use a custom one:
   - **Full Layout + OCR**: Complete analysis with bounding boxes, categories, and text
   - **OCR Only**: Plain text extraction
   - **Layout Detection Only**: Bounding boxes and categories only
   - **Custom**: Write your own prompt
3. **Process**: Click the "Process Document" button
4. **Results**: Get structured JSON output with all detected elements

## API Usage

You can also use this Space via its API:

```python
from gradio_client import Client

client = Client("YOUR_SPACE_URL")
result = client.predict(
    image="path/to/your/image.jpg",
    prompt_type="Full Layout + OCR (English)",
    custom_prompt="",
    api_name="/predict",
)
print(result)
```

## Model Information

- **Model**: [rednote-hilab/DotsOCR](https://huggingface.co/rednote-hilab/DotsOCR)
- **Parameters**: 1.7B (LLM foundation)
- **Performance**: SOTA on OmniDocBench for text, tables, and reading order
- **Base Model**: Qwen2.5-VL architecture

## Limitations

- Best performance on images with a resolution under 11,289,600 pixels
- May struggle with extremely high character-to-pixel ratios
- Complex tables and formulas may not be parsed perfectly
- Runs of repeated special characters (ellipses, underscores) may cause issues

## Citation

If you use this model, please cite:

```bibtex
@misc{dots.ocr,
  title={dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model},
  author={Rednote HiLab Team},
  year={2025},
  url={https://github.com/rednote-hilab/dots.ocr}
}
```

## Links

- 📦 [GitHub Repository](https://github.com/rednote-hilab/dots.ocr)
- 🤗 [Model on Hugging Face](https://huggingface.co/rednote-hilab/DotsOCR)
- 📝 [Blog Post](https://www.xiaohongshu.com/blog)

## License

MIT License - see the [dots.ocr repository](https://github.com/rednote-hilab/dots.ocr) for details.

## Acknowledgments

Built with:

- [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL)
- [Gradio](https://gradio.app)
- [Hugging Face Transformers](https://huggingface.co/transformers)
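## Tip: Staying Under the Pixel Limit

Since the model performs best on images under 11,289,600 pixels (see Limitations), it can help to downscale large scans before uploading. The sketch below is a minimal, dependency-free helper for computing a proportional target size under the cap; the function name and the example dimensions are illustrative, not part of this Space's API.

```python
import math

MAX_PIXELS = 11_289_600  # documented upper bound for dots.ocr input images


def fit_under_pixel_cap(width: int, height: int, max_pixels: int = MAX_PIXELS) -> tuple:
    """Return (width, height) scaled proportionally so width * height <= max_pixels."""
    pixels = width * height
    if pixels <= max_pixels:
        return width, height  # already small enough; keep original size
    scale = math.sqrt(max_pixels / pixels)  # uniform scale factor for both axes
    return max(1, int(width * scale)), max(1, int(height * scale))


# A 4000x3000 scan (12 MP) exceeds the cap and gets downscaled;
# a 1920x1080 screenshot passes through unchanged.
print(fit_under_pixel_cap(4000, 3000))  # (3879, 2909)
print(fit_under_pixel_cap(1920, 1080))  # (1920, 1080)
```

The resulting size can be passed to any image library's resize routine (e.g. Pillow's `Image.resize`) before sending the file to the Space.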