---
title: dots.ocr - Multilingual Document OCR
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Multilingual document layout parsing with OCR
models:
- rednote-hilab/DotsOCR
tags:
- ocr
- document-parsing
- multilingual
- layout-detection
- vision-language
---

# dots.ocr - Multilingual Document Layout Parsing

This Hugging Face Space provides an easy-to-use interface for the [dots.ocr](https://github.com/rednote-hilab/dots.ocr) model, a multilingual document parser that unifies layout detection and content recognition in a single vision-language model.

## Features

- **Multilingual Support**: Robust parsing across many languages, including low-resource ones
- **Layout Detection**: Detects document elements such as Text, Title, Table, Formula, and Caption
- **Reading Order Preservation**: Maintains the correct reading order in complex layouts
- **Formula Recognition**: Extracts mathematical formulas in LaTeX format
- **Table Extraction**: Converts tables to HTML
- **Markdown Output**: Formats regular text as Markdown

## Supported Layout Categories

- Caption
- Footnote
- Formula (LaTeX output)
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table (HTML output)
- Text
- Title
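
The detected elements come back as JSON. The sketch below assumes each element is an object with a `bbox` (`[x1, y1, x2, y2]` in pixels), a `category`, and an optional `text` field. This structure is modeled on the upstream dots.ocr output format and should be checked against this Space's actual output. `LayoutElement` and `parse_layout` are hypothetical helpers and are not part of the Space.

```python
import json
from dataclasses import dataclass

# Category names as listed in this README.
LAYOUT_CATEGORIES = {
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
}


@dataclass
class LayoutElement:
    bbox: tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) in pixels
    category: str
    text: str = ""  # assumed to be absent for Picture elements


def parse_layout(raw_json: str) -> list[LayoutElement]:
    """Parse a JSON array of layout elements, rejecting unknown categories."""
    elements = []
    for item in json.loads(raw_json):
        category = item["category"]
        if category not in LAYOUT_CATEGORIES:
            raise ValueError(f"unknown layout category: {category!r}")
        x1, y1, x2, y2 = item["bbox"]
        elements.append(LayoutElement((x1, y1, x2, y2), category, item.get("text", "")))
    return elements
```

Rejecting unknown categories early makes any drift between the prompt and the model's output visible. It avoids silently dropping elements further down the pipeline.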

## ⚠️ Important: Hugging Face Token Required

The `dots.ocr` model is **gated** and requires authentication. To use this Space:

1. **Get a Hugging Face Token**:
   - Go to https://huggingface.co/settings/tokens
   - Create a new token with **Read** access

2. **Request Access to the Model**:
   - Visit https://huggingface.co/rednote-hilab/DotsOCR
   - Click "Request Access" (if gated)
   - Wait for approval

3. **Add Token to Space**:
   - Go to your Space → Settings
   - Add a new secret:
     - Name: `HF_TOKEN`
     - Value: your HF token
   - Rebuild the Space

**Full guide**: See `HF_TOKEN_SETUP.md` for detailed instructions.
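
Space secrets are exposed to the running app as environment variables, so the app can read `HF_TOKEN` at startup. The following is a minimal fail-fast check. `get_hf_token` is a hypothetical helper and is not necessarily how this Space's `app.py` is written.

```python
import os


def get_hf_token() -> str:
    """Return the token from the HF_TOKEN secret, failing loudly if it is missing."""
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a secret in the Space settings and rebuild the Space."
        )
    return token
```

You can pass the returned value explicitly, for example `snapshot_download("rednote-hilab/DotsOCR", token=get_hf_token())` from `huggingface_hub`. Because `huggingface_hub` also reads `HF_TOKEN` from the environment on its own, the main benefit of the check is a clear error message when the secret is missing.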

## Usage

1. **Upload an Image**: Upload a document image (photo, scan, or screenshot)
2. **Select Prompt Type**: Choose a predefined prompt or write a custom one
   - **Full Layout + OCR**: Complete analysis with bounding boxes, categories, and text
   - **OCR Only**: Plain text extraction
   - **Layout Detection Only**: Bounding boxes and categories only
   - **Custom**: Your own prompt
3. **Process**: Click the "Process Document" button
4. **Results**: Receive structured JSON output listing every detected element
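
The JSON output can be flattened into a single Markdown document by concatenating the `text` fields. The sketch below assumes each element has `category` and `text` keys, as in the upstream dots.ocr output, and that the list is already in reading order. `layout_to_markdown` is a hypothetical helper, not part of this Space, and the set of skipped categories is a judgment call.

```python
import json

# Categories treated as page furniture and left out of the body text (a choice, not a rule).
SKIP_CATEGORIES = {"Page-header", "Page-footer", "Picture"}


def layout_to_markdown(raw_json: str) -> str:
    """Concatenate element text in list order, assumed to be reading order."""
    parts = []
    for item in json.loads(raw_json):
        category = item.get("category")
        text = item.get("text", "").strip()
        if category in SKIP_CATEGORIES or not text:
            continue
        if category == "Formula" and not text.startswith("$$"):
            # Formulas come back as LaTeX; wrap them as display math.
            text = f"$$\n{text}\n$$"
        # Tables stay as HTML; most Markdown renderers pass HTML blocks through.
        parts.append(text)
    return "\n\n".join(parts)
```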
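
## API Usage

You can also call this Space programmatically with `gradio_client`:

```python
from gradio_client import Client, handle_file

# Use your Space ID ("username/space-name") or its full URL
client = Client("YOUR_SPACE_URL")
result = client.predict(
    image=handle_file("path/to/your/image.jpg"),  # gradio_client >= 1.0 expects file inputs wrapped with handle_file
    prompt_type="Full Layout + OCR (English)",
    custom_prompt="",
    api_name="/predict",  # confirm the endpoint name on the Space's "Use via API" page
)
print(result)
```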

## Model Information

- **Model**: [rednote-hilab/DotsOCR](https://huggingface.co/rednote-hilab/DotsOCR)
- **Parameters**: 1.7B (LLM foundation)
- **Performance**: State-of-the-art results on OmniDocBench for text, tables, and reading order, as reported by the authors
- **Base Model**: Qwen2.5-VL architecture

## Limitations

- Works best on images with at most 11,289,600 pixels in total (for example, 3360 × 3360)
- May struggle with images that have an extremely high character-to-pixel ratio, such as very small, dense text
- Complex tables and formulas may not be parsed perfectly
- Long runs of special characters (such as ellipses or underscores) may degrade the output
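
To stay under the pixel limit, you can downscale large scans before uploading them. This minimal sketch assumes dimensions should also be snapped to multiples of 28, which is the patch grid used by Qwen2.5-VL's image processor. `fit_to_pixel_budget` is a hypothetical helper.

```python
import math

MAX_PIXELS = 11_289_600  # limit quoted in this README (3360 x 3360)
FACTOR = 28  # assumption: sides snapped to multiples of 28, as in Qwen2.5-VL preprocessing


def fit_to_pixel_budget(width: int, height: int,
                        max_pixels: int = MAX_PIXELS,
                        factor: int = FACTOR) -> tuple[int, int]:
    """Scale (width, height) down, keeping aspect ratio, so width * height <= max_pixels.

    Both sides are rounded down to a multiple of `factor` (never below one factor).
    Images already under the budget are only snapped to the grid.
    """
    scale = min(1.0, math.sqrt(max_pixels / (width * height)))
    new_w = max(factor, int(width * scale) // factor * factor)
    new_h = max(factor, int(height * scale) // factor * factor)
    return new_w, new_h
```

With Pillow, `img.resize(fit_to_pixel_budget(*img.size))` applies the result. Rounding down rather than to the nearest multiple guarantees the product stays under the limit, at the cost of a few pixels per side.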

## Citation

If you use this model, please cite:

```bibtex
@misc{dots.ocr,
  title={dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model},
  author={Rednote HiLab Team},
  year={2025},
  url={https://github.com/rednote-hilab/dots.ocr}
}
```

## Links

- 📦 [GitHub Repository](https://github.com/rednote-hilab/dots.ocr)
- 🤗 [Model on Hugging Face](https://huggingface.co/rednote-hilab/DotsOCR)
- [Blog Post](https://www.xiaohongshu.com/blog)

## License

MIT License. See the [dots.ocr repository](https://github.com/rednote-hilab/dots.ocr) for details.

## Acknowledgments

Built with:
- [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL)
- [Gradio](https://gradio.app)
- [Hugging Face Transformers](https://huggingface.co/transformers)