---
title: dots.ocr - Multilingual Document OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Multilingual document layout parsing with OCR
models:
- rednote-hilab/DotsOCR
tags:
- ocr
- document-parsing
- multilingual
- layout-detection
- vision-language
---
# 🔍 dots.ocr - Multilingual Document Layout Parsing
This Hugging Face Space provides an easy-to-use interface for the [dots.ocr](https://github.com/rednote-hilab/dots.ocr) model, a powerful multilingual document parser that unifies layout detection and content recognition.
## Features
- **Multilingual Support**: Robust parsing capabilities for multiple languages including low-resource languages
- **Layout Detection**: Detects various document elements (Text, Title, Table, Formula, Caption, etc.)
- **Reading Order Preservation**: Maintains proper reading order in complex layouts
- **Formula Recognition**: Extracts mathematical formulas in LaTeX format
- **Table Extraction**: Converts tables to HTML format
- **Markdown Output**: Formats regular text as Markdown
## Supported Layout Categories
- Caption
- Footnote
- Formula (LaTeX output)
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table (HTML output)
- Text
- Title
## ⚠️ Important: Hugging Face Token Required
The `dots.ocr` model is **gated** and requires authentication. To use this Space:
1. **Get a Hugging Face Token**:
   - Go to https://huggingface.co/settings/tokens
   - Create a new token with **Read** access
2. **Request Access to the Model**:
   - Visit https://huggingface.co/rednote-hilab/DotsOCR
   - Click "Request Access" if the model page shows an access gate
   - Wait for approval
3. **Add the Token to the Space**:
   - Go to your Space → Settings
   - Add a new Secret:
     - Name: `HF_TOKEN`
     - Value: your HF token
   - Rebuild the Space
📖 **Full guide**: See `HF_TOKEN_SETUP.md` for detailed instructions
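As a rough sketch of how the `HF_TOKEN` secret gets consumed, a Space typically reads it from the environment at startup. The helper below is hypothetical; the actual `app.py` in this Space may handle the token differently:

```python
import os

def get_hf_token(env=os.environ):
    """Return the HF token from the environment, or None if the secret is missing.

    Hypothetical helper: the real app.py may read the secret differently.
    """
    return env.get("HF_TOKEN")

# A gated download would then pass the token explicitly, e.g.:
# AutoModel.from_pretrained("rednote-hilab/DotsOCR", token=get_hf_token())
```

If `get_hf_token()` returns `None`, gated model downloads will fail with a 401 error, which is the usual symptom of a missing Space secret.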
## Usage
1. **Upload an Image**: Upload a document image (photo, scan, or screenshot)
2. **Select Prompt Type**: Choose a predefined prompt or write a custom one
   - **Full Layout + OCR**: Complete analysis with bounding boxes, categories, and text
   - **OCR Only**: Plain text extraction
   - **Layout Detection Only**: Bounding boxes and categories only
   - **Custom**: Write your own prompt
3. **Process**: Click the "Process Document" button
4. **Results**: Get structured JSON output with all detected elements
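The exact schema of the JSON output is defined by the model. As an illustration only (the field names `bbox`, `category`, and `text` here are assumptions, not guaranteed by this Space), iterating over detected elements might look like:

```python
import json

# Hypothetical output shape; the real field names may differ.
sample = json.dumps([
    {"bbox": [72, 40, 520, 90], "category": "Title", "text": "Quarterly Report"},
    {"bbox": [72, 120, 520, 300], "category": "Text", "text": "Revenue grew..."},
])

for element in json.loads(sample):
    print(f"{element['category']:>10}  {element['bbox']}  {element.get('text', '')[:40]}")
```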
## API Usage
You can also call this Space programmatically via the Gradio client:
```python
from gradio_client import Client

client = Client("YOUR_SPACE_URL")
result = client.predict(
    image="path/to/your/image.jpg",
    prompt_type="Full Layout + OCR (English)",
    custom_prompt="",
    api_name="/predict",
)
print(result)
```
## Model Information
- **Model**: [rednote-hilab/DotsOCR](https://huggingface.co/rednote-hilab/DotsOCR)
- **Parameters**: 1.7B (LLM foundation)
- **Performance**: SOTA on OmniDocBench for text, tables, and reading order
- **Base Model**: Qwen2.5-VL architecture
## Limitations
- Best performance on images with fewer than 11,289,600 total pixels; larger inputs should be downscaled first
- May struggle with extremely high character-to-pixel ratios (very dense text in small images)
- Complex tables and formulas may not be reproduced perfectly
- Long runs of repeated special characters (ellipses, underscores) may cause issues
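Given the pixel limit above, oversized images can be downscaled before upload. A minimal sketch that preserves aspect ratio (the limit value is taken from the first bullet; the helper name is ours):

```python
import math

MAX_PIXELS = 11_289_600  # limit noted in the first bullet above

def fit_within_limit(width, height, max_pixels=MAX_PIXELS):
    """Scale (width, height) down so the area stays within max_pixels, keeping aspect ratio."""
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, int(width * scale)), max(1, int(height * scale))

# e.g. a 4000x3000 scan (12 MP) is reduced slightly; a 1920x1080 screenshot passes through unchanged.
```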
## Citation
If you use this model, please cite:
```bibtex
@misc{dots.ocr,
  title={dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model},
  author={Rednote HiLab Team},
  year={2025},
  url={https://github.com/rednote-hilab/dots.ocr}
}
```
## Links
- 📦 [GitHub Repository](https://github.com/rednote-hilab/dots.ocr)
- 🤗 [Model on Hugging Face](https://huggingface.co/rednote-hilab/DotsOCR)
- 📝 [Blog Post](https://www.xiaohongshu.com/blog)
## License
MIT License - See the [dots.ocr repository](https://github.com/rednote-hilab/dots.ocr) for details.
## Acknowledgments
Built with:
- [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL)
- [Gradio](https://gradio.app)
- [Hugging Face Transformers](https://huggingface.co/transformers)