---
title: dots.ocr - Multilingual Document OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Multilingual document layout parsing with OCR
models:
  - rednote-hilab/DotsOCR
tags:
  - ocr
  - document-parsing
  - multilingual
  - layout-detection
  - vision-language
---

# 🔍 dots.ocr - Multilingual Document Layout Parsing

This Hugging Face Space provides an easy-to-use interface for the [dots.ocr](https://github.com/rednote-hilab/dots.ocr) model, a powerful multilingual document parser that unifies layout detection and content recognition.

## Features

- **Multilingual Support**: Robust parsing capabilities for multiple languages, including low-resource languages
- **Layout Detection**: Detects various document elements (Text, Title, Table, Formula, Caption, etc.)
- **Reading Order Preservation**: Maintains proper reading order in complex layouts
- **Formula Recognition**: Extracts mathematical formulas in LaTeX format
- **Table Extraction**: Converts tables to HTML format
- **Markdown Output**: Formats regular text as Markdown

## Supported Layout Categories

- Caption
- Footnote
- Formula (LaTeX output)
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table (HTML output)
- Text
- Title

## ⚠️ Important: Hugging Face Token Required

The `dots.ocr` model is **gated** and requires authentication. To use this Space:

1. **Get a Hugging Face Token**:
   - Go to https://huggingface.co/settings/tokens
   - Create a new token with **Read** access
2. **Request Access to the Model**:
   - Visit https://huggingface.co/rednote-hilab/DotsOCR
   - Click "Request Access" (if gated)
   - Wait for approval
3. **Add Token to Space**:
   - Go to your Space → Settings
   - Add a new Secret:
     - Name: `HF_TOKEN`
     - Value: your HF token
   - Rebuild the Space

📖 **Full guide**: See `HF_TOKEN_SETUP.md` for detailed instructions.

## Usage

1. **Upload an Image**: Upload a document image (photo, scan, or screenshot)
2.
**Select Prompt Type**: Choose from predefined prompts or use a custom one:
   - **Full Layout + OCR**: Complete analysis with bounding boxes, categories, and text
   - **OCR Only**: Plain text extraction
   - **Layout Detection Only**: Bounding boxes and categories only
   - **Custom**: Write your own prompt
3. **Process**: Click the "Process Document" button
4. **Results**: Get structured JSON output with all detected elements

## API Usage

You can also use this Space via its API:

```python
from gradio_client import Client

client = Client("YOUR_SPACE_URL")
result = client.predict(
    image="path/to/your/image.jpg",
    prompt_type="Full Layout + OCR (English)",
    custom_prompt="",
    api_name="/predict",
)
print(result)
```

## Model Information

- **Model**: [rednote-hilab/DotsOCR](https://huggingface.co/rednote-hilab/DotsOCR)
- **Parameters**: 1.7B (LLM foundation)
- **Performance**: SOTA on OmniDocBench for text, tables, and reading order
- **Base Model**: Qwen2.5-VL architecture

## Limitations

- Best performance on images with a resolution under 11,289,600 pixels
- May struggle with extremely high character-to-pixel ratios
- Complex tables and formulas may not be parsed perfectly
- Runs of repeated special characters (ellipses, underscores) may cause issues

## Citation

If you use this model, please cite:

```bibtex
@misc{dots.ocr,
  title={dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model},
  author={Rednote HiLab Team},
  year={2025},
  url={https://github.com/rednote-hilab/dots.ocr}
}
```

## Links

- 📦 [GitHub Repository](https://github.com/rednote-hilab/dots.ocr)
- 🤗 [Model on Hugging Face](https://huggingface.co/rednote-hilab/DotsOCR)
- 📝 [Blog Post](https://www.xiaohongshu.com/blog)

## License

MIT License - see the [dots.ocr repository](https://github.com/rednote-hilab/dots.ocr) for details.

## Acknowledgments

Built with:

- [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL)
- [Gradio](https://gradio.app)
- [Hugging Face Transformers](https://huggingface.co/transformers)
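## Tip: Staying Under the Pixel Limit

Since the model performs best on images under 11,289,600 pixels (see Limitations), it can help to downscale large scans before uploading. The sketch below is a minimal, dependency-free helper for computing a proportional target size under the cap; the function name and the example dimensions are illustrative, not part of this Space's API.

```python
import math

MAX_PIXELS = 11_289_600  # documented upper bound for dots.ocr input images


def fit_under_pixel_cap(width: int, height: int, max_pixels: int = MAX_PIXELS) -> tuple:
    """Return (width, height) scaled proportionally so width * height <= max_pixels."""
    pixels = width * height
    if pixels <= max_pixels:
        return width, height  # already small enough; keep original size
    scale = math.sqrt(max_pixels / pixels)  # uniform scale factor for both axes
    return max(1, int(width * scale)), max(1, int(height * scale))


# A 4000x3000 scan (12 MP) exceeds the cap and gets downscaled;
# a 1920x1080 screenshot passes through unchanged.
print(fit_under_pixel_cap(4000, 3000))  # (3879, 2909)
print(fit_under_pixel_cap(1920, 1080))  # (1920, 1080)
```

The resulting size can be passed to any image library's resize routine (e.g. Pillow's `Image.resize`) before sending the file to the Space.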