---
title: dots.ocr - Multilingual Document OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Multilingual document layout parsing with OCR
models:
- rednote-hilab/DotsOCR
tags:
- ocr
- document-parsing
- multilingual
- layout-detection
- vision-language
---
# 🔍 dots.ocr - Multilingual Document Layout Parsing
This Hugging Face Space provides an easy-to-use interface for the [dots.ocr](https://github.com/rednote-hilab/dots.ocr) model, a powerful multilingual document parser that unifies layout detection and content recognition.
## Features
- **Multilingual Support**: Robust parsing capabilities for multiple languages including low-resource languages
- **Layout Detection**: Detects various document elements (Text, Title, Table, Formula, Caption, etc.)
- **Reading Order Preservation**: Maintains proper reading order in complex layouts
- **Formula Recognition**: Extracts mathematical formulas in LaTeX format
- **Table Extraction**: Converts tables to HTML format
- **Markdown Output**: Formats regular text as Markdown
## Supported Layout Categories
- Caption
- Footnote
- Formula (LaTeX output)
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table (HTML output)
- Text
- Title
## ⚠️ Important: Hugging Face Token Required
The `dots.ocr` model is **gated** and requires authentication. To use this Space:
1. **Get a Hugging Face Token**:
   - Go to https://huggingface.co/settings/tokens
   - Create a new token with **Read** access
2. **Request Access to the Model**:
   - Visit https://huggingface.co/rednote-hilab/DotsOCR
   - Click "Request Access" if the model page shows an access gate
   - Wait for approval
3. **Add the Token to the Space**:
   - Go to your Space → Settings
   - Add a new Secret:
     - Name: `HF_TOKEN`
     - Value: your HF token
   - Rebuild the Space
📖 **Full guide**: See `HF_TOKEN_SETUP.md` for detailed instructions
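As a rough sketch of how the `HF_TOKEN` secret gets consumed, a Space typically reads it from the environment at startup. The helper below is hypothetical; the actual `app.py` in this Space may handle the token differently:

```python
import os

def get_hf_token(env=os.environ):
    """Return the HF token from the environment, or None if the secret is missing.

    Hypothetical helper: the real app.py may read the secret differently.
    """
    return env.get("HF_TOKEN")

# A gated download would then pass the token explicitly, e.g.:
# AutoModel.from_pretrained("rednote-hilab/DotsOCR", token=get_hf_token())
```

If `get_hf_token()` returns `None`, gated model downloads will fail with a 401 error, which is the usual symptom of a missing Space secret.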
## Usage
1. **Upload an Image**: Upload a document image (photo, scan, or screenshot)
2. **Select Prompt Type**: Choose a predefined prompt or write a custom one
   - **Full Layout + OCR**: Complete analysis with bounding boxes, categories, and text
   - **OCR Only**: Plain text extraction
   - **Layout Detection Only**: Bounding boxes and categories only
   - **Custom**: Write your own prompt
3. **Process**: Click the "Process Document" button
4. **Results**: Get structured JSON output with all detected elements
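The exact schema of the JSON output is defined by the model. As an illustration only (the field names `bbox`, `category`, and `text` here are assumptions, not guaranteed by this Space), iterating over detected elements might look like:

```python
import json

# Hypothetical output shape; the real field names may differ.
sample = json.dumps([
    {"bbox": [72, 40, 520, 90], "category": "Title", "text": "Quarterly Report"},
    {"bbox": [72, 120, 520, 300], "category": "Text", "text": "Revenue grew..."},
])

for element in json.loads(sample):
    print(f"{element['category']:>10}  {element['bbox']}  {element.get('text', '')[:40]}")
```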
## API Usage
You can also call this Space programmatically via the Gradio client:
```python
from gradio_client import Client

client = Client("YOUR_SPACE_URL")
result = client.predict(
    image="path/to/your/image.jpg",
    prompt_type="Full Layout + OCR (English)",
    custom_prompt="",
    api_name="/predict",
)
print(result)
```
## Model Information
- **Model**: [rednote-hilab/DotsOCR](https://huggingface.co/rednote-hilab/DotsOCR)
- **Parameters**: 1.7B (LLM foundation)
- **Performance**: SOTA on OmniDocBench for text, tables, and reading order
- **Base Model**: Qwen2.5-VL architecture
## Limitations
- Best performance on images with fewer than 11,289,600 total pixels; larger inputs should be downscaled first
- May struggle with extremely high character-to-pixel ratios (very dense text in small images)
- Complex tables and formulas may not be reproduced perfectly
- Long runs of repeated special characters (ellipses, underscores) may cause issues
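Given the pixel limit above, oversized images can be downscaled before upload. A minimal sketch that preserves aspect ratio (the limit value is taken from the first bullet; the helper name is ours):

```python
import math

MAX_PIXELS = 11_289_600  # limit noted in the first bullet above

def fit_within_limit(width, height, max_pixels=MAX_PIXELS):
    """Scale (width, height) down so the area stays within max_pixels, keeping aspect ratio."""
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, int(width * scale)), max(1, int(height * scale))

# e.g. a 4000x3000 scan (12 MP) is reduced slightly; a 1920x1080 screenshot passes through unchanged.
```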
## Citation
If you use this model, please cite:
```bibtex
@misc{dots.ocr,
  title={dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model},
  author={Rednote HiLab Team},
  year={2025},
  url={https://github.com/rednote-hilab/dots.ocr}
}
```
## Links
- 📦 [GitHub Repository](https://github.com/rednote-hilab/dots.ocr)
- 🤗 [Model on Hugging Face](https://huggingface.co/rednote-hilab/DotsOCR)
- 📝 [Blog Post](https://www.xiaohongshu.com/blog)
## License
MIT License - See the [dots.ocr repository](https://github.com/rednote-hilab/dots.ocr) for details.
## Acknowledgments
Built with:
- [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL)
- [Gradio](https://gradio.app)
- [Hugging Face Transformers](https://huggingface.co/transformers)