---
title: Bec Dot.orc Api
emoji: ๐
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: apache-2.0
---

# Bec Dot.ocr API

OCR API powered by [rednote-hilab/dots.ocr](https://huggingface.co/rednote-hilab/dots.ocr), a multilingual document-parsing vision-language model. This Space provides both a browser UI and a programmatic API optimized for batch processing.

## Quick start

### 1. Install the client

```bash
pip install gradio_client
```

### 2. Process a single image

```python
from gradio_client import Client, handle_file

client = Client("openpecha/bec-dot.orc-api")

# Wrap file inputs in handle_file() so the client uploads them to the Space.
result = client.predict(
    handle_file("path/to/image.png"),  # local filepath or URL
    "Extract the text content from this image.",  # prompt
    api_name="/predict",
)
print(result)
```

### 3. Batch-process many images

```python
from pathlib import Path

from gradio_client import Client, handle_file

client = Client("openpecha/bec-dot.orc-api")

image_dir = Path("images")
output_dir = Path("results")
output_dir.mkdir(exist_ok=True)

prompt = "Extract the text content from this image."

# Process the PNG files in name order and write one .txt file per image.
for img_path in sorted(image_dir.glob("*.png")):
    print(f"Processing {img_path.name} ...")
    result = client.predict(
        handle_file(str(img_path)),
        prompt,
        api_name="/predict",
    )
    out_file = output_dir / f"{img_path.stem}.txt"
    out_file.write_text(result, encoding="utf-8")
    print(f"  -> saved to {out_file}")
```

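> **Tip:** The Space uses Gradio's request queue (`max_size=20`), so requests are processed
> sequentially and wait their turn rather than timing out. Up to 20 requests can wait at
> once; when the queue is full, new requests are rejected. The loop above waits for each
> result before sending the next image, so it holds at most one queue slot at a time,
> however large the batch is.
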
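
The batch loop above stops at the first failed request and reprocesses every image when it is rerun. The following is a minimal sketch of a resumable variant, not part of this Space: `process_batch`, `predict_fn`, `max_retries`, and `backoff_seconds` are hypothetical names introduced for illustration. It skips images that already have an output file and retries failed requests with a growing delay.

```python
import time
from pathlib import Path
from typing import Callable


def process_batch(
    predict_fn: Callable[[Path, str], str],  # hypothetical: wraps client.predict
    image_dir: Path,
    output_dir: Path,
    prompt: str = "Extract the text content from this image.",
    max_retries: int = 3,
    backoff_seconds: float = 5.0,
) -> dict[str, list[str]]:
    """Run predict_fn over every PNG in image_dir, skipping finished files."""
    output_dir.mkdir(parents=True, exist_ok=True)
    summary: dict[str, list[str]] = {"done": [], "skipped": [], "failed": []}

    for img_path in sorted(image_dir.glob("*.png")):
        out_file = output_dir / f"{img_path.stem}.txt"
        if out_file.exists():
            # Written by an earlier run, so do not send the image again.
            summary["skipped"].append(img_path.name)
            continue

        for attempt in range(1, max_retries + 1):
            try:
                text = predict_fn(img_path, prompt)
            except Exception:
                if attempt == max_retries:
                    # Give up on this image and move on to the next one.
                    summary["failed"].append(img_path.name)
                else:
                    # Wait longer after each failed attempt before retrying.
                    time.sleep(backoff_seconds * attempt)
                continue
            out_file.write_text(text, encoding="utf-8")
            summary["done"].append(img_path.name)
            break

    return summary


# Possible wiring, assuming `client` and `handle_file` from the example above:
# summary = process_batch(
#     lambda path, prompt: client.predict(
#         handle_file(str(path)), prompt, api_name="/predict"
#     ),
#     Path("images"),
#     Path("results"),
# )
```

Passing the request as a callable keeps the retry logic independent of the client, so it can be exercised with a stub before pointing it at the Space.
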
### 4. Use a custom prompt

The default prompt is `"Extract the text content from this image."` You can
override it for more specific tasks:

```python
# Layout-aware JSON extraction
# (reuses `client` and `handle_file` from the examples above)
result = client.predict(
    handle_file("document.png"),
    """Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.
1. Bbox format: [x1, y1, x2, y2]
2. Layout Categories: ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].
3. Text Extraction & Formatting Rules:
- Picture: omit the text field.
- Formula: format as LaTeX.
- Table: format as HTML.
- All Others: format as Markdown.
4. Output the original text with no translation.
5. Sort all layout elements in human reading order.
6. Final Output: a single JSON object.""",
    api_name="/predict",
)
```

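
The layout prompt asks for JSON, but the endpoint still returns it as a string. The sketch below shows one way to turn that string into typed elements. It assumes the output is a JSON array of objects with `bbox`, `category`, and `text` keys, which the prompt implies but this README does not confirm. `LayoutElement`, `parse_layout`, `tables_as_html`, and the sample data are hypothetical and exist only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayoutElement:
    bbox: list[int]       # [x1, y1, x2, y2]
    category: str         # e.g. "Text", "Table", "Formula"
    text: Optional[str]   # None when the element has no text field (Picture)


def parse_layout(result: str) -> list[LayoutElement]:
    """Parse the layout string (assumed: JSON array with bbox/category/text)."""
    data = json.loads(result)
    return [
        LayoutElement(
            bbox=item["bbox"],
            category=item["category"],
            text=item.get("text"),  # Picture elements omit this key
        )
        for item in data
    ]


def tables_as_html(elements: list[LayoutElement]) -> list[str]:
    """Return the HTML of every Table element, in reading order."""
    return [e.text for e in elements if e.category == "Table" and e.text]


# Made-up sample in the assumed shape, not real model output.
sample = json.dumps([
    {"bbox": [10, 20, 300, 60], "category": "Title", "text": "# Report"},
    {"bbox": [10, 80, 300, 200], "category": "Picture"},
    {"bbox": [10, 220, 300, 400], "category": "Table",
     "text": "<table><tr><td>1</td></tr></table>"},
])

elements = parse_layout(sample)
tables = tables_as_html(elements)
```

Model output is not guaranteed to be valid JSON, so in practice it may be worth catching `json.JSONDecodeError` around `parse_layout` and keeping the raw string for inspection.
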
## API reference

| Endpoint | Method | Parameters | Returns |
|---|---|---|---|
| `/predict` | POST | `image` (filepath/URL), `prompt` (string) | Raw text or JSON string |

## Model details

- **Model:** [rednote-hilab/dots.ocr](https://huggingface.co/rednote-hilab/dots.ocr) (1.7B LLM, ~3B total)
- **Precision:** bfloat16
- **Capabilities:** text extraction, layout detection, table recognition (HTML), formula parsing (LaTeX), multilingual support