---
title: Bec Dot.orc Api
emoji: 🚀
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: apache-2.0
---

# Bec Dot.ocr API

OCR API powered by [rednote-hilab/dots.ocr](https://huggingface.co/rednote-hilab/dots.ocr), a multilingual document-parsing vision-language model.

This Space provides both a browser UI and a programmatic API optimized for batch processing.

## Quick start

### 1. Install the client

```bash
pip install gradio_client
```

### 2. Process a single image

```python
from gradio_client import Client, handle_file

client = Client("openpecha/bec-dot.orc-api")

result = client.predict(
    handle_file("path/to/image.png"),  # local filepath or URL
    "Extract the text content from this image.",  # prompt
    api_name="/predict",
)
print(result)
```

### 3. Batch-process many images

```python
from pathlib import Path

from gradio_client import Client, handle_file

client = Client("openpecha/bec-dot.orc-api")

image_dir = Path("images")
output_dir = Path("results")
output_dir.mkdir(exist_ok=True)

prompt = "Extract the text content from this image."

# Send one image at a time and write each result next to its source name.
for img_path in sorted(image_dir.glob("*.png")):
    print(f"Processing {img_path.name} ...")
    result = client.predict(
        handle_file(str(img_path)),
        prompt,
        api_name="/predict",
    )
    out_file = output_dir / f"{img_path.stem}.txt"
    out_file.write_text(result, encoding="utf-8")
    print(f"  -> saved to {out_file}")
```

> **Tip:** The Space uses queuing (`max_size=20`), so requests wait in line and are
> processed sequentially instead of timing out, even for large batches. The loop
> above sends one request at a time, so it stays within the queue limit.

### 4. Use a custom prompt

The default prompt is `"Extract the text content from this image."` You can override it for more specific tasks:

```python
# Layout-aware JSON extraction
# (reuses `client` and `handle_file` from the examples above)
result = client.predict(
    handle_file("document.png"),
    """Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.

1. Bbox format: [x1, y1, x2, y2]
2. Layout Categories: ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].
3. Text Extraction & Formatting Rules:
    - Picture: omit the text field.
    - Formula: format as LaTeX.
    - Table: format as HTML.
    - All Others: format as Markdown.
4. Output the original text with no translation.
5. Sort all layout elements in human reading order.
6. Final Output: a single JSON object.""",
    api_name="/predict",
)
```

## API reference

| Endpoint | Method | Parameters | Returns |
|---|---|---|---|
| `/predict` | POST | `image` (filepath/URL), `prompt` (string) | Raw text or JSON string |

## Model details

- **Model:** [rednote-hilab/dots.ocr](https://huggingface.co/rednote-hilab/dots.ocr) (1.7B LLM, ~3B total)
- **Precision:** bfloat16
- **Capabilities:** text extraction, layout detection, table recognition (HTML), formula parsing (LaTeX), multilingual support
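
## Working with layout output

With the layout prompt from step 4, `/predict` returns its result as a JSON string. The sketch below shows one way to turn that string into plain text. It rests on assumptions that this README does not confirm: that the response parses as a JSON array of elements (or a single element), and that each element uses the keys `bbox`, `category`, and `text`. The helpers `parse_layout` and `text_in_reading_order` are hypothetical names, and the sample data is hand-written, not real model output.

```python
import json


def parse_layout(result: str) -> list[dict]:
    """Parse the JSON string returned for the layout prompt.

    Assumption: elements are dicts with "bbox", "category" and, except for
    pictures, "text". These key names are illustrative, not confirmed.
    """
    data = json.loads(result)
    # Accept either a single element (dict) or a list of elements.
    elements = [data] if isinstance(data, dict) else list(data)
    return [el for el in elements if isinstance(el, dict)]


def text_in_reading_order(elements, skip=("Page-header", "Page-footer")) -> str:
    """Join element texts in the given order, dropping categories in `skip`."""
    parts = []
    for el in elements:
        if el.get("category") in skip:
            continue
        text = el.get("text")
        if text:  # Picture elements are expected to carry no text field
            parts.append(text)
    return "\n\n".join(parts)


# Hand-written sample standing in for a real response.
sample = json.dumps([
    {"bbox": [40, 20, 560, 48], "category": "Page-header", "text": "Report 2024"},
    {"bbox": [40, 60, 560, 100], "category": "Title", "text": "# Overview"},
    {"bbox": [40, 120, 300, 320], "category": "Picture"},
    {"bbox": [40, 340, 560, 420], "category": "Text", "text": "First paragraph."},
])

elements = parse_layout(sample)
print(text_in_reading_order(elements))
```

Because the prompt asks the model to sort elements in reading order, the helper keeps the order it receives and only filters. If a real response has a different shape, adjust the key names in `parse_layout` accordingly.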