metadata
title: Bec Dot.orc Api
emoji: 🚀
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: apache-2.0
Bec Dot.ocr API
OCR API powered by rednote-hilab/dots.ocr, a multilingual document-parsing vision-language model. This Space provides both a browser UI and a programmatic API suited to batch processing.
Quick start
1. Install the client
pip install gradio_client
2. Process a single image
from gradio_client import Client, handle_file

client = Client("openpecha/bec-dot.orc-api")
result = client.predict(
    handle_file("path/to/image.png"),  # local filepath or URL, wrapped for upload
    "Extract the text content from this image.",  # prompt
    api_name="/predict",
)
print(result)
3. Batch-process many images
from pathlib import Path
from gradio_client import Client, handle_file

client = Client("openpecha/bec-dot.orc-api")
image_dir = Path("images")
output_dir = Path("results")
output_dir.mkdir(exist_ok=True)
prompt = "Extract the text content from this image."

# Process the images in a stable (sorted) order, one request at a time.
for img_path in sorted(image_dir.glob("*.png")):
    print(f"Processing {img_path.name} ...")
    result = client.predict(
        handle_file(str(img_path)),
        prompt,
        api_name="/predict",
    )
    # Write each result next to its source name, e.g. results/page1.txt.
    out_file = output_dir / f"{img_path.stem}.txt"
    out_file.write_text(result, encoding="utf-8")
    print(f"  -> saved to {out_file}")
Tip: The Space uses queuing (max_size=20), so requests are processed one at a time. A sequential loop like the one above keeps a single request in the queue and should not time out, even for large batches.
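A Gradio queue that is full rejects new requests instead of holding them, so a batch script that shares the Space with other users may want to retry a failed call after a growing delay. The following is a minimal sketch of that idea, not part of gradio_client: `call_with_retry` and its parameters are hypothetical, and it assumes a failed request surfaces as a Python exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on any exception with exponential backoff.

    Hypothetical helper: `retries` is the number of extra attempts after
    the first one, and the delay doubles on every retry.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be >= 0")
```

In the batch loop, the request could then be wrapped as `call_with_retry(lambda: client.predict(handle_file(str(img_path)), prompt, api_name="/predict"))`.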
4. Use a custom prompt
The default prompt is "Extract the text content from this image." You can
override it for more specific tasks:
# Layout-aware JSON extraction
result = client.predict(
    handle_file("document.png"),
    """Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.
1. Bbox format: [x1, y1, x2, y2]
2. Layout Categories: ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].
3. Text Extraction & Formatting Rules:
- Picture: omit the text field.
- Formula: format as LaTeX.
- Table: format as HTML.
- All Others: format as Markdown.
4. Output the original text with no translation.
5. Sort all layout elements in human reading order.
6. Final Output: a single JSON object.""",
    api_name="/predict",
)
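Since the layout prompt asks for JSON, the returned string can usually be decoded with the standard library. The sketch below assumes the model returns a list of elements (or an object wrapping one) with keys named `bbox`, `category`, and `text`, as in the prompt. The exact output shape is not documented here, so `parse_layout` and `sample` are hypothetical and the sample is not real model output.

```python
import json


def parse_layout(raw: str) -> list[dict]:
    """Decode the layout JSON string into a list of element dicts."""
    data = json.loads(raw)
    if isinstance(data, dict):
        if "category" in data:
            data = [data]  # a single element rather than a list
        else:
            # Assumption: a wrapping object holds the elements in a list value.
            data = next((v for v in data.values() if isinstance(v, list)), [])
    if not isinstance(data, list):
        return []
    return [el for el in data if isinstance(el, dict)]


# Hypothetical sample in the shape the prompt asks for.
sample = json.dumps([
    {"bbox": [40, 30, 560, 80], "category": "Title", "text": "# Annual Report"},
    {"bbox": [40, 100, 560, 400], "category": "Picture"},
    {"bbox": [40, 420, 560, 700], "category": "Text", "text": "Revenue grew."},
])

elements = parse_layout(sample)
# Picture elements carry no text field, so read it with .get().
page_text = "\n\n".join(el["text"] for el in elements if el.get("text"))
print(page_text)
```

Elements arrive in reading order if the model follows rule 5, so joining the text fields in sequence gives a plain Markdown rendering of the page.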
API reference
| Endpoint | Method | Parameters | Returns |
|---|---|---|---|
| /predict | POST | image (filepath/URL), prompt (string) | Raw text or JSON string |
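Because the endpoint returns either raw text or a JSON string depending on the prompt, calling code may need to handle both. One possible approach, sketched here with a hypothetical `decode_result` helper, is to attempt JSON decoding only when the string looks like an object or array and otherwise keep the text unchanged.

```python
import json
from typing import Any


def decode_result(result: str) -> Any:
    """Return parsed JSON for a JSON object or array, else the raw text."""
    text = result.strip()
    if text[:1] in ("{", "["):
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            pass  # looked like JSON but was not; treat it as plain text
    return result
```

Checking the first character avoids turning plain OCR text such as a bare number or the word `true` into a JSON value by accident.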
Model details
- Model: rednote-hilab/dots.ocr (1.7B LLM, ~3B total)
- Precision: bfloat16
- Capabilities: text extraction, layout detection, table recognition (HTML), formula parsing (LaTeX), multilingual support