---
title: Bec Dot.orc Api
emoji: 🚀
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: apache-2.0
---
# Bec Dot.ocr API
OCR API powered by [rednote-hilab/dots.ocr](https://huggingface.co/rednote-hilab/dots.ocr), a multilingual document-parsing vision-language model. This Space provides both a browser UI and a programmatic API optimized for batch processing.
## Quick start
### 1. Install the client
```bash
pip install gradio_client
```
### 2. Process a single image
```python
from gradio_client import Client, handle_file

client = Client("openpecha/bec-dot.orc-api")

result = client.predict(
    handle_file("path/to/image.png"),  # local filepath or URL
    "Extract the text content from this image.",  # prompt
    api_name="/predict",
)
print(result)
```
### 3. Batch-process many images
```python
from pathlib import Path

from gradio_client import Client, handle_file

client = Client("openpecha/bec-dot.orc-api")

image_dir = Path("images")
output_dir = Path("results")
output_dir.mkdir(exist_ok=True)

prompt = "Extract the text content from this image."

for img_path in sorted(image_dir.glob("*.png")):
    print(f"Processing {img_path.name} ...")
    result = client.predict(
        handle_file(str(img_path)),
        prompt,
        api_name="/predict",
    )
    out_file = output_dir / f"{img_path.stem}.txt"
    out_file.write_text(result, encoding="utf-8")
    print(f"  -> saved to {out_file}")
```
> **Tip:** The Space uses queuing (`max_size=20`), so requests are processed
> sequentially and will not time out even for large batches.
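Even with queuing, individual calls can still fail on network hiccups or while the Space cold-starts. A minimal retry wrapper you can drop into the batch loop (a sketch of our own, not part of the Space's API; the retry count and backoff values are arbitrary defaults):

```python
import time


def predict_with_retry(call, retries=3, backoff=5.0):
    """Call a zero-argument function, retrying on failure.

    `call` should wrap client.predict(...) in a lambda. `backoff` is the
    delay in seconds before the next attempt, doubled after each failure.
    The last failure's exception is re-raised when retries are exhausted.
    """
    for attempt in range(1, retries + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == retries:
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying in {backoff:.0f}s")
            time.sleep(backoff)
            backoff *= 2


# Usage inside the batch loop:
# result = predict_with_retry(
#     lambda: client.predict(handle_file(str(img_path)), prompt, api_name="/predict")
# )
```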
### 4. Use a custom prompt
The default prompt is `"Extract the text content from this image."` You can
override it for more specific tasks:
```python
# Layout-aware JSON extraction
result = client.predict(
    handle_file("document.png"),
    """Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.
1. Bbox format: [x1, y1, x2, y2]
2. Layout Categories: ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].
3. Text Extraction & Formatting Rules:
    - Picture: omit the text field.
    - Formula: format as LaTeX.
    - Table: format as HTML.
    - All Others: format as Markdown.
4. Output the original text with no translation.
5. Sort all layout elements in human reading order.
6. Final Output: a single JSON object.""",
    api_name="/predict",
)
```
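The layout prompt asks the model for a single JSON object, but vision-language models sometimes wrap their answer in a Markdown code fence. A small helper of our own (not provided by the API) that strips an optional fence before parsing:

```python
import json


def parse_layout_result(raw: str):
    """Parse the model's layout output into Python objects.

    Strips an optional ```json ... ``` fence around the payload, then
    delegates to json.loads; raises ValueError on malformed output.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line and, if present, the closing fence.
        lines = text.splitlines()
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        text = "\n".join(lines[1:])
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc
```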
## API reference
| Endpoint | Method | Parameters | Returns |
|---|---|---|---|
| `/predict` | POST | `image` (filepath/URL), `prompt` (string) | Raw text or JSON string |
## Model details
- **Model:** [rednote-hilab/dots.ocr](https://huggingface.co/rednote-hilab/dots.ocr) (1.7B LLM, ~3B total)
- **Precision:** bfloat16
- **Capabilities:** text extraction, layout detection, table recognition (HTML), formula parsing (LaTeX), multilingual support
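As a rough sizing guide (our back-of-envelope estimate, not a figure from the model card): bfloat16 stores 2 bytes per parameter, so ~3B parameters need about 6 GB for the weights alone, before activations and KV cache:

```python
# Back-of-envelope GPU memory estimate for the weights (assumptions: ~3B
# parameters total, 2 bytes per parameter in bfloat16; activations and
# KV cache add more on top of this).
params = 3e9
bytes_per_param = 2  # bfloat16
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for weights alone")  # → ~6 GB for weights alone
```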