---
title: Bec Dot.orc Api
emoji: 🚀
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: apache-2.0
---

# Bec Dot.ocr API

OCR API powered by [rednote-hilab/dots.ocr](https://huggingface.co/rednote-hilab/dots.ocr) -- a multilingual document-parsing vision-language model. This Space provides both a browser UI and a programmatic API optimized for batch processing.

## Quick start

### 1. Install the client

```bash
pip install gradio_client
```

### 2. Process a single image

```python
from gradio_client import Client, handle_file

client = Client("openpecha/bec-dot.orc-api")

result = client.predict(
    handle_file("path/to/image.png"),            # local filepath or URL
    "Extract the text content from this image.", # prompt
    api_name="/predict",
)
print(result)
```
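For unattended runs it can help to wrap the call in a simple retry, since remote Spaces occasionally return transient errors. A minimal sketch; the helper name and backoff parameters are ours, not part of the Space:

```python
import time

def with_retries(fn, attempts=3, backoff_s=2.0):
    """Call fn(), retrying on any exception with linear backoff.

    `attempts` and `backoff_s` are illustrative defaults, not Space settings.
    """
    last_err = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as err:
            last_err = err
            time.sleep(backoff_s * (i + 1))
    raise last_err

# Usage (assumes `client` from the snippet above):
# result = with_retries(lambda: client.predict(
#     "path/to/image.png",
#     "Extract the text content from this image.",
#     api_name="/predict",
# ))
```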

### 3. Batch-process many images

```python
import os
import json
from pathlib import Path
from gradio_client import Client, handle_file

client = Client("openpecha/bec-dot.orc-api")

image_dir = Path("images")
output_dir = Path("results")
output_dir.mkdir(exist_ok=True)

prompt = "Extract the text content from this image."

for img_path in sorted(image_dir.glob("*.png")):
    print(f"Processing {img_path.name} ...")
    result = client.predict(
        handle_file(str(img_path)),
        prompt,
        api_name="/predict",
    )
    out_file = output_dir / f"{img_path.stem}.txt"
    out_file.write_text(result, encoding="utf-8")
    print(f"  -> saved to {out_file}")
```

> **Tip:** The Space uses queuing (`max_size=20`), so requests are processed
> sequentially and will not time out even for large batches.
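If a long batch is interrupted, re-running the loop above would reprocess every image. A small resume check, using only the stdlib (the helper name is ours), skips images that already have a result file:

```python
from pathlib import Path

def pending_images(image_dir: Path, output_dir: Path, pattern: str = "*.png"):
    """Yield images in image_dir that do not yet have a matching .txt in output_dir."""
    for img_path in sorted(image_dir.glob(pattern)):
        if not (output_dir / f"{img_path.stem}.txt").exists():
            yield img_path

# Drop-in replacement for the loop above:
# for img_path in pending_images(Path("images"), Path("results")):
#     ...
```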

### 4. Use a custom prompt

The default prompt is `"Extract the text content from this image."` You can
override it for more specific tasks:

```python
# Layout-aware JSON extraction
result = client.predict(
    handle_file("document.png"),
    """Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.

1. Bbox format: [x1, y1, x2, y2]
2. Layout Categories: ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].
3. Text Extraction & Formatting Rules:
    - Picture: omit the text field.
    - Formula: format as LaTeX.
    - Table: format as HTML.
    - All Others: format as Markdown.
4. Output the original text with no translation.
5. Sort all layout elements in human reading order.
6. Final Output: a single JSON object.""",
    api_name="/predict",
)
```
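With the layout prompt, the result is a JSON string. A sketch of pulling out the plain text in reading order; it assumes the model returns a JSON array of objects with `category` and `text` keys as the prompt requests (the exact output shape depends on the model, so adjust if the elements are wrapped in an outer object):

```python
import json

def layout_text(result: str) -> str:
    """Join the text of all layout elements, skipping Picture elements.

    Assumes `result` is a JSON list of objects with `category`, `bbox`,
    and `text` keys, as requested by the prompt above.
    """
    elements = json.loads(result)
    parts = [el["text"] for el in elements
             if el.get("category") != "Picture" and "text" in el]
    return "\n\n".join(parts)
```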

## API reference

| Endpoint | Method | Parameters | Returns |
|---|---|---|---|
| `/predict` | POST | `image` (filepath/URL), `prompt` (string) | Raw text or JSON string |

## Model details

- **Model:** [rednote-hilab/dots.ocr](https://huggingface.co/rednote-hilab/dots.ocr) (1.7B LLM, ~3B total)
- **Precision:** bfloat16
- **Capabilities:** text extraction, layout detection, table recognition (HTML), formula parsing (LaTeX), multilingual support