markobinario committed 98ac845 (verified), parent bf757f5

Update README.md

Files changed (1): README.md +67 -0

pinned: false
---

## PDF OCR Space (PaddleOCR + PyMuPDF)

A Gradio Space that extracts text from uploaded PDFs. Each page is rendered to an image with PyMuPDF, and that image is then recognized by PaddleOCR.

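PDF page dimensions are given in points (1/72 inch), and PyMuPDF renders at 72 dpi when no scaling matrix is applied, so a DPI setting maps to a zoom factor of `dpi / 72`. A minimal sketch of that arithmetic follows. `zoom_for_dpi` and `render_size` are hypothetical helpers, not functions from `app.py` or PyMuPDF, and we assume the app applies the zoom in the usual way, e.g. `page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))`.

```python
PDF_POINTS_PER_INCH = 72  # PDF user-space units are 1/72 inch


def zoom_for_dpi(dpi: float) -> float:
    """Scale factor that renders a PDF page at `dpi` (hypothetical helper)."""
    return dpi / PDF_POINTS_PER_INCH


def render_size(width_pt: float, height_pt: float, dpi: float) -> tuple[int, int]:
    """Approximate pixel size of a width_pt x height_pt page rendered at `dpi`."""
    # Multiply before dividing so exact cases are not skewed by float error in dpi / 72.
    width_px = round(width_pt * dpi / PDF_POINTS_PER_INCH)
    height_px = round(height_pt * dpi / PDF_POINTS_PER_INCH)
    return width_px, height_px


# A US Letter page is 612 x 792 pt (8.5 x 11 in).
print(round(zoom_for_dpi(170), 2))   # 2.36
print(render_size(612, 792, 170))    # (1445, 1870)
```

Higher DPI gives the recognizer more pixels per character, which is why it helps with small text.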
### Features
- Upload a PDF and extract its text
- Adjustable DPI and optional page limit
- Multi-language OCR (via PaddleOCR language packs)
- JSON output with bounding boxes and confidence scores for each page

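The actual JSON schema lives in `app.py`, which is not shown here. As an illustration only, the sketch below assumes PaddleOCR 2.x's `PaddleOCR.ocr()` result shape, where each detected line is `[box, (text, confidence)]` and `box` holds four `[x, y]` corner points, and treats a page with no detections as `None`. It shows one way to turn those lines into per-page records with an optional page limit. `build_payload`, `line_to_record`, and every field name are hypothetical.

```python
import json
from typing import Optional


def line_to_record(line) -> dict:
    """Convert one PaddleOCR-2.x-style line, [box, (text, confidence)], to a dict."""
    box, (text, confidence) = line
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return {
        "text": text,
        "confidence": float(confidence),
        # Axis-aligned bbox [x_min, y_min, x_max, y_max] around the 4-point polygon
        "bbox": [min(xs), min(ys), max(xs), max(ys)],
    }


def build_payload(pages_lines: list, max_pages: Optional[int] = None) -> dict:
    """pages_lines holds one entry per rendered page: a list of lines, or None if empty."""
    if max_pages:  # None or 0 is treated as "no limit" (assumption)
        pages_lines = pages_lines[:max_pages]
    pages = []
    for number, lines in enumerate(pages_lines, start=1):
        records = [line_to_record(line) for line in (lines or [])]
        pages.append({
            "page": number,
            "text": "\n".join(record["text"] for record in records),
            "lines": records,
        })
    return {"num_pages": len(pages), "pages": pages}


# Hypothetical OCR output for a 3-page PDF (page 2 has no detected text).
sample = [
    [[[[10, 20], [110, 20], [110, 40], [10, 40]], ("Hello", 0.98)]],
    None,
    [[[[5, 5], [50, 5], [50, 15], [5, 15]], ("Page three", 0.9)]],
]
print(json.dumps(build_payload(sample, max_pages=2), indent=2))
```

Keeping the page text and the per-line records side by side lets a client use the plain text directly while still having boxes and scores for highlighting or filtering low-confidence lines.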
### App Structure
- `app.py`: Gradio UI and prediction logic
- `requirements.txt`: Python dependencies
- `runtime.txt`: Python runtime for Hugging Face Spaces

### Deploy on Hugging Face Spaces
1. Create a new Space (SDK: Gradio).
2. Upload all files in this repo: `app.py`, `requirements.txt`, `runtime.txt`, `README.md`.
3. The first build downloads the OCR models; subsequent runs use the cached copies.

### Run Locally
```bash
pip install -r requirements.txt
python app.py
```
Then open `http://127.0.0.1:7860` in a browser.
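When the app is started in the background, for example by a script that then calls it through the API, it can help to wait until the server answers before sending requests. Here is a small standard-library sketch; `wait_for_server` is a hypothetical helper, not part of this repo, and it only assumes the default address `http://127.0.0.1:7860` mentioned above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://127.0.0.1:7860", timeout: float = 60.0,
                    interval: float = 0.5) -> bool:
    """Poll `url` until any HTTP response arrives or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # got a successful response
        except urllib.error.HTTPError:
            return True  # the server answered, just with an error status
        except OSError:
            # URLError (e.g. connection refused) and socket timeouts are OSError
            # subclasses: the server is not listening yet, so wait and retry.
            time.sleep(interval)
    return False
```

Call `wait_for_server()` right after launching `python app.py` in the background; it returns `False` if nothing answers within `timeout` seconds.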

### API Usage

Using `gradio_client` in Python:
```python
from gradio_client import Client, handle_file

client = Client("<your-username>/<your-space-name>")
result = client.predict(
    pdf_file=handle_file("./sample.pdf"),  # recent gradio_client versions wrap files with handle_file
    dpi=170,
    max_pages=None,  # optional page limit; None processes every page
    lang="en",
    api_name="/predict",  # endpoint names start with "/"
)

text, json_payload = result  # the endpoint returns (extracted text, JSON payload)
print(text)
print(json_payload)
```

Using the Space's HTTP endpoint with `curl`. The route and request format depend on the Gradio version the Space runs, so check the app's "Use via API" page for the exact call:
```bash
curl -s -X POST \
  -F "data=@sample.pdf" \
  -F "data=170" \
  -F "data=" \
  -F "data=en" \
  https://<your-username>-<your-space-name>.hf.space/run/predict
```

Notes:
- Increase the DPI for small text; higher DPI also makes processing slower.
- Select the language that matches the document; the default is `en`.
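The slowdown follows from the rendering arithmetic: a page's width and height in pixels both grow linearly with DPI, so its pixel count, and roughly the OCR work, grows with the square of the DPI. A back-of-the-envelope helper (hypothetical, and an estimate rather than a measured benchmark of this app):

```python
def relative_pixel_cost(old_dpi: float, new_dpi: float) -> float:
    """Ratio of pixels per rendered page when switching from old_dpi to new_dpi."""
    # Each dimension scales by dpi / 72, so the 1/72 factors cancel in the ratio.
    return (new_dpi / old_dpi) ** 2


# Raising the example value of 170 dpi to 300 dpi roughly triples the pixels per page.
print(f"{relative_pixel_cost(170, 300):.2f}x")  # 3.11x
```

Combining a moderate DPI with a page limit is one way to keep large PDFs responsive.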

### License
MIT

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference