---
title: OCRapi
emoji: 🐠
colorFrom: gray
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

## PDF OCR Space (PaddleOCR + PyMuPDF)

A Gradio Space that extracts text from uploaded PDFs. Each page is rendered to an image with PyMuPDF, then recognized by PaddleOCR.
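
The rendering half of that pipeline can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the function names `dpi_to_zoom` and `render_pages` are hypothetical, and the OCR call is left out because PaddleOCR's result format differs between major versions.

```python
def dpi_to_zoom(dpi: float) -> float:
    """Convert a target DPI to a PyMuPDF zoom factor.

    PDF page coordinates are in points (72 per inch), so zoom = dpi / 72.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / 72.0


def render_pages(pdf_path, dpi=170, max_pages=None):
    """Yield (page_number, png_bytes) for each rendered page.

    Hypothetical helper: PyMuPDF is imported lazily so the module can be
    loaded even where PyMuPDF is not installed.
    """
    import fitz  # PyMuPDF

    zoom = dpi_to_zoom(dpi)
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            if max_pages is not None and index >= max_pages:
                break
            # Scale the page uniformly so the bitmap has the requested DPI.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            yield index + 1, pix.tobytes("png")
```

Each PNG would then be handed to the OCR engine. Yielding pages one at a time keeps memory use flat for long documents.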

### Features

- Upload a PDF and extract its text
- Adjustable DPI and optional page limit
- Multi-language OCR (via PaddleOCR language packs)
- JSON output with per-page bounding boxes and confidence scores
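
To make the JSON output concrete, here is one possible shape for it. The field names (`pages`, `lines`, `bbox`, `confidence`) and the input format, a list of `(box, (text, confidence))` pairs per page in the style of PaddleOCR 2.x results, are assumptions for illustration; the actual payload produced by `app.py` may differ.

```python
import json


def page_payload(page_number, lines):
    """Build the JSON entry for one page.

    `lines` is assumed to be an iterable of (box, (text, confidence)),
    where `box` is a list of four [x, y] corner points.
    """
    items = []
    for box, (text, confidence) in lines:
        items.append({
            "text": text,
            "confidence": float(confidence),
            "bbox": [[float(x), float(y)] for x, y in box],
        })
    return {"page": page_number, "lines": items}


def build_output(pages):
    """Return (plain_text, payload) for an iterable of (page_number, lines)."""
    payload = {"pages": [page_payload(n, lines) for n, lines in pages]}
    # Lines are joined with newlines, pages with a blank line between them.
    text = "\n\n".join(
        "\n".join(item["text"] for item in page["lines"])
        for page in payload["pages"]
    )
    return text, payload


if __name__ == "__main__":
    sample = [(1, [([[0, 0], [10, 0], [10, 5], [0, 5]], ("Hello", 0.9))])]
    text, payload = build_output(sample)
    print(json.dumps(payload, indent=2))
```

Converting every coordinate and score to a plain `float` keeps the payload serializable even if the OCR engine returns NumPy scalars.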

### App Structure

- `app.py`: Gradio UI and prediction logic
- `requirements.txt`: Python dependencies
- `runtime.txt`: Python runtime for Hugging Face Spaces
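
As a rough picture of how `app.py` could wire the UI to the prediction logic, consider the sketch below. It is an assumption-laden outline, not the actual file: `normalize_max_pages` and `build_demo` are hypothetical names, the language choices are only examples of PaddleOCR language codes, and `predict` is a stub that returns empty results.

```python
def normalize_max_pages(value):
    """Map None, "" or 0 to None (no limit); otherwise return a positive int."""
    if value in (None, "", 0):
        return None
    pages = int(value)
    if pages < 1:
        raise ValueError("max_pages must be a positive integer")
    return pages


def predict(pdf_file, dpi, max_pages, lang):
    """Stub for the OCR pipeline; returns (text, json_payload)."""
    limit = normalize_max_pages(max_pages)
    # A real implementation would render up to `limit` pages at `dpi`
    # and run the OCR engine for `lang` here.
    return "", {"pages": [], "max_pages": limit, "dpi": dpi, "lang": lang}


def build_demo():
    """Create the Gradio interface (gradio is imported lazily)."""
    import gradio as gr

    return gr.Interface(
        fn=predict,
        inputs=[
            gr.File(label="PDF", file_types=[".pdf"]),
            gr.Slider(minimum=72, maximum=400, value=170, step=1, label="DPI"),
            gr.Number(label="Max pages (empty = all)", value=None, precision=0),
            gr.Dropdown(choices=["en", "ch", "fr"], value="en", label="Language"),
        ],
        outputs=[gr.Textbox(label="Text"), gr.JSON(label="Details")],
    )
```

In this layout, `app.py` would end with `build_demo().launch()`. The four inputs map one-to-one onto the `pdf_file`, `dpi`, `max_pages`, and `lang` arguments used by API clients.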

### Deploy on Hugging Face Spaces

1. Create a new Space (SDK: Gradio, Python).
2. Upload all files in this repo: `app.py`, `requirements.txt`, `runtime.txt`, `README.md`.
3. The first build downloads the OCR models; subsequent runs reuse the cached copies.
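
Alongside the on-disk model cache mentioned in step 3, an app like this would usually also keep one OCR engine per language in memory, so models are not reloaded on every request. The sketch below shows one way to do that; `make_engine_cache` is a hypothetical helper and is not taken from `app.py`.

```python
from functools import lru_cache


def _default_factory(lang):
    """Create a PaddleOCR engine for `lang` (imported lazily)."""
    from paddleocr import PaddleOCR

    return PaddleOCR(lang=lang)


def make_engine_cache(factory=_default_factory, maxsize=4):
    """Return a `get_engine(lang)` function that reuses engines per language.

    `factory` is injectable so the cache can be exercised without PaddleOCR.
    """

    @lru_cache(maxsize=maxsize)
    def get_engine(lang):
        return factory(lang)

    return get_engine
```

With `get_engine = make_engine_cache()`, repeated calls to `get_engine("en")` return the same engine object. The `maxsize` bound limits how many language models stay resident at once.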

### Run Locally

```bash
pip install -r requirements.txt
python app.py
```

Then open `http://127.0.0.1:7860`.

### API Usage

Using `gradio_client` in Python:

```python
from gradio_client import Client, handle_file

client = Client("<your-username>/<your-space-name>")
result = client.predict(
    pdf_file=handle_file("./sample.pdf"),  # wrap local paths so the client uploads them
    dpi=170,
    max_pages=None,  # no page limit
    lang="en",
    api_name="/predict",
)
text, json_payload = result
print(text)
print(json_payload)
```
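
Using the Space's HTTP API (curl). With Gradio 5, a call takes three requests: upload the file, start the job, then read the result stream. Replace `<uploaded-path>` and `<event-id>` with the values returned by the previous step:

```bash
# 1) Upload the PDF; the response is a JSON list of server-side paths
curl -s -X POST -F "files=@sample.pdf" \
  https://<your-username>-<your-space-name>.hf.space/gradio_api/upload
# 2) Start the job (inputs: file, dpi, max_pages, lang); the response contains an event_id
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"data": [{"path": "<uploaded-path>", "meta": {"_type": "gradio.FileData"}}, 170, null, "en"]}' \
  https://<your-username>-<your-space-name>.hf.space/gradio_api/call/predict
# 3) Stream the result for that event
curl -s -N https://<your-username>-<your-space-name>.hf.space/gradio_api/call/predict/<event-id>
```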

Notes:

- Increase the DPI for small text; higher DPI produces larger page images, so OCR takes longer.
- Choose the language model that matches the document; `lang` defaults to `en`.
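
The cost of raising the DPI can be estimated with a little arithmetic: PDF pages are measured in points (72 per inch), so both image dimensions scale linearly with DPI and the pixel count scales with its square. The helper names below are hypothetical, and the exact bitmap size produced by the renderer may differ by a pixel due to rounding.

```python
def rendered_size(width_pt, height_pt, dpi):
    """Approximate pixel size of a page of the given size in points."""
    scale = dpi / 72.0
    return round(width_pt * scale), round(height_pt * scale)


def pixel_ratio(dpi_from, dpi_to):
    """How many times more pixels a page has at dpi_to than at dpi_from."""
    return (dpi_to / dpi_from) ** 2


US_LETTER = (612, 792)  # 8.5 x 11 inches in points

print(rendered_size(*US_LETTER, 170))  # (1445, 1870)
print(rendered_size(*US_LETTER, 300))  # (2550, 3300)
print(round(pixel_ratio(170, 300), 2))  # 3.11
```

Going from 170 to 300 DPI therefore roughly triples the number of pixels the OCR engine has to process per page.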

### License

MIT

| Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |