---
title: Unified Document Extraction API
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: app.py
pinned: false
---

# Unified Document Extraction API

**One API, Two Engines: Docling + DocStrange**

Extract structured data from any document using AI-powered engines.

## Features

- **Docling** - Advanced document parsing with structure preservation
- **DocStrange** - GPU-accelerated intelligent document processing
- **Multiple formats** - PDF, DOCX, XLSX, PPTX, images, and more
- **Structured output** - Markdown, JSON, tables

## API Endpoints

- `GET /` - Health check
- `GET /engines` - List available engines
- `POST /convert` - Full document conversion
- `POST /convert/markdown` - Markdown only
- `POST /convert/tables` - Tables only
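
The endpoint list above can be captured in a small lookup table on the client side. The following is a minimal Python sketch, not part of the API itself: the `ENDPOINTS` names and the `endpoint_url` helper are hypothetical, and it assumes the `engine` query parameter shown for `/convert` is also accepted by the other `POST` routes, which this README does not confirm.

```python
from urllib.parse import urlencode

# HTTP method and path for each endpoint listed in this README.
# The short keys ("health", "engines", ...) are illustrative names only.
ENDPOINTS = {
    "health": ("GET", "/"),
    "engines": ("GET", "/engines"),
    "convert": ("POST", "/convert"),
    "markdown": ("POST", "/convert/markdown"),
    "tables": ("POST", "/convert/tables"),
}


def endpoint_url(base_url, name, engine=None):
    """Return (method, url) for a named endpoint.

    `engine` is appended as a query parameter when given. Whether every
    POST route honours it is an assumption, not something the README states.
    """
    method, path = ENDPOINTS[name]
    url = base_url.rstrip("/") + path
    if engine is not None:
        url += "?" + urlencode({"engine": engine})
    return method, url
```

For example, `endpoint_url("https://YOUR_SPACE.hf.space", "convert", engine="docling")` yields the same URL used in the curl examples in this README.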

## Usage

```bash
# Convert with Docling
curl -X POST "https://YOUR_SPACE.hf.space/convert?engine=docling" \
  -F "file=@document.pdf"

# Convert with DocStrange
curl -X POST "https://YOUR_SPACE.hf.space/convert?engine=docstrange" \
  -F "file=@document.pdf"
```
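
The same upload can be made from Python. The sketch below uses only the standard library and mirrors the curl calls: a `multipart/form-data` body with a single `file` field, posted to `/convert` with an `engine` query parameter. The helper names `build_convert_request` and `convert` are hypothetical, and because the response schema is not documented here, the reply is returned as raw text rather than parsed.

```python
import mimetypes
import uuid
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_convert_request(base_url, filename, data, engine="docling"):
    """Build a POST request equivalent to `curl -F "file=@<filename>"`."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # One multipart section named "file", matching the curl form field.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    query = urlencode({"engine": engine})
    url = base_url.rstrip("/") + "/convert?" + query
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(url, data=body, headers=headers, method="POST")


def convert(base_url, path, engine="docling", timeout=120):
    """Upload the file at `path` and return the response body as text.

    The timeout value is an arbitrary choice for this sketch.
    """
    with open(path, "rb") as fh:
        request = build_convert_request(base_url, path, fh.read(), engine)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `convert("https://YOUR_SPACE.hf.space", "document.pdf", engine="docstrange")` would correspond to the second curl command, with `YOUR_SPACE` replaced by the actual Space name.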

## Integration

Works with the **DataSync** application for ERPNext integration.

## License

MIT