---
title: Unified Document Extraction API
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: app.py
pinned: false
---
# 🚀 Unified Document Extraction API
**One API, Two Engines: Docling + DocStrange**
Extract structured data from PDFs, Office documents, images, and other formats using AI-powered engines.
## Features
- ✅ **Docling** - Advanced document parsing with structure preservation
- ✅ **DocStrange** - GPU-accelerated intelligent document processing
- ✅ **Multiple formats** - PDF, DOCX, XLSX, PPTX, Images, and more
- ✅ **Structured output** - Markdown, JSON, Tables
## API Endpoints
- `GET /` - Health check
- `GET /engines` - List available engines
- `POST /convert` - Full document conversion
- `POST /convert/markdown` - Markdown only
- `POST /convert/tables` - Tables only
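
The endpoints above can be wrapped in a few small shell helpers. This is a minimal sketch, not part of the project: `api_url`, `api_get`, and `api_convert` are hypothetical names, `BASE_URL` is a placeholder for your Space URL, and it assumes the `/convert/markdown` and `/convert/tables` endpoints take the same `file` form field and `engine` query parameter as `/convert`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; BASE_URL is a placeholder for your Space URL.
BASE_URL="${BASE_URL:-https://YOUR_SPACE.hf.space}"

# Build the URL for an endpoint, optionally selecting an engine.
# Assumption: the /convert/* sub-endpoints accept ?engine= like /convert does.
api_url() {
  local path="$1" engine="${2:-}"
  if [ -n "$engine" ]; then
    printf '%s%s?engine=%s\n' "$BASE_URL" "$path" "$engine"
  else
    printf '%s%s\n' "$BASE_URL" "$path"
  fi
}

# GET an endpoint (health check, engine list).
# -f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps error messages.
api_get() {
  curl -fsS "$(api_url "$1")"
}

# POST a file to a conversion endpoint.
api_convert() {
  local path="$1" engine="$2" file="$3"
  curl -fsS -X POST "$(api_url "$path" "$engine")" -F "file=@${file}"
}

# Usage:
#   api_get /
#   api_get /engines
#   api_convert /convert/markdown docling document.pdf
#   api_convert /convert/tables docstrange document.pdf
```

The response format of each endpoint is not specified here, so the helpers simply print the raw response body.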
## Usage
```bash
# Convert with Docling
curl -X POST "https://YOUR_SPACE.hf.space/convert?engine=docling" \
  -F "file=@document.pdf"

# Convert with DocStrange
curl -X POST "https://YOUR_SPACE.hf.space/convert?engine=docstrange" \
  -F "file=@document.pdf"
```
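
To compare the two engines on the same file, the two calls above can be run in a loop and each response saved to disk. This is only a sketch under assumptions: `out_name` and `compare_engines` are hypothetical helpers, `BASE_URL` is a placeholder, and the `.out` suffix is used because the response format is not documented here.

```bash
#!/usr/bin/env bash
# Sketch: send one file to both engines and save each raw response.
BASE_URL="${BASE_URL:-https://YOUR_SPACE.hf.space}"  # placeholder Space URL

# Derive an output filename such as report.docling.out
# (hypothetical naming scheme: file name without its last extension, engine, ".out").
out_name() {
  local file="$1" engine="$2"
  local base
  base="$(basename "$file")"
  printf '%s.%s.out\n' "${base%.*}" "$engine"
}

# Post the file to /convert once per engine; report failures on stderr
# and continue with the next engine.
compare_engines() {
  local file="$1" engine
  for engine in docling docstrange; do
    curl -fsS -X POST "${BASE_URL}/convert?engine=${engine}" \
      -F "file=@${file}" -o "$(out_name "$file" "$engine")" \
      || echo "engine ${engine} failed for ${file}" >&2
  done
}

# Usage:
#   compare_engines document.pdf
```

Writing one output file per engine keeps the results side by side for a manual diff.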
## Integration
Works with the **DataSync** application for ERPNext integration.
## License
MIT