---
title: Docling VLM Parser API
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
suggested_hardware: a100-large
---
# Docling VLM Parser API

A FastAPI service that transforms PDFs and images into LLM-ready markdown/JSON using a **hybrid two-pass architecture**: [IBM's Docling](https://github.com/DS4SD/docling) for document structure and [Qwen3-VL-30B-A3B](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct) via [vLLM](https://github.com/vllm-project/vllm) for enhanced text recognition.

## Features

- **Hybrid Two-Pass Architecture**: Docling Standard Pipeline (Pass 1) + Qwen3-VL VLM OCR (Pass 2)
- **TableFormer ACCURATE**: High-accuracy table structure detection preserved from Docling
- **VLM-Powered OCR**: Qwen3-VL-30B-A3B via vLLM replaces baseline RapidOCR for superior text accuracy
- **OpenCV Preprocessing**: Denoising and CLAHE contrast enhancement for better image quality
- **32+ Language Support**: Multilingual text recognition powered by Qwen3-VL
- **Handwriting Recognition**: Transcribes handwritten text via the VLM
- **Image Extraction**: Extracts and returns all document images
- **Multiple Formats**: Output as markdown or JSON
- **GPU Accelerated**: Dual-process on A100 80GB (vLLM + FastAPI)
## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      Docker Container                       │
│                  (vllm/vllm-openai:v0.14.1)                 │
│                                                             │
│  ┌─────────────────────┐    ┌────────────────────────────┐  │
│  │ vLLM Server :8000   │    │ FastAPI App :7860          │  │
│  │ Qwen3-VL-30B-A3B    │◄───┤                            │  │
│  │ (GPU inference)     │    │ Pass 1: Docling Standard   │  │
│  └─────────────────────┘    │  - DocLayNet layout        │  │
│                             │  - TableFormer ACCURATE    │  │
│                             │  - RapidOCR baseline       │  │
│                             │                            │  │
│                             │ Pass 2: VLM OCR            │  │
│                             │  - Page images → Qwen3-VL  │  │
│                             │  - OpenCV preprocessing    │  │
│                             │                            │  │
│                             │ Merge:                     │  │
│                             │  - VLM text (primary)      │  │
│                             │  - TableFormer tables      │  │
│                             └────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
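The merge step above can be sketched as follows. This is an illustrative sketch only, not the service's actual code: the function name, the per-page data shapes, and the append-tables strategy are all our assumptions about how VLM text (primary) and TableFormer tables might be combined.

```python
# Illustrative sketch of the hybrid merge step (hypothetical, not the
# service's real implementation): Pass 2 VLM text forms the page body,
# and Pass 1 TableFormer tables are appended per page.

def merge_passes(vlm_pages: list[str], tables_by_page: dict[int, list[str]]) -> str:
    """Combine per-page VLM markdown with TableFormer table markdown."""
    merged_pages = []
    for page_num, page_text in enumerate(vlm_pages):
        parts = [page_text]
        # Append the high-accuracy tables detected on this page.
        for table_md in tables_by_page.get(page_num, []):
            parts.append(table_md)
        merged_pages.append("\n\n".join(parts))
    return "\n\n".join(merged_pages)

print(merge_passes(["Page one text."], {0: ["| a | b |\n| - | - |"]}))
```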
## API Endpoints

| Endpoint     | Method | Description                               |
| ------------ | ------ | ----------------------------------------- |
| `/`          | GET    | Health check (includes vLLM status)       |
| `/parse`     | POST   | Parse uploaded file (multipart/form-data) |
| `/parse/url` | POST   | Parse document from URL (JSON body)       |
## Authentication

All `/parse` endpoints require Bearer token authentication:

```
Authorization: Bearer YOUR_API_TOKEN
```

Set `API_TOKEN` in HF Space Settings > Secrets.
## Quick Start

### cURL - File Upload

```bash
curl -X POST "https://YOUR-SPACE-URL/parse" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"
```

### cURL - Parse from URL

```bash
curl -X POST "https://YOUR-SPACE-URL/parse/url" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/document.pdf", "output_format": "markdown"}'
```
### Python

```python
import requests

API_URL = "https://YOUR-SPACE-URL"
API_TOKEN = "your_api_token"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

# Option 1: Upload a file
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{API_URL}/parse",
        headers=headers,
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"output_format": "markdown"},
    )

# Option 2: Parse from URL
response = requests.post(
    f"{API_URL}/parse/url",
    headers=headers,
    json={
        "url": "https://example.com/document.pdf",
        "output_format": "markdown",
    },
)

result = response.json()
if result["success"]:
    print(f"Parsed {result['pages_processed']} pages using {result['vlm_model']}")
    print(result["markdown"])
else:
    print(f"Error: {result['error']}")
```
### Python with Images

```python
import base64
import io
import zipfile

import requests

API_URL = "https://YOUR-SPACE-URL"
API_TOKEN = "your_api_token"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

# Request with images included
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{API_URL}/parse",
        headers=headers,
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"output_format": "markdown", "include_images": "true"},
    )

result = response.json()
if result["success"]:
    print(f"Parsed {result['pages_processed']} pages")
    print(result["markdown"])

    # Extract images from the base64-encoded ZIP
    if result["images_zip"]:
        print(f"Extracting {result['image_count']} images...")
        zip_bytes = base64.b64decode(result["images_zip"])
        with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zf:
            zf.extractall("./extracted_images")
        print("Images saved to ./extracted_images/")
```
## Request Parameters

### File Upload (`/parse`)

| Parameter      | Type   | Required | Default    | Description                              |
| -------------- | ------ | -------- | ---------- | ---------------------------------------- |
| file           | File   | Yes      | -          | PDF or image file                        |
| output_format  | string | No       | `markdown` | `markdown` or `json`                     |
| images_scale   | float  | No       | `2.0`      | Image resolution scale (higher = better) |
| start_page     | int    | No       | `0`        | Starting page (0-indexed)                |
| end_page       | int    | No       | `null`     | Ending page (null = all pages)           |
| include_images | bool   | No       | `false`    | Include extracted images in response     |

### URL Parsing (`/parse/url`)

| Parameter      | Type   | Required | Default    | Description                              |
| -------------- | ------ | -------- | ---------- | ---------------------------------------- |
| url            | string | Yes      | -          | URL to PDF or image                      |
| output_format  | string | No       | `markdown` | `markdown` or `json`                     |
| images_scale   | float  | No       | `2.0`      | Image resolution scale (higher = better) |
| start_page     | int    | No       | `0`        | Starting page (0-indexed)                |
| end_page       | int    | No       | `null`     | Ending page (null = all pages)           |
| include_images | bool   | No       | `false`    | Include extracted images in response     |
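Given the defaults above, a request only needs to send the fields it overrides. A small sketch of assembling the form fields for a page-range request; the helper function is ours for illustration, only the field names and defaults come from the tables:

```python
# Build the form fields for /parse using the documented defaults.
# build_parse_fields is a hypothetical client-side helper, not part of
# the API itself.

def build_parse_fields(output_format="markdown", images_scale=2.0,
                       start_page=0, end_page=None, include_images=False):
    if output_format not in ("markdown", "json"):
        raise ValueError("output_format must be 'markdown' or 'json'")
    fields = {
        "output_format": output_format,
        "images_scale": str(images_scale),
        "start_page": str(start_page),
        "include_images": "true" if include_images else "false",
    }
    if end_page is not None:  # omit to process all pages (server default: null)
        fields["end_page"] = str(end_page)
    return fields

# e.g. parse only pages 0-4:
print(build_parse_fields(start_page=0, end_page=4))
```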
## Response Format

```json
{
  "success": true,
  "markdown": "# Document Title\n\nExtracted content...",
  "json_content": null,
  "images_zip": null,
  "image_count": 0,
  "error": null,
  "pages_processed": 20,
  "device_used": "cuda",
  "vlm_model": "Qwen/Qwen3-VL-30B-A3B-Instruct"
}
```

| Field           | Type    | Description                                      |
| --------------- | ------- | ------------------------------------------------ |
| success         | boolean | Whether parsing succeeded                        |
| markdown        | string  | Extracted markdown (if `output_format=markdown`) |
| json_content    | object  | Extracted JSON (if `output_format=json`)         |
| images_zip      | string  | Base64-encoded ZIP file containing all images    |
| image_count     | int     | Number of images in the ZIP file                 |
| error           | string  | Error message if parsing failed                  |
| pages_processed | int     | Number of pages processed                        |
| device_used     | string  | Device used for processing (cuda, mps, or cpu)   |
| vlm_model       | string  | VLM model used for OCR (e.g. Qwen3-VL-30B-A3B)   |
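Since only one of `markdown` or `json_content` is populated depending on the requested format, a client can handle both formats with one function. The helper below is a minimal sketch, assuming exactly that either/or behavior; the function name is ours:

```python
# Minimal response handling for both output formats. Field names come from
# the response table; extract_content itself is an illustrative helper.

def extract_content(result: dict):
    if not result.get("success"):
        raise RuntimeError(f"Parse failed: {result.get('error')}")
    # Exactly one content field is populated, per the requested output_format.
    if result.get("markdown") is not None:
        return result["markdown"]
    return result["json_content"]

print(extract_content({"success": True, "markdown": "# Title", "json_content": None}))
```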
## Supported File Types

- PDF (`.pdf`)
- Images (`.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`)

Maximum file size: 1 GB (configurable via `MAX_FILE_SIZE_MB`).
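A client can mirror these limits before uploading to fail fast on oversized or unsupported files. A sketch under the documented defaults (the function is hypothetical; the extension list and 1024 MB limit come from this section):

```python
import os

# Client-side pre-flight check mirroring the documented limits.
# check_upload is an illustrative helper, not part of the API.
ALLOWED_EXTS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp"}

def check_upload(path: str, max_mb: int = 1024) -> None:
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTS:
        raise ValueError(f"Unsupported file type: {ext}")
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_mb:
        raise ValueError(f"File is {size_mb:.1f} MB, limit is {max_mb} MB")
```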
## Configuration

| Environment Variable         | Description                            | Default                          |
| ---------------------------- | -------------------------------------- | -------------------------------- |
| `API_TOKEN`                  | **Required.** API authentication token | -                                |
| `VLM_MODEL`                  | VLM model for OCR                      | `Qwen/Qwen3-VL-30B-A3B-Instruct` |
| `VLM_HOST`                   | vLLM server host                       | `127.0.0.1`                      |
| `VLM_PORT`                   | vLLM server port                       | `8000`                           |
| `VLM_GPU_MEMORY_UTILIZATION` | GPU memory fraction for vLLM           | `0.85`                           |
| `VLM_MAX_MODEL_LEN`          | Max context length for the VLM         | `8192`                           |
| `IMAGES_SCALE`               | Default image resolution scale         | `2.0`                            |
| `MAX_FILE_SIZE_MB`           | Maximum upload size in MB              | `1024`                           |
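The table translates directly into an environment-driven config. A sketch of reading it with the documented defaults (`load_config` is our illustration, not the service's code; `API_TOKEN` deliberately has no default because it is required):

```python
import os

# Read the service configuration with the defaults from the table above.
# load_config is a hypothetical sketch of how the variables are consumed.
def load_config(env=None):
    env = os.environ if env is None else env
    return {
        "vlm_model": env.get("VLM_MODEL", "Qwen/Qwen3-VL-30B-A3B-Instruct"),
        "vlm_host": env.get("VLM_HOST", "127.0.0.1"),
        "vlm_port": int(env.get("VLM_PORT", "8000")),
        "gpu_memory_utilization": float(env.get("VLM_GPU_MEMORY_UTILIZATION", "0.85")),
        "max_model_len": int(env.get("VLM_MAX_MODEL_LEN", "8192")),
        "images_scale": float(env.get("IMAGES_SCALE", "2.0")),
        "max_file_size_mb": int(env.get("MAX_FILE_SIZE_MB", "1024")),
    }

print(load_config({}))
```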
## Logging

View logs in the HuggingFace Space > Logs tab:

```
2026-02-04 10:30:00 | INFO | [a1b2c3d4] New parse request received
2026-02-04 10:30:00 | INFO | [a1b2c3d4] Filename: document.pdf
2026-02-04 10:30:00 | INFO | [a1b2c3d4] File size: 2.45 MB
2026-02-04 10:30:15 | INFO | [a1b2c3d4] Pass 1: Docling Standard Pipeline completed in 15.23s
2026-02-04 10:30:15 | INFO | [a1b2c3d4] TableFormer detected 3 tables
2026-02-04 10:30:27 | INFO | [a1b2c3d4] Pass 2: VLM OCR completed in 12.00s (20 pages)
2026-02-04 10:30:27 | INFO | [a1b2c3d4] Hybrid conversion complete: 20 pages, 3 tables, 27.23s total
2026-02-04 10:30:27 | INFO | [a1b2c3d4] Speed: 0.73 pages/sec
```
## Credits

Built with [Docling](https://github.com/DS4SD/docling) by IBM Research, [Qwen3-VL](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct) by the Qwen Team, and [vLLM](https://github.com/vllm-project/vllm).