---
title: Docling VLM Parser API
emoji: πŸ“„
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
suggested_hardware: a100-large
---
# Docling VLM Parser API
A FastAPI service that transforms PDFs and images into LLM-ready markdown/JSON using a **hybrid two-pass architecture**: [IBM's Docling](https://github.com/DS4SD/docling) for document structure and [Qwen3-VL-30B-A3B](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct) via [vLLM](https://github.com/vllm-project/vllm) for enhanced text recognition.
## Features
- **Hybrid Two-Pass Architecture**: Docling Standard Pipeline (Pass 1) + Qwen3-VL VLM OCR (Pass 2)
- **TableFormer ACCURATE**: High-accuracy table structure detection preserved from Docling
- **VLM-Powered OCR**: Qwen3-VL-30B-A3B via vLLM replaces baseline RapidOCR for superior text accuracy
- **OpenCV Preprocessing**: Denoising and CLAHE contrast enhancement for better image quality
- **32+ Language Support**: Multilingual text recognition powered by Qwen3-VL
- **Handwriting Recognition**: Transcribe handwritten text via VLM
- **Image Extraction**: Extract and return all document images
- **Multiple Formats**: Output as markdown or JSON
- **GPU Accelerated**: vLLM and FastAPI run as separate processes on a single A100 80GB
## Architecture
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       Docker Container                       β”‚
β”‚                  (vllm/vllm-openai:v0.14.1)                  β”‚
β”‚                                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ vLLM Server :8000   β”‚      β”‚    FastAPI App :7860     β”‚   β”‚
β”‚  β”‚ Qwen3-VL-30B-A3B    │◄─────│                          β”‚   β”‚
β”‚  β”‚ (GPU inference)     β”‚      β”‚ Pass 1: Docling Standard β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚ - DocLayNet layout       β”‚   β”‚
β”‚                               β”‚ - TableFormer ACCURATE   β”‚   β”‚
β”‚                               β”‚ - RapidOCR baseline      β”‚   β”‚
β”‚                               β”‚                          β”‚   β”‚
β”‚                               β”‚ Pass 2: VLM OCR          β”‚   β”‚
β”‚                               β”‚ - Page images β†’ Qwen3-VL β”‚   β”‚
β”‚                               β”‚ - OpenCV preprocessing   β”‚   β”‚
β”‚                               β”‚                          β”‚   β”‚
β”‚                               β”‚ Merge:                   β”‚   β”‚
β”‚                               β”‚ - VLM text (primary)     β”‚   β”‚
β”‚                               β”‚ - TableFormer tables     β”‚   β”‚
β”‚                               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## API Endpoints
| Endpoint | Method | Description |
| ------------ | ------ | ----------------------------------------- |
| `/` | GET | Health check (includes vLLM status) |
| `/parse` | POST | Parse uploaded file (multipart/form-data) |
| `/parse/url` | POST | Parse document from URL (JSON body) |
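Before sending parse requests, you can poll the health check to confirm the service (and its vLLM backend) is up. This helper is an illustrative sketch, not part of the API client; it only inspects the HTTP status code, not the vLLM status field in the body:

```python
import requests

def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the health check endpoint responds with HTTP 200."""
    try:
        return requests.get(f"{base_url}/", timeout=timeout).status_code == 200
    except requests.RequestException:
        # Connection refused, DNS failure, timeout, etc.
        return False
```

Useful in deployment scripts that need to wait for the Space to finish cold-starting before sending work.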
## Authentication
All `/parse` endpoints require Bearer token authentication.
```
Authorization: Bearer YOUR_API_TOKEN
```
Set `API_TOKEN` in HF Space Settings > Secrets.
## Quick Start
### cURL - File Upload
```bash
curl -X POST "https://YOUR-SPACE-URL/parse" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-F "file=@document.pdf" \
-F "output_format=markdown"
```
### cURL - Parse from URL
```bash
curl -X POST "https://YOUR-SPACE-URL/parse/url" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/document.pdf", "output_format": "markdown"}'
```
### Python
```python
import requests

API_URL = "https://YOUR-SPACE-URL"
API_TOKEN = "your_api_token"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

# Option 1: Upload a file
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{API_URL}/parse",
        headers=headers,
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"output_format": "markdown"},
    )

# Option 2: Parse from URL
response = requests.post(
    f"{API_URL}/parse/url",
    headers=headers,
    json={
        "url": "https://example.com/document.pdf",
        "output_format": "markdown",
    },
)

result = response.json()
if result["success"]:
    print(f"Parsed {result['pages_processed']} pages using {result['vlm_model']}")
    print(result["markdown"])
else:
    print(f"Error: {result['error']}")
```
### Python with Images
```python
import base64
import io
import zipfile

import requests

API_URL = "https://YOUR-SPACE-URL"
API_TOKEN = "your_api_token"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

# Request with images included
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{API_URL}/parse",
        headers=headers,
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"output_format": "markdown", "include_images": "true"},
    )

result = response.json()
if result["success"]:
    print(f"Parsed {result['pages_processed']} pages")
    print(result["markdown"])

    # Extract images from ZIP
    if result["images_zip"]:
        print(f"Extracting {result['image_count']} images...")
        zip_bytes = base64.b64decode(result["images_zip"])
        with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zf:
            zf.extractall("./extracted_images")
        print("Images saved to ./extracted_images/")
```
## Request Parameters
### File Upload (/parse)
| Parameter | Type | Required | Default | Description |
| -------------- | ------ | -------- | ---------- | ---------------------------------------- |
| file | File | Yes | - | PDF or image file |
| output_format | string | No | `markdown` | `markdown` or `json` |
| images_scale | float | No | `2.0` | Image resolution scale (higher = better) |
| start_page | int | No | `0` | Starting page (0-indexed) |
| end_page | int | No | `null` | Ending page (null = all pages) |
| include_images | bool | No | `false` | Include extracted images in response |
### URL Parsing (/parse/url)
| Parameter | Type | Required | Default | Description |
| -------------- | ------ | -------- | ---------- | ---------------------------------------- |
| url | string | Yes | - | URL to PDF or image |
| output_format | string | No | `markdown` | `markdown` or `json` |
| images_scale | float | No | `2.0` | Image resolution scale (higher = better) |
| start_page | int | No | `0` | Starting page (0-indexed) |
| end_page | int | No | `null` | Ending page (null = all pages) |
| include_images | bool | No | `false` | Include extracted images in response |
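The form fields above can be assembled with a small client-side helper. The helper itself is illustrative and not part of the API; it only enforces the constraints documented in the tables:

```python
from typing import Optional

def build_parse_fields(
    output_format: str = "markdown",
    images_scale: float = 2.0,
    start_page: int = 0,
    end_page: Optional[int] = None,
    include_images: bool = False,
) -> dict:
    """Assemble the documented /parse form fields, validating their constraints."""
    if output_format not in ("markdown", "json"):
        raise ValueError("output_format must be 'markdown' or 'json'")
    if start_page < 0:
        raise ValueError("start_page is 0-indexed and must be >= 0")
    fields = {
        "output_format": output_format,
        "images_scale": str(images_scale),
        "start_page": str(start_page),
        "include_images": "true" if include_images else "false",
    }
    if end_page is not None:  # omit to process all pages
        fields["end_page"] = str(end_page)
    return fields
```

For example, `data=build_parse_fields(output_format="json", end_page=4)` restricts processing to an initial page range instead of the whole document.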
## Response Format
```json
{
"success": true,
"markdown": "# Document Title\n\nExtracted content...",
"json_content": null,
"images_zip": null,
"image_count": 0,
"error": null,
"pages_processed": 20,
"device_used": "cuda",
"vlm_model": "Qwen/Qwen3-VL-30B-A3B-Instruct"
}
```
| Field | Type | Description |
| --------------- | ------- | ---------------------------------------------- |
| success | boolean | Whether parsing succeeded |
| markdown | string | Extracted markdown (if output_format=markdown) |
| json_content | object | Extracted JSON (if output_format=json) |
| images_zip | string | Base64-encoded ZIP of extracted images (null unless include_images=true) |
| image_count | int | Number of images in the ZIP file |
| error | string | Error message if failed |
| pages_processed | int | Number of pages processed |
| device_used | string | Device used for processing (cuda, mps, or cpu) |
| vlm_model | string | VLM model used for OCR (e.g. Qwen3-VL-30B-A3B) |
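Only one of `markdown` and `json_content` is populated per request (the other is `null`, as in the sample response), so a small accessor can hide the difference. This is an illustrative client-side sketch, not part of the API:

```python
def extract_content(result: dict):
    """Return the parsed content from a response, whichever output_format was used."""
    if not result.get("success"):
        raise RuntimeError(result.get("error") or "unknown parsing error")
    if result.get("markdown") is not None:
        return result["markdown"]
    return result["json_content"]
```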
## Supported File Types
- PDF (.pdf)
- Images (.png, .jpg, .jpeg, .tiff, .bmp)
Maximum file size: 1GB (configurable via `MAX_FILE_SIZE_MB`)
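To avoid round-tripping uploads the service will reject, the type and size limits above can be checked client-side first. A sketch, assuming the default 1024 MB limit:

```python
import os

# Extensions and size limit taken from the documentation above.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp"}
MAX_FILE_SIZE_MB = 1024  # default; the server's MAX_FILE_SIZE_MB may differ

def check_upload(path: str, max_mb: int = MAX_FILE_SIZE_MB) -> None:
    """Raise ValueError if the file would be rejected by the service."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_mb:
        raise ValueError(f"File is {size_mb:.1f} MB, limit is {max_mb} MB")
```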
## Configuration
| Environment Variable | Description | Default |
| ---------------------------- | -------------------------------------- | --------------------------- |
| `API_TOKEN` | **Required.** API authentication token | - |
| `VLM_MODEL` | VLM model for OCR | `Qwen/Qwen3-VL-30B-A3B-Instruct` |
| `VLM_HOST` | vLLM server host | `127.0.0.1` |
| `VLM_PORT` | vLLM server port | `8000` |
| `VLM_GPU_MEMORY_UTILIZATION` | GPU memory fraction for vLLM | `0.85` |
| `VLM_MAX_MODEL_LEN` | Max context length for VLM | `8192` |
| `IMAGES_SCALE` | Default image resolution scale | `2.0` |
| `MAX_FILE_SIZE_MB` | Maximum upload size in MB | `1024` |
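For reference, this is roughly how the documented defaults resolve at startup; the snippet is illustrative, not the service's actual code:

```python
import os

# Each variable falls back to the default from the table above.
VLM_MODEL = os.environ.get("VLM_MODEL", "Qwen/Qwen3-VL-30B-A3B-Instruct")
VLM_HOST = os.environ.get("VLM_HOST", "127.0.0.1")
VLM_PORT = int(os.environ.get("VLM_PORT", "8000"))
VLM_GPU_MEMORY_UTILIZATION = float(os.environ.get("VLM_GPU_MEMORY_UTILIZATION", "0.85"))
VLM_MAX_MODEL_LEN = int(os.environ.get("VLM_MAX_MODEL_LEN", "8192"))
MAX_FILE_SIZE_MB = int(os.environ.get("MAX_FILE_SIZE_MB", "1024"))

# FastAPI talks to the co-located vLLM server over its OpenAI-compatible API:
VLLM_BASE_URL = f"http://{VLM_HOST}:{VLM_PORT}/v1"
```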
## Logging
View logs in the Hugging Face Space's Logs tab:
```
2026-02-04 10:30:00 | INFO | [a1b2c3d4] New parse request received
2026-02-04 10:30:00 | INFO | [a1b2c3d4] Filename: document.pdf
2026-02-04 10:30:00 | INFO | [a1b2c3d4] File size: 2.45 MB
2026-02-04 10:30:15 | INFO | [a1b2c3d4] Pass 1: Docling Standard Pipeline completed in 15.23s
2026-02-04 10:30:15 | INFO | [a1b2c3d4] TableFormer detected 3 tables
2026-02-04 10:30:27 | INFO | [a1b2c3d4] Pass 2: VLM OCR completed in 12.00s (20 pages)
2026-02-04 10:30:27 | INFO | [a1b2c3d4] Hybrid conversion complete: 20 pages, 3 tables, 27.23s total
2026-02-04 10:30:27 | INFO | [a1b2c3d4] Speed: 0.73 pages/sec
```
## Credits
Built with [Docling](https://github.com/DS4SD/docling) by IBM Research, [Qwen3-VL](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct) by Qwen Team, and [vLLM](https://github.com/vllm-project/vllm).