# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview
Docling Parser (v2.0.0) - A Hugging Face Spaces API service using a hybrid two-pass VLM architecture for PDF/document parsing. Pass 1 runs Docling's standard pipeline (DocLayNet layout + TableFormer ACCURATE + RapidOCR baseline). Pass 2 sends full page images to Qwen3-VL-30B-A3B via vLLM for enhanced text recognition. The merge step preserves TableFormer tables while replacing RapidOCR text with VLM output. Includes OpenCV preprocessing (denoise, CLAHE contrast enhancement). API endpoints are protected by Bearer token authentication.
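The merge step described above can be sketched as follows. This is an illustrative outline, not the actual `app.py` implementation: the `Item` type and `merge_passes` name are assumptions, but the policy matches the text (keep TableFormer tables from pass 1, replace RapidOCR text with the VLM transcription from pass 2).

```python
from dataclasses import dataclass

@dataclass
class Item:
    kind: str      # "table" (TableFormer) or "text" (RapidOCR / VLM)
    content: str

def merge_passes(pass1_items: list[Item], vlm_text: str) -> list[Item]:
    """Keep pass-1 tables intact; swap baseline OCR text for VLM output."""
    merged = [it for it in pass1_items if it.kind == "table"]
    merged.append(Item(kind="text", content=vlm_text))
    return merged
```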
## Architecture

```
hf_docling_parser/
├── app.py            # FastAPI + hybrid two-pass parsing (v2.0.0)
├── start.sh          # Startup script (vLLM + FastAPI dual-process)
├── Dockerfile        # vLLM base image, Qwen3-VL pre-downloaded
├── requirements.txt  # Python deps (docling, opencv, pdf2image, etc.)
├── README.md         # HF Spaces metadata
├── CLAUDE.md         # Claude Code development guide
└── .gitignore        # Git ignore patterns
```
Dual-process Docker architecture: `start.sh` launches vLLM on port 8000 (GPU model serving) and FastAPI on port 7860 (public API). Base image: `vllm/vllm-openai:v0.14.1`.
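A hypothetical sketch of what `start.sh` does (the actual script and flags may differ): launch vLLM in the background, poll its health endpoint until the model is loaded, then start FastAPI in the foreground so the container tracks its lifetime.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Start vLLM's OpenAI-compatible server in the background on port 8000
python -m vllm.entrypoints.openai.api_server \
  --model "${VLM_MODEL:-Qwen/Qwen3-VL-30B-A3B-Instruct}" \
  --port "${VLM_PORT:-8000}" \
  --gpu-memory-utilization "${VLM_GPU_MEMORY_UTILIZATION:-0.85}" \
  --max-model-len "${VLM_MAX_MODEL_LEN:-8192}" &

# Wait until the model server answers health checks
until curl -sf "http://127.0.0.1:${VLM_PORT:-8000}/health" >/dev/null; do
  sleep 5
done

# Run the API in the foreground on the HF Spaces port
exec uvicorn app:app --host 0.0.0.0 --port 7860
```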
## Common Commands

```bash
# Build and test the Docker image locally (requires an A100 GPU)
docker build --shm-size 32g -t hf-docling .
docker run --gpus all --shm-size 32g -p 7860:7860 -e API_TOKEN=test hf-docling

# Test the API locally
curl -X POST "http://localhost:7860/parse" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"
```
Note: Local development without Docker is not practical, because the hybrid pipeline requires vLLM + Qwen3-VL-30B-A3B running on an A100 GPU.
## Deploying to Hugging Face Spaces
- Space URL: https://huggingface.co/spaces/outcomelabs/docling-parser
- API URL: https://outcomelabs-docling-parser.hf.space
### Push New Code

Since this lives in a monorepo, deploy by cloning the HF Space repo, copying files, and pushing:
```bash
# Clone HF Space repo to a temp directory
git clone https://huggingface.co/spaces/outcomelabs/docling-parser /tmp/hf-docling-deploy

# Copy updated files from the monorepo
cp apps/hf_docling_parser/{app.py,Dockerfile,README.md,requirements.txt,start.sh,.gitignore} /tmp/hf-docling-deploy/

# Commit and push
cd /tmp/hf-docling-deploy
git add -A
git commit -m "feat: description of changes"
git push

# Clean up
rm -rf /tmp/hf-docling-deploy
```
Requires HF CLI auth: `huggingface-cli login` (logged in as sidoutcome / org outcomelabs).

### Settings (configure in the HF web UI)

- Hardware: Nvidia A100 Large 80GB ($2.50/hr); vLLM requires a GPU
- Sleep time: 1 hour (auto-shutdown after 60 min idle)
- Secrets:
  - `API_TOKEN` (required for API authentication)
## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Health check and API info |
| `/parse` | POST | Parse a document (PDF/image) to markdown/JSON |
| `/parse/url` | POST | Parse a document from a URL |
| `/docs` | GET | OpenAPI documentation (Swagger UI) |
## Key Dependencies

- `docling`: IBM's document parsing library with TableFormer
- `fastapi`: API framework
- `python-multipart`: file upload handling
- `uvicorn`: ASGI server
- `httpx`: HTTP client for URL parsing
- `pydantic`: request/response validation
- `opencv-python-headless`: image preprocessing (denoise, CLAHE)
- `pdf2image`: PDF page-to-image conversion for the VLM
- `huggingface-hub`: model/Space utilities

Note: vLLM and PyTorch are provided by the base Docker image (`vllm/vllm-openai:v0.14.1`), not by `requirements.txt`.
## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `API_TOKEN` | **Required.** Secret token for API authentication | - |
| `MAX_FILE_SIZE_MB` | Maximum upload file size in MB | `1024` |
| `IMAGES_SCALE` | Image resolution scale for page rendering | `2.0` |
| `VLM_MODEL` | VLM model for the text-recognition pass | `Qwen/Qwen3-VL-30B-A3B-Instruct` |
| `VLM_HOST` | vLLM server host | `127.0.0.1` |
| `VLM_PORT` | vLLM server port | `8000` |
| `VLM_GPU_MEMORY_UTILIZATION` | Fraction of GPU memory reserved for vLLM | `0.85` |
| `VLM_MAX_MODEL_LEN` | Maximum sequence length for vLLM | `8192` |
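A sketch of how these variables could be read at startup, with defaults mirroring the table above (the `load_config` helper is illustrative; `app.py` may structure this differently):

```python
import os

def load_config() -> dict:
    """Read service configuration from the environment.

    API_TOKEN has no default and raises KeyError if unset, matching
    its 'required' status; everything else falls back to the documented
    default.
    """
    return {
        "api_token": os.environ["API_TOKEN"],
        "max_file_size_mb": int(os.getenv("MAX_FILE_SIZE_MB", "1024")),
        "images_scale": float(os.getenv("IMAGES_SCALE", "2.0")),
        "vlm_model": os.getenv("VLM_MODEL", "Qwen/Qwen3-VL-30B-A3B-Instruct"),
        "vlm_host": os.getenv("VLM_HOST", "127.0.0.1"),
        "vlm_port": int(os.getenv("VLM_PORT", "8000")),
        "vlm_gpu_memory_utilization": float(
            os.getenv("VLM_GPU_MEMORY_UTILIZATION", "0.85")
        ),
        "vlm_max_model_len": int(os.getenv("VLM_MAX_MODEL_LEN", "8192")),
    }
```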
## Testing

Bruno collection: open `docs/api-collections/` in Bruno for ready-to-use requests with local/production environments.

Note: testing requires an A100 GPU with vLLM running. Use the Docker container for testing.
### Test with curl

```bash
# Test the /parse endpoint
curl -X POST "http://localhost:7860/parse" \
  -H "Authorization: Bearer test" \
  -F "file=@sample.pdf" \
  -F "output_format=markdown"

# Test with images included in the output
curl -X POST "http://localhost:7860/parse" \
  -H "Authorization: Bearer test" \
  -F "file=@sample.pdf" \
  -F "output_format=markdown" \
  -F "include_images=true"

# Test the /parse/url endpoint
curl -X POST "http://localhost:7860/parse/url" \
  -H "Authorization: Bearer test" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://arxiv.org/pdf/2408.09869", "output_format": "markdown"}'
```
### Test with Python

```python
import httpx

API_URL = "http://localhost:7860"
API_TOKEN = "test"

with open("sample.pdf", "rb") as f:
    response = httpx.post(
        f"{API_URL}/parse",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        files={"file": ("sample.pdf", f, "application/pdf")},
        data={"output_format": "markdown"},
    )
print(response.json())
```
## Logging & Monitoring
The API provides comprehensive logging:
- Request IDs: Each request gets a unique 8-char ID
- Startup logs: Device info, GPU name, configuration
- Request logs: File size, type, processing time, pages/sec
- Docling output: Conversion progress and timing
- Error tracking: Full exception details with context
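The request-ID scheme above can be sketched in a few lines (helper names are assumptions, not the actual `app.py` implementation):

```python
import logging
import uuid

def new_request_id() -> str:
    """Unique 8-character hex ID for correlating one request's log lines."""
    return uuid.uuid4().hex[:8]

def log_request(logger: logging.Logger, request_id: str, message: str) -> None:
    """Prefix every log line with the request ID, e.g. '[a1b2c3d4] parsed 3 pages'."""
    logger.info("[%s] %s", request_id, message)
```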
## Comparison with MinerU
| Feature | Docling (Hybrid VLM) | MinerU |
|---|---|---|
| Maintainer | IBM Research + Qwen3-VL | OpenDataLab |
| Table Detection | TableFormer (built-in) | Multiple backends |
| OCR | Qwen3-VL-30B-A3B via vLLM | Built-in |
| VLM Support | Hybrid (Standard + VLM two-pass) | Hybrid backend |
| License | MIT | AGPL-3.0 |
| GPU Memory | ~24GB (vLLM + Docling) | ~6-10GB (pipeline) |
| Primary Use Case | Enterprise documents | General PDF parsing |
## Workflow Orchestration, Task Management & Core Principles

See the root `CLAUDE.md` for full Workflow Orchestration (plan mode, subagents, self-improvement, verification, elegance, bug fixing), Task Management, and Core Principles. Files: `<workspace-root>/tasks/todo.md`, `<workspace-root>/tasks/lessons.md`.