
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Docling Parser (v2.0.0) - A Hugging Face Spaces API service using a hybrid two-pass VLM architecture for PDF/document parsing. Pass 1 runs Docling's standard pipeline (DocLayNet layout + TableFormer ACCURATE + RapidOCR baseline). Pass 2 sends full page images to Qwen3-VL-30B-A3B via vLLM for enhanced text recognition. The merge step preserves TableFormer tables while replacing RapidOCR text with VLM output. Includes OpenCV preprocessing (denoise, CLAHE contrast enhancement). API endpoints are protected by Bearer token authentication.
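The merge step can be sketched roughly as follows. This is a hypothetical illustration with invented element shapes (`kind`, `page`, `content` keys) — the real structures in app.py will differ:

```python
# Sketch of the Pass 1 / Pass 2 merge (hypothetical structures, not the real app.py code).
# Each Pass 1 element is a dict with a "kind" ("table" or "text"), a "page", and "content".

def merge_passes(docling_elements, vlm_text_by_page):
    """Keep TableFormer tables from Pass 1; replace RapidOCR text with VLM output."""
    merged = []
    for el in docling_elements:
        if el["kind"] == "table":
            merged.append(el)  # preserve TableFormer output verbatim
        elif el["kind"] == "text":
            # Substitute the Pass 2 VLM transcription for this page, if available
            vlm = vlm_text_by_page.get(el["page"])
            merged.append({**el, "content": vlm if vlm else el["content"]})
        else:
            merged.append(el)
    return merged
```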

Architecture

hf_docling_parser/
├── app.py              # FastAPI + hybrid two-pass parsing (v2.0.0)
├── start.sh            # Startup script (vLLM + FastAPI dual-process)
├── Dockerfile          # vLLM base image, Qwen3-VL pre-downloaded
├── requirements.txt    # Python deps (docling, opencv, pdf2image, etc.)
├── README.md           # HF Spaces metadata
├── CLAUDE.md           # Claude Code development guide
└── .gitignore          # Git ignore patterns

Dual-process Docker architecture: start.sh launches vLLM on port 8000 (GPU model serving) and FastAPI on port 7860 (API). Base image: vllm/vllm-openai:v0.14.1.
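In a dual-process setup like this, the API process typically waits for the vLLM port to accept connections before serving traffic. A minimal readiness-poll sketch (hypothetical helper, not taken from start.sh or app.py):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout_s: float = 300.0) -> bool:
    """Poll a TCP port until it accepts connections or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(1.0)
    return False

# e.g. wait_for_port("127.0.0.1", 8000) before accepting /parse requests
```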

Common Commands

# Build and test Docker locally (requires A100 GPU)
docker build --shm-size 32g -t hf-docling .
docker run --gpus all --shm-size 32g -p 7860:7860 -e API_TOKEN=test hf-docling

# Test the API locally
curl -X POST "http://localhost:7860/parse" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"

Note: Local development without Docker is not practical; the hybrid pipeline requires vLLM serving Qwen3-VL-30B-A3B on an A100 GPU.

Deploying to Hugging Face Spaces

Push New Code

Since this lives in a monorepo, deploy by cloning the HF repo, copying files, and pushing:

# Clone HF Space repo to temp directory
git clone https://huggingface.co/spaces/outcomelabs/docling-parser /tmp/hf-docling-deploy

# Copy updated files from monorepo
cp apps/hf_docling_parser/{app.py,Dockerfile,README.md,requirements.txt,start.sh,.gitignore} /tmp/hf-docling-deploy/

# Commit and push
cd /tmp/hf-docling-deploy
git add -A
git commit -m "feat: description of changes"
git push

# Clean up
rm -rf /tmp/hf-docling-deploy

Requires HF CLI auth: huggingface-cli login (logged in as sidoutcome / org outcomelabs).

Settings (configure in HF web UI)

  • Hardware: Nvidia A100 Large 80GB ($2.50/hr); vLLM requires a GPU
  • Sleep time: 1 hour (auto-shutdown after 60 min idle)
  • Secrets: API_TOKEN (required for API authentication)
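The Bearer check in app.py is not reproduced here, but token comparisons of this kind are usually done in constant time. A sketch with a hypothetical `verify_token` helper:

```python
import hmac

def verify_token(authorization_header: str, expected_token: str) -> bool:
    """Return True iff the header is 'Bearer <token>' with the expected token.

    hmac.compare_digest avoids leaking how many characters matched via timing.
    """
    scheme, _, candidate = authorization_header.partition(" ")
    if scheme != "Bearer" or not candidate:
        return False
    return hmac.compare_digest(candidate, expected_token)

# e.g. verify_token(request.headers["Authorization"], os.environ["API_TOKEN"])
```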

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Health check and API info |
| `/parse` | POST | Parse an uploaded document (PDF/image) to markdown/JSON |
| `/parse/url` | POST | Parse a document fetched from a URL |
| `/docs` | GET | OpenAPI documentation (Swagger UI) |

Key Dependencies

  • docling: IBM's document parsing library with TableFormer
  • fastapi: API framework
  • python-multipart: File upload handling
  • uvicorn: ASGI server
  • httpx: HTTP client for URL parsing
  • pydantic: Request/response validation
  • opencv-python-headless: Image preprocessing (denoise, CLAHE)
  • pdf2image: PDF page to image conversion for VLM
  • huggingface-hub: Model/space utilities

Note: vLLM and PyTorch are provided by the base Docker image (vllm/vllm-openai:v0.14.1), not in requirements.txt.

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `API_TOKEN` | Required. Secret token for API authentication | - |
| `MAX_FILE_SIZE_MB` | Maximum upload file size in MB | `1024` |
| `IMAGES_SCALE` | Image resolution scale for page rendering | `2.0` |
| `VLM_MODEL` | VLM model for the text recognition pass | `Qwen/Qwen3-VL-30B-A3B-Instruct` |
| `VLM_HOST` | vLLM server host | `127.0.0.1` |
| `VLM_PORT` | vLLM server port | `8000` |
| `VLM_GPU_MEMORY_UTILIZATION` | Fraction of GPU memory allocated to vLLM | `0.85` |
| `VLM_MAX_MODEL_LEN` | Maximum sequence length for vLLM | `8192` |
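The defaults above are typically read once at startup. A minimal sketch (hypothetical `load_config` name and dict layout; app.py may structure this differently):

```python
import os

def load_config(env=os.environ):
    """Read the environment variables from the table above, applying defaults."""
    return {
        "api_token": env.get("API_TOKEN"),  # required; validate at startup
        "max_file_size_mb": int(env.get("MAX_FILE_SIZE_MB", "1024")),
        "images_scale": float(env.get("IMAGES_SCALE", "2.0")),
        "vlm_model": env.get("VLM_MODEL", "Qwen/Qwen3-VL-30B-A3B-Instruct"),
        "vlm_host": env.get("VLM_HOST", "127.0.0.1"),
        "vlm_port": int(env.get("VLM_PORT", "8000")),
        "vlm_gpu_memory_utilization": float(env.get("VLM_GPU_MEMORY_UTILIZATION", "0.85")),
        "vlm_max_model_len": int(env.get("VLM_MAX_MODEL_LEN", "8192")),
    }
```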

Testing

Bruno collection: Open docs/api-collections/ in Bruno for ready-to-use requests with local/production environments.

Note: Testing requires an A100 GPU with vLLM running. Use the Docker container for testing.

Test with curl

# Test /parse endpoint
curl -X POST "http://localhost:7860/parse" \
  -H "Authorization: Bearer test" \
  -F "file=@sample.pdf" \
  -F "output_format=markdown"

# Test with images
curl -X POST "http://localhost:7860/parse" \
  -H "Authorization: Bearer test" \
  -F "file=@sample.pdf" \
  -F "output_format=markdown" \
  -F "include_images=true"

# Test /parse/url endpoint
curl -X POST "http://localhost:7860/parse/url" \
  -H "Authorization: Bearer test" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://arxiv.org/pdf/2408.09869", "output_format": "markdown"}'

Test with Python

import httpx

API_URL = "http://localhost:7860"
API_TOKEN = "test"

with open("sample.pdf", "rb") as f:
    response = httpx.post(
        f"{API_URL}/parse",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        files={"file": ("sample.pdf", f, "application/pdf")},
        data={"output_format": "markdown"},
    )
print(response.json())

Logging & Monitoring

The API provides comprehensive logging:

  • Request IDs: Each request gets a unique 8-char ID
  • Startup logs: Device info, GPU name, configuration
  • Request logs: File size, type, processing time, pages/sec
  • Docling output: Conversion progress and timing
  • Error tracking: Full exception details with context
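An 8-character request ID of the kind described above can be generated and attached to log lines like this (sketch only; the actual log format and logger names in app.py may differ):

```python
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("docling-parser")

def new_request_id() -> str:
    """Unique 8-character hex ID, as in the per-request logs described above."""
    return uuid.uuid4().hex[:8]

request_id = new_request_id()
logger.info("[%s] parse start: file=sample.pdf size_mb=1.2", request_id)
```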

Comparison with MinerU

| Feature | Docling (Hybrid VLM) | MinerU |
|---------|----------------------|--------|
| Maintainer | IBM Research + Qwen3-VL | OpenDataLab |
| Table detection | TableFormer (built-in) | Multiple backends |
| OCR | Qwen3-VL-30B-A3B via vLLM | Built-in |
| VLM support | Hybrid (standard + VLM two-pass) | Hybrid backend |
| License | MIT | AGPL-3.0 |
| GPU memory | ~24 GB (vLLM + Docling) | ~6-10 GB (pipeline) |
| Primary use case | Enterprise documents | General PDF parsing |

Workflow Orchestration, Task Management & Core Principles

See root CLAUDE.md for full Workflow Orchestration (plan mode, subagents, self-improvement, verification, elegance, bug fixing), Task Management, and Core Principles. Files: <workspace-root>/tasks/todo.md, <workspace-root>/tasks/lessons.md.