Architecture

Overview

[Figure: Docling Studio architecture]

Docling Studio consists of two services that communicate over REST. The frontend is a Vue 3 SPA served by Nginx in production. The backend is a FastAPI app that wraps Docling's document conversion engine.

Zooming into the backend

The diagram above shows the macro view. Inside the backend, the code follows Clean Architecture with strict layer boundaries:

┌────────────────────────────────────────────────────────┐
│                        Backend                         │
│                                                        │
│   ┌──────────┐                                         │
│   │   api/   │  ← HTTP (FastAPI routes, Pydantic)      │
│   └────┬─────┘                                         │
│        │ calls                                         │
│   ┌────▼─────┐                                         │
│   │services/ │  ← Use case orchestration               │
│   └──┬────┬──┘                                         │
│      │    │                                            │
│  ┌───▼──┐ ┌▼────────────┐                              │
│  │domain│ │persistence/ │                              │
│  │      │ │             │                              │
│  │bbox  │ │ SQLite CRUD │  ← Storage (blue box above)  │
│  │parse │ │ file store  │                              │
│  └──────┘ └─────────────┘                              │
│  ↑ pure Python, no deps   ↑ aiosqlite                  │
└────────────────────────────────────────────────────────┘

Dependencies flow inward: api → services → domain. The domain layer has zero knowledge of HTTP or the database.

Backend: Clean Architecture

The source tree mirrors these layers. Adapters for Docling and other infrastructure concerns live in infra/, behind the ports declared in domain/ports.py.

document-parser/
├── main.py                   # FastAPI app, CORS, lifespan, health endpoint
│
├── domain/                   # Pure domain: no HTTP, no DB
│   ├── models.py             # Document, AnalysisJob dataclasses
│   ├── ports.py              # Abstract protocols (DocumentConverter, DocumentChunker)
│   ├── value_objects.py      # ConversionResult, ChunkingOptions, ChunkResult
│   └── bbox.py               # Bounding box coordinate normalization
│
├── api/                      # HTTP layer (FastAPI routers)
│   ├── schemas.py            # Pydantic DTOs (camelCase serialization)
│   ├── documents.py          # /api/documents endpoints
│   └── analyses.py           # /api/analyses endpoints (create, rechunk, delete)
│
├── persistence/              # Data layer (SQLite via aiosqlite)
│   ├── database.py           # Connection management, schema init
│   ├── document_repo.py      # Document CRUD
│   └── analysis_repo.py      # AnalysisJob CRUD
│
├── infra/                    # Infrastructure adapters
│   ├── settings.py           # Environment-based configuration
│   ├── local_converter.py    # In-process Docling converter (local mode)
│   ├── serve_converter.py    # HTTP client for Docling Serve (remote mode)
│   ├── local_chunker.py      # In-process chunking (HierarchicalChunker, HybridChunker)
│   ├── rate_limiter.py       # Sliding-window rate limiting middleware
│   └── bbox.py               # Bbox coordinate normalization helpers
│
├── services/                 # Use case orchestration
│   ├── document_service.py   # Upload, delete, preview
│   └── analysis_service.py   # Async Docling processing + chunking
│
└── tests/                    # pytest (199 tests)

Layer responsibilities

| Layer | Role | Depends on |
|---|---|---|
| domain | Dataclasses, value objects, abstract ports | Nothing (pure Python) |
| persistence | SQLite CRUD, aiosqlite | domain (models) |
| infra | Adapters: converters, chunker, rate limiter, settings | domain (ports, value objects) |
| services | Orchestrate use cases, call converters/chunkers | domain + persistence + infra |
| api | HTTP endpoints, Pydantic DTOs, error handling | services |
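
To illustrate the dependency rule, here is a minimal sketch of a domain port and a service that depends only on that port. The names DocumentConverter and ConversionResult come from the tree above; the method signature, the dataclass fields, and FakeConverter are assumptions made for this example, not the project's actual code.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ConversionResult:
    # Field names are illustrative assumptions, not the project's real fields.
    markdown: str
    page_count: int


class DocumentConverter(Protocol):
    """Port declared in the domain layer: no HTTP, no database."""

    async def convert(self, file_path: str) -> ConversionResult: ...


class FakeConverter:
    """Hypothetical adapter; a real one would live in infra/ and call Docling."""

    async def convert(self, file_path: str) -> ConversionResult:
        return ConversionResult(markdown=f"# {file_path}", page_count=1)


class AnalysisService:
    """Service layer: depends on the port, never on a concrete adapter."""

    def __init__(self, converter: DocumentConverter) -> None:
        self._converter = converter

    async def analyze(self, file_path: str) -> ConversionResult:
        return await self._converter.convert(file_path)


result = asyncio.run(AnalysisService(FakeConverter()).analyze("report.pdf"))
```

Because the service only sees the protocol, swapping the local converter for the remote one (or for a test double, as here) should not require touching services/ or domain/.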

API contract

The API uses camelCase serialization (via Pydantic alias_generator), while the backend uses snake_case internally. The pages_json field contains raw dataclasses.asdict() output, so page data uses snake_case (page_number, not pageNumber).
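
The mixed casing can be sketched with the standard library alone. The to_camel helper below is a stand-in for a Pydantic alias generator, and the Page fields, the document_id key, and the serialize_analysis function are hypothetical; only pages_json and page_number are taken from the text above.

```python
import json
from dataclasses import asdict, dataclass


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase (stand-in for a Pydantic alias generator)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


@dataclass
class Page:
    # Hypothetical page fields, for illustration only.
    page_number: int
    width: float


def serialize_analysis(document_id: str, pages: list[Page]) -> dict:
    """Alias top-level keys to camelCase; embed pages as raw asdict() JSON."""
    internal = {
        "document_id": document_id,
        # asdict() output is serialized as-is, so nested keys stay snake_case.
        "pages_json": json.dumps([asdict(p) for p in pages]),
    }
    return {to_camel(key): value for key, value in internal.items()}


payload = serialize_analysis("doc-1", [Page(page_number=1, width=612.0)])
pages = json.loads(payload["pagesJson"])
```

The aliasing only touches the keys of the outer object. The page data is already a JSON string by the time the response is built, which is why clients should expect page_number rather than pageNumber inside it.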

Frontend β€” Feature-Based

The frontend is organized by feature, each with its own store, API client, and UI components.

frontend/src/
├── app/                      # App shell, router, global styles
├── pages/                    # Route-level pages
│   ├── HomePage.vue
│   ├── StudioPage.vue        # PDF viewer + config + results
│   ├── DocumentsPage.vue
│   ├── HistoryPage.vue
│   └── SettingsPage.vue
│
├── features/                 # Feature modules
│   ├── analysis/             # Analysis store, API, bbox scaling, UI
│   │   ├── store.ts
│   │   ├── api.ts
│   │   ├── bboxScaling.ts    # Pure math: page coords → pixel coords
│   │   └── ui/
│   │       ├── BboxOverlay.vue
│   │       ├── AnalysisPanel.vue
│   │       ├── StructureViewer.vue
│   │       └── ...
│   ├── chunking/             # Chunk panel UI + rechunk action
│   ├── document/             # Document store, API, upload
│   ├── feature-flags/        # Feature flag store (reads /api/health)
│   ├── history/              # History store, navigation
│   └── settings/             # Theme, locale, API URL
│
└── shared/                   # Cross-feature utilities
    ├── types.ts              # All shared TypeScript interfaces
    ├── i18n.ts               # FR/EN translations
    ├── format.ts             # Date/size formatters
    └── api/http.ts           # HTTP client (fetch wrapper)

Data flow

User action → Pinia store action → API client (fetch) → Backend REST endpoint
                                                              │
Backend response → Pinia store state → Vue reactivity → UI update

Key design decisions

  • Pinia stores per feature, not global. Each feature owns its state.
  • TypeScript strict mode with shared interfaces in shared/types.ts.
  • No component library: custom CSS with CSS variables for theming.
  • vue-tsc in CI to catch type errors before merge.

Feature Flags

The frontend adapts its UI based on the backend's capabilities. On startup, the feature flag store fetches /api/health and reads the engine and deploymentMode fields.

| Flag | Condition | Effect |
|---|---|---|
| chunking | engine === 'local' | Shows chunking options in the analysis panel |
| disclaimer | deploymentMode === 'huggingface' | Shows a disclaimer banner at the top of the app |

This allows the same frontend build to work with both local and remote backends without conditional compilation.
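
The flag evaluation itself is a pure function of the health payload. The real store is part of the TypeScript frontend; the sketch below restates the two conditions in Python purely to show the logic. The function name evaluate_flags and the "remote" engine value are assumptions.

```python
def evaluate_flags(health: dict) -> dict[str, bool]:
    """Derive UI flags from a /api/health payload, using the two documented conditions."""
    return {
        "chunking": health.get("engine") == "local",
        "disclaimer": health.get("deploymentMode") == "huggingface",
    }


# Example payloads; "remote" is an assumed engine value for the non-local case.
local_flags = evaluate_flags({"engine": "local", "deploymentMode": "self-hosted"})
remote_flags = evaluate_flags({"engine": "remote", "deploymentMode": "huggingface"})
```

A missing or unexpected field simply leaves the corresponding flag off in this sketch, which seems like the safer default for a UI that hides features it cannot confirm.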

Rate Limiting

The backend applies a sliding-window rate limiter as middleware:

  • 60 requests per 60 seconds per client IP
  • The /api/health endpoint is excluded
  • When the limit is exceeded, the API returns 429 Too Many Requests with a Retry-After header
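
A sliding-window limiter of this kind can be sketched as follows. This is not the project's rate_limiter.py: the class name, the check method, and the injectable clock are assumptions chosen to keep the example testable, and the middleware wiring (skipping /api/health, building the 429 response) is left out.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def check(self, client_ip: str) -> Optional[float]:
        """Return None if allowed, else the seconds to wait (for Retry-After)."""
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # The oldest recorded hit decides when a slot frees up.
            return self.window - (now - hits[0])
        hits.append(now)
        return None


# Deterministic demo with a fake clock and a small limit.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window=60.0, clock=lambda: fake_now[0])
first = limiter.check("10.0.0.1")
second = limiter.check("10.0.0.1")
third = limiter.check("10.0.0.1")         # over the limit
fake_now[0] = 60.0
after_window = limiter.check("10.0.0.1")  # earlier hits have expired
```

A middleware built on this would call check() with the client IP for every path except /api/health and, when a number comes back, respond with 429 and put that value (rounded up) in Retry-After.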

Analysis Lifecycle

An analysis job follows this state machine:

PENDING → RUNNING → COMPLETED
                  → FAILED

| Status | Description |
|---|---|
| PENDING | Job created, waiting for a processing slot |
| RUNNING | Docling conversion in progress |
| COMPLETED | Conversion finished; results available (markdown, HTML, pages, chunks) |
| FAILED | Conversion error; error_message contains details |
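
The state machine can be written down as a transition table. The sketch below encodes only the arrows shown in the diagram; JobStatus, TRANSITIONS, and advance are hypothetical names, and whether the real code also allows other transitions (for example PENDING to FAILED) is not stated here.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Allowed transitions, read directly off the diagram.
TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),  # terminal
    JobStatus.FAILED: set(),     # terminal
}


def advance(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return `target` if the transition is legal, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```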

The backend limits parallel jobs via MAX_CONCURRENT_ANALYSES (default: 3) to avoid overloading the CPU during Docling processing.
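
One common way to enforce such a cap in asyncio code is a semaphore. The following is a sketch under that assumption, not a description of how analysis_service.py actually does it; run_jobs and the sleep standing in for the conversion are invented for the example.

```python
import asyncio

MAX_CONCURRENT_ANALYSES = 3  # the documented default


async def run_jobs(job_count: int) -> int:
    """Run `job_count` fake jobs and return the peak number running at once."""
    slots = asyncio.Semaphore(MAX_CONCURRENT_ANALYSES)
    running = 0
    peak = 0

    async def job() -> None:
        nonlocal running, peak
        async with slots:              # the job waits here (PENDING) for a slot
            running += 1               # slot acquired: RUNNING
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stands in for the Docling conversion
            running -= 1

    await asyncio.gather(*(job() for _ in range(job_count)))
    return peak


peak_concurrency = asyncio.run(run_jobs(10))
```

With ten jobs submitted at once, only three hold a slot at any moment; the rest stay queued until a running job releases its slot.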

Local vs Remote Mode

The backend supports two conversion engines, selected via the CONVERSION_ENGINE environment variable:

| | Local | Remote |
|---|---|---|
| Engine | In-process Docling (PyTorch) | HTTP client to Docling Serve |
| Chunking | Available (in-process) | Not available |
| Docker image | latest-local (~1.9 GB) | latest-remote (~270 MB) |
| ML models | Downloaded on first run (~400 MB) | Managed by Docling Serve |
| CPU/RAM | 4+ CPUs, 6+ GB RAM | 2 CPUs, 2 GB RAM |

The converter is selected at startup in main.py via _build_converter(). The chunker (_build_chunker()) is only instantiated in local mode; in remote mode, the chunking feature flag is disabled and the UI hides the chunking panel.
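
The selection logic might look roughly like the sketch below. The class names, the "local" and "remote" values, the default of "local", and the env parameter are all assumptions for illustration; the real _build_converter() and _build_chunker() in main.py may differ.

```python
import os
from typing import Mapping, Optional, Protocol


class DocumentConverter(Protocol):
    def describe(self) -> str: ...


class LocalConverter:
    """Placeholder for the in-process Docling adapter."""

    def describe(self) -> str:
        return "in-process Docling"


class ServeConverter:
    """Placeholder for the Docling Serve HTTP adapter."""

    def describe(self) -> str:
        return "HTTP client to Docling Serve"


class LocalChunker:
    """Placeholder for the in-process chunker."""


def build_converter(env: Mapping[str, str] = os.environ) -> DocumentConverter:
    """Pick the adapter from CONVERSION_ENGINE (the 'local' default is assumed)."""
    engine = env.get("CONVERSION_ENGINE", "local")
    if engine == "local":
        return LocalConverter()
    if engine == "remote":
        return ServeConverter()
    raise ValueError(f"unknown CONVERSION_ENGINE: {engine!r}")


def build_chunker(env: Mapping[str, str] = os.environ) -> Optional[LocalChunker]:
    """Chunking exists only in local mode; remote mode gets no chunker."""
    return LocalChunker() if env.get("CONVERSION_ENGINE", "local") == "local" else None
```

Returning None for the chunker in remote mode gives the service layer a single condition to check, and it lines up with the chunking flag the frontend derives from the engine field.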

Health Endpoint

GET /api/health returns the backend status:

{
  "status": "ok",
  "engine": "local",
  "version": "0.3.0",
  "deploymentMode": "self-hosted"
}

The frontend uses this response to:

  1. Verify the backend is reachable
  2. Evaluate feature flags (chunking, disclaimer)
  3. Display the app version