Architecture
Overview
Two services communicating via REST. The frontend is a Vue 3 SPA served by Nginx in production. The backend is a FastAPI app that wraps Docling's document conversion engine.
Zooming into the backend
The schema above shows the macro view. Inside the backend, the code follows a Clean Architecture with strict layer boundaries:
┌──────────────────────────────────────────────────────┐
│                       Backend                        │
│                                                      │
│  ┌──────────┐                                        │
│  │   api/   │  ← HTTP (FastAPI routes, Pydantic)     │
│  └────┬─────┘                                        │
│       │ calls                                        │
│  ┌────▼─────┐                                        │
│  │services/ │  ← Use case orchestration              │
│  └──┬─────┬─┘                                        │
│     │     │                                          │
│ ┌───▼──┐ ┌▼────────────┐                             │
│ │domain│ │persistence/ │                             │
│ │      │ │             │                             │
│ │bbox  │ │ SQLite CRUD │  ← Storage (your blue box)  │
│ │parse │ │ file store  │                             │
│ └──────┘ └─────────────┘                             │
│  ↑ pure Python, no deps    ↑ aiosqlite               │
└──────────────────────────────────────────────────────┘
Dependencies flow inward: api → services → domain. The domain layer has zero knowledge of HTTP or the database.
Backend: Clean Architecture
The backend follows a strict layered architecture, and the directory layout mirrors the layers:
document-parser/
├── main.py                  # FastAPI app, CORS, lifespan, health endpoint
│
├── domain/                  # Pure domain: no HTTP, no DB
│   ├── models.py            # Document, AnalysisJob dataclasses
│   ├── ports.py             # Abstract protocols (DocumentConverter, DocumentChunker)
│   ├── value_objects.py     # ConversionResult, ChunkingOptions, ChunkResult
│   └── bbox.py              # Bounding box coordinate normalization
│
├── api/                     # HTTP layer (FastAPI routers)
│   ├── schemas.py           # Pydantic DTOs (camelCase serialization)
│   ├── documents.py         # /api/documents endpoints
│   └── analyses.py          # /api/analyses endpoints (create, rechunk, delete)
│
├── persistence/             # Data layer (SQLite via aiosqlite)
│   ├── database.py          # Connection management, schema init
│   ├── document_repo.py     # Document CRUD
│   └── analysis_repo.py     # AnalysisJob CRUD
│
├── infra/                   # Infrastructure adapters
│   ├── settings.py          # Environment-based configuration
│   ├── local_converter.py   # In-process Docling converter (local mode)
│   ├── serve_converter.py   # HTTP client for Docling Serve (remote mode)
│   ├── local_chunker.py     # In-process chunking (HierarchicalChunker, HybridChunker)
│   ├── rate_limiter.py      # Sliding-window rate limiting middleware
│   └── bbox.py              # Bbox coordinate normalization helpers
│
├── services/                # Use case orchestration
│   ├── document_service.py  # Upload, delete, preview
│   └── analysis_service.py  # Async Docling processing + chunking
│
└── tests/                   # pytest (199 tests)
Layer responsibilities
| Layer | Role | Depends on |
|---|---|---|
| domain | Dataclasses, value objects, abstract ports | Nothing (pure Python) |
| persistence | SQLite CRUD, aiosqlite | domain (models) |
| infra | Adapters: converters, chunker, rate limiter, settings | domain (ports, value objects) |
| services | Orchestrate use cases, call converters/chunkers | domain + persistence + infra |
| api | HTTP endpoints, Pydantic DTOs, error handling | services |
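To make the dependency rule concrete, here is a minimal sketch of how a port, an adapter, and a service might fit together. `DocumentConverter` and `ConversionResult` are names taken from the tree above; the `convert` signature, the `markdown` field, `FakeConverter`, and `AnalysisService.run` are assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ConversionResult:
    """Value object (domain layer). The single field is an assumption."""
    markdown: str


class DocumentConverter(Protocol):
    """Port (domain layer): what the services layer needs from a converter."""

    def convert(self, path: str) -> ConversionResult: ...


class FakeConverter:
    """Adapter (infra layer). Hypothetical stand-in for a Docling-backed converter."""

    def convert(self, path: str) -> ConversionResult:
        return ConversionResult(markdown=f"# Converted {path}")


class AnalysisService:
    """Service layer: depends on the port, never on a concrete adapter."""

    def __init__(self, converter: DocumentConverter) -> None:
        self._converter = converter

    def run(self, path: str) -> str:
        return self._converter.convert(path).markdown


service = AnalysisService(FakeConverter())
print(service.run("report.pdf"))  # prints "# Converted report.pdf"
```

Because the service only sees the protocol, swapping the local converter for the remote one (or a test double) requires no change in the services or domain layers.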
API contract
The API uses camelCase serialization (via Pydantic alias_generator), while the backend uses snake_case internally. The pages_json field contains raw dataclasses.asdict() output, so page data uses snake_case (page_number, not pageNumber).
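The mixed casing can be surprising, so the following sketch shows how it might arise. It uses only the standard library as a stand-in for Pydantic's alias generator: `to_camel`, `Page`, `serialize_analysis`, and the `job_id` and `width` fields are hypothetical, and the exact aliased key names in the real API are an assumption.

```python
from dataclasses import asdict, dataclass


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. page_number -> pageNumber."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


@dataclass
class Page:
    page_number: int
    width: float


def serialize_analysis(job_id: str, pages: list[Page]) -> dict:
    """Alias only the top-level keys; nested page dicts pass through as-is."""
    internal = {
        "job_id": job_id,
        # Raw asdict() output: keys keep their snake_case field names.
        "pages_json": [asdict(page) for page in pages],
    }
    return {to_camel(key): value for key, value in internal.items()}


payload = serialize_analysis("a1", [Page(page_number=1, width=595.0)])
print(payload)
# {'jobId': 'a1', 'pagesJson': [{'page_number': 1, 'width': 595.0}]}
```

Clients therefore need to read top-level fields in camelCase and page-level fields in snake_case.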
Frontend: Feature-Based
The frontend is organized by feature, each with its own store, API client, and UI components.
frontend/src/
├── app/                     # App shell, router, global styles
├── pages/                   # Route-level pages
│   ├── HomePage.vue
│   ├── StudioPage.vue       # PDF viewer + config + results
│   ├── DocumentsPage.vue
│   ├── HistoryPage.vue
│   └── SettingsPage.vue
│
├── features/                # Feature modules
│   ├── analysis/            # Analysis store, API, bbox scaling, UI
│   │   ├── store.ts
│   │   ├── api.ts
│   │   ├── bboxScaling.ts   # Pure math: page coords → pixel coords
│   │   └── ui/
│   │       ├── BboxOverlay.vue
│   │       ├── AnalysisPanel.vue
│   │       ├── StructureViewer.vue
│   │       └── ...
│   ├── chunking/            # Chunk panel UI + rechunk action
│   ├── document/            # Document store, API, upload
│   ├── feature-flags/       # Feature flag store (reads /api/health)
│   ├── history/             # History store, navigation
│   └── settings/            # Theme, locale, API URL
│
└── shared/                  # Cross-feature utilities
    ├── types.ts             # All shared TypeScript interfaces
    ├── i18n.ts              # FR/EN translations
    ├── format.ts            # Date/size formatters
    └── api/http.ts          # HTTP client (fetch wrapper)
Data flow
User action → Pinia store action → API client (fetch) → Backend REST endpoint
                                                              ↓
Backend response → Pinia store state → Vue reactivity → UI update
Key design decisions
- Pinia stores per feature, not global. Each feature owns its state.
- TypeScript strict mode with shared interfaces in shared/types.ts.
- No component library: custom CSS with CSS variables for theming.
- vue-tsc in CI to catch type errors before merge.
Feature Flags
The frontend adapts its UI based on the backend's capabilities. On startup, the feature flag store fetches /api/health and reads the engine and deploymentMode fields.
| Flag | Condition | Effect |
|---|---|---|
| chunking | engine === 'local' | Shows chunking options in the analysis panel |
| disclaimer | deploymentMode === 'huggingface' | Shows a disclaimer banner at the top of the app |
This allows the same frontend build to work with both local and remote backends without conditional compilation.
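The flag conditions in the table reduce to two comparisons on the health payload. The real store is TypeScript (Pinia); the sketch below mirrors the same logic in Python only to keep the examples in one language, and `evaluate_flags` is a hypothetical name.

```python
def evaluate_flags(health: dict) -> dict[str, bool]:
    """Map a /api/health payload to the two flags from the table."""
    return {
        "chunking": health.get("engine") == "local",
        "disclaimer": health.get("deploymentMode") == "huggingface",
    }


flags = evaluate_flags(
    {"status": "ok", "engine": "local", "deploymentMode": "self-hosted"}
)
print(flags)  # {'chunking': True, 'disclaimer': False}
```

A missing field simply evaluates to `False`, so an older or degraded backend hides the optional UI rather than breaking it. Whether the real store behaves this way is an assumption.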
Rate Limiting
The backend applies a sliding-window rate limiter as middleware:
- 60 requests per 60 seconds per client IP
- The /api/health endpoint is excluded
- When the limit is exceeded, the API returns 429 Too Many Requests with a Retry-After header
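A sliding-window limiter of this kind can be sketched with a per-client deque of timestamps. This is an illustration of the technique under stated assumptions, not the contents of rate_limiter.py: the class name, the `check` method, and the injectable `clock` are hypothetical, and the middleware wrapper that skips /api/health and builds the 429 response is left out.

```python
import math
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client."""

    def __init__(self, limit: int = 60, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def check(self, client_ip: str) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds)."""
        now = self._clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # The oldest hit leaves the window at hits[0] + window.
            retry_after = math.ceil(hits[0] + self.window - now)
            return False, max(retry_after, 1)
        hits.append(now)
        return True, 0


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window=60.0, clock=lambda: fake_now[0])
print(limiter.check("10.0.0.1"))  # (True, 0)
print(limiter.check("10.0.0.1"))  # (True, 0)
print(limiter.check("10.0.0.1"))  # (False, 60)
fake_now[0] = 60.0
print(limiter.check("10.0.0.1"))  # (True, 0)
```

The second element of the tuple is what a middleware would place in the Retry-After header.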
Analysis Lifecycle
An analysis job follows this state machine:
PENDING → RUNNING → COMPLETED
                  ↘ FAILED
| Status | Description |
|---|---|
| PENDING | Job created, waiting for a processing slot |
| RUNNING | Docling conversion in progress |
| COMPLETED | Conversion finished; results available (markdown, HTML, pages, chunks) |
| FAILED | Conversion error; error_message contains details |
The backend limits parallel jobs via MAX_CONCURRENT_ANALYSES (default: 3) to avoid overloading the CPU during Docling processing.
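One common way to combine this state machine with a concurrency cap is an `asyncio.Semaphore`. The sketch below assumes that approach; the source does not say how the limit is enforced. `Status`, `run_job`, `fake_convert`, and the dict-based job record are hypothetical.

```python
import asyncio
from enum import Enum


class Status(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


MAX_CONCURRENT_ANALYSES = 3  # default stated in the text


async def run_job(job: dict, slots: asyncio.Semaphore, convert) -> None:
    """Drive one job through PENDING -> RUNNING -> COMPLETED or FAILED."""
    job["status"] = Status.PENDING
    async with slots:  # wait here until a processing slot is free
        job["status"] = Status.RUNNING
        try:
            job["result"] = await convert(job["path"])
            job["status"] = Status.COMPLETED
        except Exception as exc:
            job["status"] = Status.FAILED
            job["error_message"] = str(exc)


async def fake_convert(path: str) -> str:
    """Hypothetical stand-in for the Docling conversion call."""
    await asyncio.sleep(0)
    if path.endswith(".bad"):
        raise ValueError(f"cannot convert {path}")
    return f"# {path}"


async def main() -> list[dict]:
    slots = asyncio.Semaphore(MAX_CONCURRENT_ANALYSES)
    jobs = [{"path": p} for p in ("a.pdf", "b.pdf", "c.bad", "d.pdf")]
    await asyncio.gather(*(run_job(j, slots, fake_convert) for j in jobs))
    return jobs


jobs = asyncio.run(main())
print([j["status"].value for j in jobs])
# ['COMPLETED', 'COMPLETED', 'FAILED', 'COMPLETED']
```

With four jobs and three slots, the fourth job stays PENDING until one of the first three releases its slot.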
Local vs Remote Mode
The backend supports two conversion engines, selected via the CONVERSION_ENGINE environment variable:
| | Local | Remote |
|---|---|---|
| Engine | In-process Docling (PyTorch) | HTTP client to Docling Serve |
| Chunking | Available (in-process) | Not available |
| Docker image | latest-local (~1.9 GB) | latest-remote (~270 MB) |
| ML models | Downloaded on first run (~400 MB) | Managed by Docling Serve |
| CPU/RAM | 4+ CPUs, 6+ GB RAM | 2 CPUs, 2 GB RAM |
The converter is selected at startup in main.py via _build_converter(). The chunker (_build_chunker()) is only instantiated in local mode; in remote mode, the chunking feature flag is disabled and the UI hides the chunking panel.
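The startup selection could look roughly like the sketch below. The three classes are hypothetical stand-ins for the adapters in infra/, and the accepted values "local" and "remote", the "local" default, and the function bodies are assumptions. Only the CONVERSION_ENGINE variable and the idea of two builder functions come from the text.

```python
import os


class LocalConverter:
    """Hypothetical stand-in for the in-process Docling adapter."""
    engine = "local"


class ServeConverter:
    """Hypothetical stand-in for the Docling Serve HTTP adapter."""
    engine = "remote"


class LocalChunker:
    """Hypothetical stand-in for the in-process chunker."""


def build_converter(engine: str):
    """Pick the converter adapter for the configured engine."""
    if engine == "local":
        return LocalConverter()
    if engine == "remote":
        return ServeConverter()
    raise ValueError(f"Unknown CONVERSION_ENGINE: {engine!r}")


def build_chunker(engine: str):
    """Chunking only runs in-process, so remote mode gets no chunker."""
    return LocalChunker() if engine == "local" else None


# Defaulting to "local" when the variable is unset is an assumption.
engine = os.environ.get("CONVERSION_ENGINE", "local")
converter = build_converter(engine)
chunker = build_chunker(engine)
print(type(converter).__name__, chunker is not None)
```

Failing fast on an unknown value keeps a misconfigured deployment from silently falling back to the heavier local engine.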
Health Endpoint
GET /api/health returns the backend status:
{
"status": "ok",
"engine": "local",
"version": "0.3.0",
"deploymentMode": "self-hosted"
}
The frontend uses this response to:
- Verify the backend is reachable
- Evaluate feature flags (chunking, disclaimer)
- Display the app version
