Getting Started
Docling Studio ships two Docker image variants:
| Variant | Image tag | Size | Description |
|---|---|---|---|
| remote | latest-remote | ~270 MB | Lightweight: delegates to an external Docling Serve instance |
| local | latest-local | ~1.9 GB | Full: runs Docling in-process, CPU-only (downloads ML models on first run) |
Docker: remote mode (fastest)
docker run -p 3000:3000 \
-e DOCLING_SERVE_URL=http://your-docling-serve:5001 \
ghcr.io/scub-france/docling-studio:latest-remote
Docker: local mode (self-contained)
docker run -p 3000:3000 ghcr.io/scub-france/docling-studio:latest-local
Note: The first analysis takes longer as Docling downloads its ML models (~400 MB). Subsequent runs are fast.
Open http://localhost:3000.
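The container can take a few seconds to start answering requests. As a minimal sketch, the helper below polls a URL until it responds. The function name `wait_until_up` and its parameters are illustrative and not part of Docling Studio; it uses only the Python standard library.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers or `timeout` seconds have elapsed.

    Returns True as soon as the server sends any HTTP response,
    False if nothing answered before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is reachable.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_up("http://localhost:3000")` after `docker run` returns `True` once the UI is reachable, which can be handy in scripts that need to wait before opening the page.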
Docker Compose (recommended for development)
git clone https://github.com/scub-france/Docling-Studio.git
cd Docling-Studio
# Local mode (default)
docker compose up --build
# Remote mode
CONVERSION_MODE=remote DOCLING_SERVE_URL=http://your-docling-serve:5001 docker compose up --build
Local Development
=== "Backend (Python 3.12+)"
```bash
cd document-parser
python -m venv .venv && source .venv/bin/activate
# Remote mode (lightweight)
pip install -r requirements.txt
# Local mode (with Docling)
pip install -r requirements-local.txt
uvicorn main:app --reload --port 8000
```
=== "Frontend (Node 20+)"
```bash
cd frontend
npm install
npm run dev
```
The frontend runs on http://localhost:3000 and proxies API calls to http://localhost:8000.
Running Tests
=== "Backend"
```bash
cd document-parser
pip install pytest pytest-asyncio httpx
pytest tests/ -v
```
=== "Frontend"
```bash
cd frontend
npm run test:run
```
Pipeline Options
These options map directly to Docling's PdfPipelineOptions.
| Option | Default | Description |
|---|---|---|
| do_ocr | true | OCR for scanned pages and embedded images |
| do_table_structure | true | Table detection and row/column reconstruction |
| table_mode | accurate | accurate (TableFormer) or fast |
| do_code_enrichment | false | Specialized OCR for code blocks |
| do_formula_enrichment | false | Math formula recognition (LaTeX output) |
| do_picture_classification | false | Classify images by type |
| do_picture_description | false | Generate image descriptions via VLM |
| generate_picture_images | false | Extract detected images as separate files |
| generate_page_images | false | Rasterize each page as an image |
| images_scale | 1.0 | Scale factor for generated images (0.1 to 10) |
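To make the defaults and allowed values in the table above concrete, here is a minimal sketch of a validated options container. The class `PipelineOptions` is a hypothetical stand-in written for illustration; it is neither Docling's `PdfPipelineOptions` nor Docling Studio's own code. Only the field names, defaults, and ranges come from the table.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineOptions:
    """Hypothetical container mirroring the documented options and defaults."""

    do_ocr: bool = True
    do_table_structure: bool = True
    table_mode: str = "accurate"
    do_code_enrichment: bool = False
    do_formula_enrichment: bool = False
    do_picture_classification: bool = False
    do_picture_description: bool = False
    generate_picture_images: bool = False
    generate_page_images: bool = False
    images_scale: float = 1.0

    def __post_init__(self) -> None:
        # Reject values outside the ranges listed in the table.
        if self.table_mode not in ("accurate", "fast"):
            raise ValueError(f"table_mode must be 'accurate' or 'fast', got {self.table_mode!r}")
        if not 0.1 <= self.images_scale <= 10:
            raise ValueError(f"images_scale must be between 0.1 and 10, got {self.images_scale}")


# Example: faster table detection plus page images rendered at 2x scale.
fast_scan = PipelineOptions(table_mode="fast", generate_page_images=True, images_scale=2.0)
print(asdict(fast_scan))
```

Validating up front means a bad value such as `images_scale=20` fails immediately instead of partway through a long conversion.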
Chunking Options
!!! note
    Chunking is only available in local mode. The chunking UI is hidden when using remote mode (Docling Serve).
After a document is analyzed, you can split the extracted content into semantic chunks. Chunking can be configured at analysis time or re-run later with different options via the rechunk action.
| Option | Default | Description |
|---|---|---|
| chunker_type | hybrid | hybrid (semantic + structural), hierarchical (heading-based), or page (one chunk per page) |
| max_tokens | 512 | Maximum tokens per chunk |
| merge_peers | true | Merge sibling elements under the same heading |
| repeat_table_header | true | Repeat table headers when a table is split across chunks |
Each chunk includes:
- text: the chunk content
- headings: heading hierarchy leading to the chunk
- source_page: the page number the chunk originates from
- token_count: number of tokens in the chunk
- bboxes: bounding boxes of the chunk's source elements (page + coordinates)
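The fields listed above can be modeled as a small record type for downstream processing. This is a sketch under stated assumptions: the class `Chunk`, the helper functions, and the exact shape of each `bboxes` entry (a dict with a `page` number and a four-number `bbox`) are hypothetical. The source only says each box carries a page and coordinates.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical record using the documented chunk field names."""

    text: str
    headings: list[str]
    source_page: int
    token_count: int
    # Assumed shape: {"page": int, "bbox": [left, top, right, bottom]}
    bboxes: list[dict] = field(default_factory=list)


def breadcrumb(chunk: Chunk, sep: str = " > ") -> str:
    """Join the heading hierarchy into a single readable path."""
    return sep.join(chunk.headings)


def chunks_by_page(chunks: list[Chunk]) -> dict[int, list[Chunk]]:
    """Group chunks by the page they originate from."""
    pages: dict[int, list[Chunk]] = defaultdict(list)
    for chunk in chunks:
        pages[chunk.source_page].append(chunk)
    return dict(pages)


# Made-up sample data for illustration only.
sample = [
    Chunk("Docling parses PDFs.", ["Intro", "Scope"], 1, 5,
          [{"page": 1, "bbox": [72.0, 90.0, 520.0, 110.0]}]),
    Chunk("Tables are rebuilt.", ["Intro", "Tables"], 1, 4),
    Chunk("Results follow.", ["Evaluation"], 2, 3),
]

print(breadcrumb(sample[0]))
print({page: len(items) for page, items in chunks_by_page(sample).items()})
```

Prefixing each chunk's text with its heading path is a common way to keep context when chunks are embedded or indexed separately.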
Configuration
All configuration is done via environment variables:
| Variable | Default | Description |
|---|---|---|
| CONVERSION_ENGINE | local | local (in-process Docling) or remote (Docling Serve) |
| DOCLING_SERVE_URL | http://localhost:5001 | Docling Serve endpoint (remote mode only) |
| DOCLING_SERVE_API_KEY | (none) | API key for Docling Serve (optional) |
| CORS_ORIGINS | http://localhost:3000,... | CORS allowed origins |
| UPLOAD_DIR | ./uploads | File storage directory |
| DB_PATH | ./data/docling_studio.db | SQLite database path |
| CONVERSION_TIMEOUT | 600 | Max seconds per Docling conversion |
| MAX_CONCURRENT_ANALYSES | 3 | Maximum parallel analysis jobs |
| DEPLOYMENT_MODE | self-hosted | self-hosted or huggingface (shows disclaimer banner) |
| APP_VERSION | dev | Application version (set automatically by CI/Docker) |
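As an illustration of how these variables and their defaults fit together, the sketch below reads them from a mapping such as `os.environ`. The names `Settings`, `load_settings`, and `_choice` are hypothetical and do not describe how Docling Studio actually loads its configuration. `CORS_ORIGINS` is left out on purpose because its documented default is truncated in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; defaults are copied from the table."""

    conversion_engine: str
    docling_serve_url: str
    docling_serve_api_key: Optional[str]
    upload_dir: str
    db_path: str
    conversion_timeout: int
    max_concurrent_analyses: int
    deployment_mode: str
    app_version: str


def _choice(env: Mapping[str, str], name: str, default: str, allowed: tuple) -> str:
    """Read a variable and reject values outside the documented set."""
    value = env.get(name, default)
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
    return value


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    return Settings(
        conversion_engine=_choice(env, "CONVERSION_ENGINE", "local", ("local", "remote")),
        docling_serve_url=env.get("DOCLING_SERVE_URL", "http://localhost:5001"),
        docling_serve_api_key=env.get("DOCLING_SERVE_API_KEY"),  # optional, no default
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
        db_path=env.get("DB_PATH", "./data/docling_studio.db"),
        conversion_timeout=int(env.get("CONVERSION_TIMEOUT", "600")),
        max_concurrent_analyses=int(env.get("MAX_CONCURRENT_ANALYSES", "3")),
        deployment_mode=_choice(env, "DEPLOYMENT_MODE", "self-hosted",
                                ("self-hosted", "huggingface")),
        app_version=env.get("APP_VERSION", "dev"),
    )


# With an empty environment, every field falls back to its documented default.
print(load_settings({}))
```

Passing the environment as an argument keeps the loader easy to test: a plain dict can stand in for `os.environ`.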
System Requirements
| | Remote image | Local image |
|---|---|---|
| Image size | ~270 MB | ~1.9 GB |
| Memory | 2 GB | 6 GB (recommended 8 GB+) |
| CPUs | 2 | 4 (recommended 8+) |
All Docker images are multi-arch (linux/amd64 + linux/arm64). No GPU required.
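A quick way to compare a host against the table above is sketched below. It assumes the first memory and CPU figures in each column are minimums, which the table implies but does not state outright. The names `MINIMUMS`, `meets_minimum`, and `detect_host` are illustrative.

```python
import os
from typing import Optional, Tuple

# (memory in GB, CPU count) per image variant, read from the table as minimums.
MINIMUMS = {"remote": (2, 2), "local": (6, 4)}


def meets_minimum(variant: str, memory_gb: float, cpus: int) -> bool:
    """Return True if the given resources satisfy the variant's minimums."""
    min_memory, min_cpus = MINIMUMS[variant]
    return memory_gb >= min_memory and cpus >= min_cpus


def detect_host() -> Tuple[Optional[float], int]:
    """Best-effort detection of physical memory (GB) and CPU count."""
    cpus = os.cpu_count() or 1
    try:
        memory_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        memory_gb = None  # os.sysconf is unavailable on some platforms
    return memory_gb, cpus


memory_gb, cpus = detect_host()
if memory_gb is not None:
    for variant in MINIMUMS:
        print(variant, "ok" if meets_minimum(variant, memory_gb, cpus) else "below minimum")
```

Inside a container, the detected values reflect the host unless resource limits are applied, so treat the result as a rough guide.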
