XQ committed on
Commit 04082c4 · 1 Parent(s): bb91c88

Update README

Files changed (2)
  1. .github/README.md +244 -67
  2. README.md +245 -58
.github/README.md CHANGED
@@ -1,88 +1,263 @@
- # Doc Assistant

- ## Live Demo

- [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) hosted on Hugging Face Spaces

- A document intelligence system covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a Plan-and-Execute agent (default) with conversation memory for complex multi-step queries, and a pipeline for lightweight models without tool-calling support. Retrieval quality is evaluated with RAGAS.

- ## How it works

- PDFs are parsed with PyMuPDF, split into chunks (fixed-size, recursive, or semantic), embedded with a multilingual sentence-transformer, and stored in Qdrant. A BM25 index is built from the same chunks for keyword search.

- At query time both indexes are searched and their results merged with reciprocal rank fusion. A cross-encoder then rescores the candidates before the top chunks are passed to the LLM. The API streams the response over SSE and the Streamlit UI displays it with source attribution.

- **Two routing modes, switchable via `AGENT_MODE`:**

- - **Plan-and-Execute Agent** (default): a structured multi-step pipeline a planner decomposes the query into steps, an executor runs each step via a ReAct sub-agent with tool access, and a synthesizer produces the final cited answer. Includes conversation memory for multi-turn follow-ups. Requires a model with tool-calling support.

- | Tool | Purpose |
- |------|---------|
- | `hybrid_search(query, top_k)` | Retrieve relevant passages via hybrid search + reranking |
- | `multi_query_search(question, top_k)` | Decompose complex questions into sub-queries, search each, merge results |
- | `search_within_document(document_id, query, top_k)` | Find specific sections inside a known document |
- | `summarize_document(document_id)` | Generate a structured summary of a document |
- | `list_documents()` | See what's in the knowledge base |
- | `fetch_document(document_id)` | Read a full document |

- - **Pipeline** (`AGENT_MODE=pipeline`): a predefined LangGraph graph — language detection → optional translation → hybrid retrieval → reranking → generation, with a confidence-based retry loop. Works with lightweight local models that lack tool-calling support.

- ## Tech Stack

- | Category | Technology |
  |---|---|
  | Framework | FastAPI, uvicorn |
- | Orchestration | LangChain, LangGraph |
- | Vector Store | Qdrant (local mode) |
  | Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
- | LLM | `gemma4` via Ollama (default) |
- | Sparse Search | rank_bm25 |
  | Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
- | PDF Parsing | PyMuPDF |
- | Evaluation | RAGAS |
- | UI | Streamlit |

- ## Provider Support

- LLM and embedding backends are configured through environment variables. Supported providers: Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI, Groq. The default (Ollama + HuggingFace) runs locally without any API keys.

- See `.env.example` for per-provider configuration.

- ## Agent Mode

- | Mode | `AGENT_MODE` | Notes |
- |------|-------------|-------|
- | Plan-and-Execute | `react` (default) | Structured multi-step agent with conversation memory |
- | Pipeline | `pipeline` | Predefined graph, works with lightweight models that lack tool calling |

- Tool-calling is supported by OpenAI, Anthropic, Google GenAI, Azure OpenAI, Groq, and some Ollama models (`gemma4`, `llama3.1`, `qwen2.5`, `mistral-nemo`).

- Plan-and-Execute with local Ollama (default):

- ```dotenv
- AGENT_MODE=react
- LLM_PROVIDER=ollama
- OLLAMA_MODEL=gemma4:e4b
  ```

- Pipeline with a lightweight model:

- ```dotenv
- AGENT_MODE=pipeline
- LLM_PROVIDER=ollama
- OLLAMA_MODEL=gemma3
  ```

- Plan-and-Execute with OpenAI:

- ```dotenv
- AGENT_MODE=react
- LLM_PROVIDER=openai
- OPENAI_API_KEY=sk-...
- OPENAI_MODEL=gpt-4o-mini
  ```

- ## Quick Start

  Requires Python 3.11+ and [Ollama](https://ollama.com/).

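Both the removed and the added README text say that the dense (Qdrant) and BM25 result lists are merged with reciprocal rank fusion before cross-encoder reranking. Below is a minimal RRF sketch in Python. It assumes each retriever returns chunk IDs best-first and uses the commonly cited constant k = 60. `reciprocal_rank_fusion` and the chunk IDs are hypothetical and are not taken from the project's `hybrid.py`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first lists of chunk IDs with reciprocal rank fusion.

    Each chunk scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Higher fused scores come first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (descending) and break ties by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense = ["chunk-3", "chunk-1", "chunk-7"]   # hypothetical vector-search order
    sparse = ["chunk-1", "chunk-9", "chunk-3"]  # hypothetical BM25 order
    for chunk_id, score in reciprocal_rank_fusion([dense, sparse]):
        print(f"{chunk_id}: {score:.4f}")
```

Chunks that both retrievers rank highly (here `chunk-1` and `chunk-3`) end up first. Because RRF uses only ranks, BM25 scores and cosine similarities never need to be calibrated against each other.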
@@ -96,15 +271,15 @@ cp .env.example .env
  ollama pull gemma4:e4b
  python -m scripts.ingest # place PDFs in docs/ first

- uvicorn src.api.main:app --reload # http://localhost:8000
- streamlit run src/ui/app.py # http://localhost:8501
  ```

- ## Docker

- Docker Compose handles Qdrant, the API, and the Streamlit UI together. The API container waits for Qdrant on startup and runs ingestion automatically if the collection is empty.

- **Local (Ollama + HuggingFace):**

  ```bash
  cp .env.example .env
@@ -118,26 +293,28 @@ docker compose --profile local up --build
  | Streamlit UI | http://localhost:8501 |
  | Qdrant dashboard | http://localhost:6333/dashboard |

- **Cloud (OpenAI / Anthropic / etc.):**

  ```bash
  cp .env.example .env
- # set LLM_PROVIDER, EMBEDDING_PROVIDER, and your API key
  docker compose up --build
  ```

- **Hugging Face Spaces:** a `Dockerfile` and supervisor config are included. The Space runs Qdrant, the API, and the UI behind nginx on port 7860.

- ## Project Structure

  ```
  src/
  config.py # env-based configuration
- provider.py # create_llm() / create_embeddings() factory
  models.py # shared dataclasses
  ingestion/
  pdf_parser.py # PyMuPDF extraction
- text_cleaner.py # Danish/English normalization
  chunker.py # fixed-size, recursive, semantic chunking
  pipeline.py # ingestion orchestration
  retrieval/
@@ -148,11 +325,11 @@ src/
  reranker.py # cross-encoder
  api/
  main.py
- routes.py # /query, /query/stream, /ingest, /health
  agent/
  intent_classifier.py
  router.py # pipeline mode (AGENT_MODE=pipeline)
- tools.py # 6 retrieval tools + ToolResultStore
  plan_and_execute.py # Plan-and-Execute agent (AGENT_MODE=react)
  memory.py # conversation memory for multi-turn
  evaluation/
@@ -163,5 +340,5 @@ scripts/
  ingest.py
  e2e_test.py
  tests/
- docs/ # example PDFs/texts (KU AI public documents)
  ```
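The project structure lists `provider.py` as a `create_llm()` / `create_embeddings()` factory, and the new README says business code never imports a provider SDK directly. The sketch below shows one way such a factory can work: a registry keyed on `LLM_PROVIDER`. `StubChatModel`, `register_llm` and the builder functions are hypothetical stand-ins; a real version would return LangChain chat models. Only two providers are registered to keep the example short, and their defaults mirror the README's `OLLAMA_MODEL` and `OPENAI_MODEL` examples.

```python
from __future__ import annotations

import os
from collections.abc import Callable, Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class StubChatModel:
    """Hypothetical stand-in for a provider-specific chat model."""
    provider: str
    model: str


LLMBuilder = Callable[[Mapping[str, str]], StubChatModel]
_LLM_BUILDERS: dict[str, LLMBuilder] = {}


def register_llm(name: str) -> Callable[[LLMBuilder], LLMBuilder]:
    """Decorator that registers a builder under a provider name."""
    def decorator(builder: LLMBuilder) -> LLMBuilder:
        _LLM_BUILDERS[name] = builder
        return builder
    return decorator


@register_llm("ollama")
def _build_ollama(env: Mapping[str, str]) -> StubChatModel:
    # Local default: no API key required.
    return StubChatModel("ollama", env.get("OLLAMA_MODEL", "gemma4:e4b"))


@register_llm("openai")
def _build_openai(env: Mapping[str, str]) -> StubChatModel:
    if not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY must be set when LLM_PROVIDER=openai")
    return StubChatModel("openai", env.get("OPENAI_MODEL", "gpt-4o-mini"))


def create_llm(env: Mapping[str, str] | None = None) -> StubChatModel:
    """Return the chat model selected by LLM_PROVIDER (defaults to ollama)."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama").strip().lower()
    builder = _LLM_BUILDERS.get(provider)
    if builder is None:
        supported = ", ".join(sorted(_LLM_BUILDERS))
        raise ValueError(f"unsupported LLM_PROVIDER {provider!r}; expected one of: {supported}")
    return builder(env)
```

With this design, supporting another provider means registering one more builder. Callers keep calling `create_llm()` unchanged.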
 
+ # Dokumentassistent

+ ## Live demo
+ Hosted on Hugging Face Spaces: [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space)

+ [Skip to English ↓](#english)

+ ## Dansk

+ En produktionsklar RAG-applikation, der gør det muligt at stille spørgsmål til dokumenter på dansk og få svar med kildehenvisninger. Systemet er bygget på open source-komponenter (LangChain, LangGraph, Qdrant, Ollama) og kan køre helt lokalt uden eksterne API-kald. Det implementerer hybrid søgning med reranking, en Plan-and-Execute agent med samtalehukommelse, og RAGAS-baseret evaluering af svarkvaliteten.

+ ### Funktioner

+ | Område | Implementering |
+ |---|---|
+ | Ustruktureret data | PyMuPDF-parser, dansk og engelsk tekstrensning, tre opdelingsstrategier (fast størrelse, rekursiv, semantisk) |
+ | Hybrid søgning | Qdrant til vektorsøgning kombineret med BM25, flettet med reciprocal rank fusion |
+ | Reranking | Cross-encoder `mmarco-mMiniLMv2-L12-H384-v1` |
+ | Agent-flows | Plan-and-Execute med seks værktøjer, ReAct-subagent og samtalehukommelse |
+ | Evaluering | RAGAS-metrikker (faithfulness, answer relevancy, context precision) |
+ | Sporbarhed | Hvert svar har kildehenvisninger med chunk-ID og sidenummer, samt struktureret logning |
+ | Provider-abstraktion | Factory-mønster, der gør det muligt at skifte mellem Ollama, OpenAI, Azure OpenAI, Anthropic og Google GenAI uden at ændre forretningskoden |
+ | Deployment | Docker Compose til lokal kørsel, Hugging Face Spaces til den offentlige demo |

+ ### Sådan fungerer det

+ PDF-filer bliver læst med PyMuPDF, renset, opdelt i tekststykker (fast størrelse, rekursivt eller semantisk), indlejret med en flersproget sentence-transformer og gemt i Qdrant. Et BM25-indeks bygges på de samme tekststykker til nøgleordssøgning.

+ Når en bruger stiller et spørgsmål, kører systemet både den semantiske og den leksikale søgning, fletter resultaterne sammen med reciprocal rank fusion, og lader en cross-encoder rescore kandidaterne. De øverste tekststykker bliver sendt til en LLM, og svaret streames tilbage via SSE og vises i Streamlit-grænsefladen sammen med kilderne.

+ ### To agent-tilstande

+ Systemet kan køre i to forskellige tilstande, der vælges via miljøvariablen `AGENT_MODE`.

+ **Pipeline** (`AGENT_MODE=pipeline`) bygger på en fast LangGraph-graf med sprogdetektion, valgfri oversættelse, hybrid søgning, reranking og generering. Tilstanden har en confidence-baseret retry-loop og fungerer fint med lette lokale modeller.
+
+ **Plan-and-Execute Agent** (`AGENT_MODE=react`, standard) er en flertrinsagent, hvor en planner først nedbryder spørgsmålet i delopgaver, en executor kører hver delopgave gennem en ReAct-subagent med adgang til et sæt værktøjer, og en synthesizer producerer det endelige svar med kildehenvisninger. Tilstanden indeholder samtalehukommelse til opfølgende spørgsmål og kræver en model, der understøtter tool calling.
+
+ | Værktøj | Formål |
+ |---|---|
+ | `hybrid_search(query, top_k)` | Henter relevante tekststykker via hybrid søgning og reranking |
+ | `multi_query_search(question, top_k)` | Nedbryder komplekse spørgsmål i delspørgsmål, søger på hver og fletter resultaterne |
+ | `search_within_document(document_id, query, top_k)` | Finder bestemte afsnit i et kendt dokument |
+ | `summarize_document(document_id)` | Laver et struktureret resumé af et dokument |
+ | `list_documents()` | Viser, hvilke dokumenter der ligger i vidensbasen |
+ | `fetch_document(document_id)` | Henter et helt dokument |
+
+ ### Produktionshensyn
+
+ - **Sporbarhed.** Hvert genereret svar har kildehenvisninger på chunk-niveau med dokument-ID, sidenummer og tekststykke, så det kan revideres bagudrettet.
+ - **Governance.** RAGAS-evalueringspipelinen i `src/evaluation/` gør det muligt at måle faithfulness og context precision, før ændringer slippes løs i produktion.
+ - **Konfigurerbarhed.** Ingen hardkodede stier, modelnavne eller API-nøgler. Alt styres via miljøvariabler gennem `src/config.py`.
+ - **Provider-neutralitet.** Forretningskoden importerer aldrig en provider-SDK direkte. LLM- og embedding-backends skiftes via factory-funktionerne `create_llm()` og `create_embeddings()`, hvilket undgår vendor lock-in.
+ - **Lokal som standard.** Standardkonfigurationen kører helt uden eksterne API-kald og passer til miljøer med strenge krav til datahjemsted.
+ - **Pakket i containere.** Docker Compose til lokal kørsel og Hugging Face Spaces til den offentlige demo.
+
+ ### Teknologivalg
+
+ | Kategori | Teknologi |
  |---|---|
  | Framework | FastAPI, uvicorn |
+ | Orkestrering | LangChain, LangGraph |
+ | Vektorlager | Qdrant (lokal tilstand) |
  | Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
+ | LLM | `gemma4:e4b` via Ollama som standard |
+ | Sparse-søgning | rank_bm25 |
  | Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
+ | PDF-parsing | PyMuPDF |
+ | Evaluering | RAGAS |
+ | Grænseflade | Streamlit |

+ ### Provider-understøttelse

+ LLM- og embedding-backends konfigureres via miljøvariabler. De understøttede providers er Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI og Groq. Standardopsætningen (Ollama og HuggingFace) kører helt lokalt uden API-nøgler.

+ Se `.env.example` for konfiguration pr. provider.

+ ### Prøv den live

+ Demoen ligger på [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space).

+ Prøv for eksempel disse spørgsmål på dansk.

+ - "Hvad er KU's politik for brug af AI-værktøjer?"
+ - "Hvilke regler gælder for brug af generativ AI i eksamen?"
+ - "Sammenlign reglerne for AI-brug i forskning og undervisning."
+
+ Det sidste spørgsmål udløser Plan-and-Execute-agenten, så man kan se den nedbryde spørgsmålet i delopgaver i realtid.
+
+ ### Kom i gang
+
+ Kræver Python 3.11+ og [Ollama](https://ollama.com/).
+
+ ```bash
+ git clone https://github.com/Xiiqiing/Dokumentassistent.git
+ cd Dokumentassistent
+ python -m venv .venv && source .venv/bin/activate
+ pip install -r requirements.txt
+ cp .env.example .env
+
+ ollama pull gemma4:e4b
+ python -m scripts.ingest # læg PDF-filer i docs/ først

+ uvicorn src.api.main:app --reload # http://localhost:8000
+ streamlit run src/ui/app.py # http://localhost:8501
  ```

+ ### Docker

+ Docker Compose håndterer Qdrant, API'et og Streamlit-grænsefladen samlet. API-containeren venter på, at Qdrant er oppe, og kører ingestion automatisk, hvis samlingen er tom.
+
+ #### Lokalt setup med Ollama og HuggingFace
+
+ ```bash
+ cp .env.example .env
+ docker compose --profile local up --build
+ ```
+
+ | Service | URL |
+ |---|---|
+ | API | http://localhost:8000 |
+ | API-dokumentation | http://localhost:8000/docs |
+ | Streamlit-grænseflade | http://localhost:8501 |
+ | Qdrant-dashboard | http://localhost:6333/dashboard |
+
+ #### Cloud-setup med OpenAI, Anthropic eller andre
+
+ ```bash
+ cp .env.example .env
+ # sæt LLM_PROVIDER, EMBEDDING_PROVIDER og din API-nøgle
+ docker compose up --build
  ```

+ #### Hugging Face Spaces

+ Et `Dockerfile` og en supervisor-konfiguration er inkluderet. Spacet kører Qdrant, API'et og grænsefladen bag nginx på port 7860.
+
+ ### Projektstruktur
+
+ ```
+ src/
+ config.py # konfiguration via miljøvariabler
+ provider.py # create_llm() og create_embeddings() factory
+ models.py # delte dataklasser
+ ingestion/
+ pdf_parser.py # PyMuPDF-udtræk
+ text_cleaner.py # dansk og engelsk normalisering
+ chunker.py # fast størrelse, rekursiv og semantisk opdeling
+ pipeline.py # ingestion-orkestrering
+ retrieval/
+ embedder.py
+ vector_store.py # Qdrant
+ bm25_search.py
+ hybrid.py # reciprocal rank fusion
+ reranker.py # cross-encoder
+ api/
+ main.py
+ routes.py # /query, /ingest, /health
+ agent/
+ intent_classifier.py
+ router.py # pipeline-tilstand (AGENT_MODE=pipeline)
+ tools.py # seks retrieval-værktøjer og ToolResultStore
+ plan_and_execute.py # Plan-and-Execute-agent (AGENT_MODE=react)
+ memory.py # samtalehukommelse til flere spørgsmål
+ evaluation/
+ evaluator.py # RAGAS-metrikker
+ ui/
+ app.py # Streamlit-frontend
+ scripts/
+ ingest.py
+ e2e_test.py
+ tests/
+ docs/ # eksempel-PDF'er eller tekster (KU AI-dokumenter)
  ```

+ ---
+
+ ## English
+
+ A production-ready RAG application that lets users ask questions about documents in Danish and receive answers with source citations. The system is built on open-source components (LangChain, LangGraph, Qdrant, Ollama) and can run fully locally without any external API calls. It implements hybrid search with reranking, a Plan-and-Execute agent with conversation memory, and RAGAS-based evaluation of answer quality.
+
+ ### Capabilities
+
+ | Area | Implementation |
+ |---|---|
+ | Unstructured data | PyMuPDF parser, Danish and English text cleaning, three chunking strategies (fixed-size, recursive, semantic) |
+ | Hybrid retrieval | Qdrant dense vectors combined with BM25, fused via reciprocal rank fusion |
+ | Reranking | Cross-encoder `mmarco-mMiniLMv2-L12-H384-v1` |
+ | Agent flows | Plan-and-Execute with six tools, ReAct sub-agent and conversation memory |
+ | Evaluation | RAGAS metrics (faithfulness, answer relevancy, context precision) |
+ | Traceability | Each answer carries source references with chunk ID and page number, plus structured logging |
+ | Provider abstraction | Factory pattern that allows swapping between Ollama, OpenAI, Azure OpenAI, Anthropic and Google GenAI without touching business code |
+ | Deployment | Docker Compose for local setup, Hugging Face Spaces for the public demo |
+
+ ### How it works
+
+ PDFs are parsed with PyMuPDF, cleaned, split into chunks (fixed-size, recursive, or semantic), embedded with a multilingual sentence-transformer, and stored in Qdrant. A BM25 index is built from the same chunks for keyword search.
+
+ At query time, both indexes are searched and the results merged with reciprocal rank fusion. A cross-encoder then rescores the candidates before the top chunks are passed to the LLM. The API streams the response over SSE and the Streamlit UI displays it together with the sources.
+
+ ### Two agent modes
+
+ The system can run in two different modes, switchable via the `AGENT_MODE` environment variable.
+
+ **Pipeline** (`AGENT_MODE=pipeline`) is built on a fixed LangGraph graph with language detection, optional translation, hybrid retrieval, reranking, and generation. The mode has a confidence-based retry loop and works well with lightweight local models.
+
+ **Plan-and-Execute Agent** (`AGENT_MODE=react`, default) is a multi-step agent where a planner first decomposes the query into sub-tasks, an executor runs each sub-task through a ReAct sub-agent with access to a set of tools, and a synthesizer produces the final answer with citations. The mode includes conversation memory for follow-up questions and requires a model that supports tool calling.
+
+ | Tool | Purpose |
+ |---|---|
+ | `hybrid_search(query, top_k)` | Retrieves relevant passages via hybrid search and reranking |
+ | `multi_query_search(question, top_k)` | Decomposes complex questions into sub-queries, searches each, and merges the results |
+ | `search_within_document(document_id, query, top_k)` | Finds specific sections inside a known document |
+ | `summarize_document(document_id)` | Generates a structured summary of a document |
+ | `list_documents()` | Shows what is in the knowledge base |
+ | `fetch_document(document_id)` | Reads a full document |
+
+ ### Production considerations
+
+ - **Traceability.** Every generated answer carries chunk-level source references with document ID, page number and span, so it can be audited and reviewed afterwards.
+ - **Governance.** The RAGAS evaluation pipeline in `src/evaluation/` lets you measure faithfulness and context precision before promoting changes to production.
+ - **Configurability.** No hardcoded paths, model names or API keys. Everything is controlled via environment variables through `src/config.py`.
+ - **Provider neutrality.** Business code never imports a provider SDK directly. LLM and embedding backends swap via the `create_llm()` and `create_embeddings()` factory functions, which avoids vendor lock-in.
+ - **Local-first.** The default configuration runs entirely without external API calls and fits environments with strict data residency requirements.
+ - **Containerized.** Docker Compose for local runs and Hugging Face Spaces for the public demo.
+
+ ### Tech stack
+
+ | Category | Technology |
+ |---|---|
+ | Framework | FastAPI, uvicorn |
+ | Orchestration | LangChain, LangGraph |
+ | Vector store | Qdrant (local mode) |
+ | Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
+ | LLM | `gemma4:e4b` via Ollama (default) |
+ | Sparse search | rank_bm25 |
+ | Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
+ | PDF parsing | PyMuPDF |
+ | Evaluation | RAGAS |
+ | UI | Streamlit |
+
+ ### Provider support
+
+ LLM and embedding backends are configured through environment variables. Supported providers are Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI and Groq. The default setup (Ollama and HuggingFace) runs entirely locally without any API keys.
+
+ See `.env.example` for per-provider configuration.
+
+ ### Try it live
+
+ The demo lives at [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space).
+
+ Try asking these questions in Danish.
+
+ - "Hvad er KU's politik for brug af AI-værktøjer?"
+ - "Hvilke regler gælder for brug af generativ AI i eksamen?"
+ - "Sammenlign reglerne for AI-brug i forskning og undervisning."
+
+ The third question triggers the Plan-and-Execute agent, so you can watch it decompose the query into sub-tasks in real time.
+
+ ### Quick start

  Requires Python 3.11+ and [Ollama](https://ollama.com/).

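The Plan-and-Execute mode described in the new README splits the work among three roles: a planner, an executor that handles each sub-task (in the project, via a ReAct sub-agent with tools), and a synthesizer. The control flow can be sketched as below, assuming each role is a plain callable. `plan_and_execute`, `StepResult`, `max_steps` and the toy functions are hypothetical placeholders for LLM calls, not the contents of `plan_and_execute.py`.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class StepResult:
    step: str
    output: str
    sources: list[str] = field(default_factory=list)


def plan_and_execute(
    question: str,
    plan: Callable[[str], list[str]],
    execute: Callable[[str, list[StepResult]], StepResult],
    synthesize: Callable[[str, list[StepResult]], str],
    max_steps: int = 5,
) -> str:
    """Planner -> executor (once per step) -> synthesizer."""
    steps = plan(question)[:max_steps]  # cap the plan to bound latency and cost
    results: list[StepResult] = []
    for step in steps:
        # Each step sees the earlier results, so later steps can build on them.
        results.append(execute(step, results))
    return synthesize(question, results)


# Toy stand-ins for the LLM-backed planner, ReAct executor and synthesizer.
def toy_plan(question: str) -> list[str]:
    return ["find AI rules for research", "find AI rules for teaching", "compare the two"]


def toy_execute(step: str, prior: list[StepResult]) -> StepResult:
    return StepResult(step, f"notes on {step}", [f"doc-{len(prior) + 1}"])


def toy_synthesize(question: str, results: list[StepResult]) -> str:
    cited = ", ".join(src for r in results for src in r.sources)
    return f"answered in {len(results)} steps [{cited}]"


if __name__ == "__main__":
    # Prints: answered in 3 steps [doc-1, doc-2, doc-3]
    print(plan_and_execute("Compare AI rules in research and teaching", toy_plan, toy_execute, toy_synthesize))
```

Injecting the three roles as callables keeps the loop testable without a model. Passing earlier `StepResult`s to each step is what lets a comparison step build on the retrieval steps before it.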
  ollama pull gemma4:e4b
  python -m scripts.ingest # place PDFs in docs/ first

+ uvicorn src.api.main:app --reload # http://localhost:8000
+ streamlit run src/ui/app.py # http://localhost:8501
  ```

+ ### Docker

+ Docker Compose handles Qdrant, the API and the Streamlit UI together. The API container waits for Qdrant on startup and runs ingestion automatically if the collection is empty.

+ #### Local setup with Ollama and HuggingFace

  ```bash
  cp .env.example .env

  | Streamlit UI | http://localhost:8501 |
  | Qdrant dashboard | http://localhost:6333/dashboard |

+ #### Cloud setup with OpenAI, Anthropic or others

  ```bash
  cp .env.example .env
+ # set LLM_PROVIDER, EMBEDDING_PROVIDER and your API key
  docker compose up --build
  ```

+ #### Hugging Face Spaces
+
+ A `Dockerfile` and supervisor configuration are included. The Space runs Qdrant, the API and the UI behind nginx on port 7860.

+ ### Project structure

  ```
  src/
  config.py # env-based configuration
+ provider.py # create_llm() and create_embeddings() factory
  models.py # shared dataclasses
  ingestion/
  pdf_parser.py # PyMuPDF extraction
+ text_cleaner.py # Danish and English normalization
  chunker.py # fixed-size, recursive, semantic chunking
  pipeline.py # ingestion orchestration
  retrieval/

  reranker.py # cross-encoder
  api/
  main.py
+ routes.py # /query, /ingest, /health
  agent/
  intent_classifier.py
  router.py # pipeline mode (AGENT_MODE=pipeline)
+ tools.py # six retrieval tools and ToolResultStore
  plan_and_execute.py # Plan-and-Execute agent (AGENT_MODE=react)
  memory.py # conversation memory for multi-turn
  evaluation/

  ingest.py
  e2e_test.py
  tests/
+ docs/ # example PDFs or texts (KU AI public documents)
  ```
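The retrieval package includes `bm25_search.py`, and the tech stack names `rank_bm25` for sparse search. To show what that keyword side computes, here is a self-contained Okapi BM25 scorer. It uses whitespace tokenization and the usual defaults k1 = 1.5 and b = 0.75. It illustrates the formula only: it is not the `rank_bm25` API, and its smoothed IDF may differ from that library's variant.

```python
import math
from collections import Counter


class BM25:
    """Minimal Okapi BM25 scorer over pre-tokenized documents."""

    def __init__(self, corpus: list[list[str]], k1: float = 1.5, b: float = 0.75) -> None:
        if not corpus:
            raise ValueError("corpus must contain at least one document")
        self.k1, self.b = k1, b
        self.term_freqs = [Counter(doc) for doc in corpus]
        self.doc_lens = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_lens) / len(corpus)
        n_docs = len(corpus)
        # Document frequency: the number of documents each term occurs in.
        df = Counter(term for doc in corpus for term in set(doc))
        # Smoothed IDF; the "+ 1" keeps it positive even for very common terms.
        self.idf = {t: math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0) for t, n in df.items()}

    def scores(self, query: list[str]) -> list[float]:
        """Return one BM25 score per document, in corpus order."""
        results = []
        for tf_doc, dl in zip(self.term_freqs, self.doc_lens):
            score = 0.0
            for term in query:
                tf = tf_doc.get(term, 0)
                if tf == 0:
                    continue  # absent term contributes nothing
                # Term-frequency saturation (k1) with document-length normalization (b).
                denom = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                score += self.idf[term] * tf * (self.k1 + 1) / denom
            results.append(score)
        return results


if __name__ == "__main__":
    chunks = ["ai policy for research", "exam rules for generative ai", "library opening hours"]
    index = BM25([c.split() for c in chunks])
    print(index.scores("generative ai exam".split()))
```

Whitespace splitting is deliberately naive. In practice, queries and chunks should go through the same normalization (lowercasing, Danish and English cleaning) before they are scored.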
README.md CHANGED
@@ -8,81 +8,266 @@ app_port: 7860
  noindex: true
  ---

- # Doc Assistant

- **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces

- A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a pipeline for lightweight models, and a Plan-and-Execute agent flow with conversation memory for complex multi-step queries. Retrieval quality is evaluated with RAGAS.

- ## How it works

- PDFs are parsed with PyMuPDF, split into chunks (fixed-size, recursive, or semantic), embedded with a multilingual sentence-transformer, and stored in Qdrant. A BM25 index is built from the same chunks for keyword search.

- At query time both indexes are searched and their results merged with reciprocal rank fusion. A cross-encoder then rescores the candidates before the top chunks are passed to the LLM. The API streams the response over SSE and the Streamlit UI displays it with source attribution.

- **Two routing modes, switchable via `AGENT_MODE`:**

- - **Pipeline**: a predefined LangGraph graph — language detection → optional translation → hybrid retrieval → reranking → generation, with a confidence-based retry loop. Works with lightweight local models.

- - **Plan-and-Execute Agent** (default, `AGENT_MODE=react`): a structured multi-step pipeline where a planner decomposes the query into steps, an executor runs each step via a ReAct sub-agent with tool access, and a synthesizer produces the final cited answer. Includes conversation memory for multi-turn follow-ups. Requires a model with tool-calling support.

- | Tool | Purpose |
- |------|---------|
- | `hybrid_search(query, top_k)` | Retrieve relevant passages via hybrid search + reranking |
- | `multi_query_search(question, top_k)` | Decompose complex questions into sub-queries, search each, merge results |
- | `search_within_document(document_id, query, top_k)` | Find specific sections inside a known document |
- | `summarize_document(document_id)` | Generate a structured summary of a document |
- | `list_documents()` | See what's in the knowledge base |
- | `fetch_document(document_id)` | Read a full document |

- ## Tech Stack

  | Category | Technology |
  |---|---|
  | Framework | FastAPI, uvicorn |
  | Orchestration | LangChain, LangGraph |
- | Vector Store | Qdrant (local mode) |
  | Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
- | LLM | `gemma4` via Ollama (default) |
- | Sparse Search | rank_bm25 |
  | Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
- | PDF Parsing | PyMuPDF |
  | Evaluation | RAGAS |
  | UI | Streamlit |

- ## Provider Support

- LLM and embedding backends are configured through environment variables. Supported providers: Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI, Groq. The default (Ollama + HuggingFace) runs locally without any API keys.

  See `.env.example` for per-provider configuration.

- ## Agent Mode
-
- | Mode | `AGENT_MODE` | Notes |
- |------|-------------|-------|
- | Pipeline | `pipeline` | Predefined graph, works with lightweight models |
- | Plan-and-Execute (default) | `react` | Structured multi-step agent with conversation memory |
-
- Tool-calling is supported by OpenAI, Anthropic, Google GenAI, Azure OpenAI, Groq, and some Ollama models (`llama3.1`, `qwen2.5`, `mistral-nemo`).

- Plan-and-Execute with OpenAI:

- ```dotenv
- AGENT_MODE=react
- LLM_PROVIDER=openai
- OPENAI_API_KEY=sk-...
- OPENAI_MODEL=gpt-4o-mini
- ```

- Pipeline with local Ollama:

- ```dotenv
- AGENT_MODE=pipeline
- LLM_PROVIDER=ollama
- OLLAMA_MODEL=gemma3
- ```

- ## Quick Start

  Requires Python 3.11+ and [Ollama](https://ollama.com/).

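Pipeline mode is described as a fixed LangGraph graph with a confidence-based retry loop. The retry decision can be sketched in plain Python as below. The sketch assumes confidence is the highest retrieval or rerank score and that a retry widens `top_k`. The threshold, the doubling policy and all names are hypothetical, not taken from the project's `router.py`. In LangGraph, the same decision would typically be a conditional edge that routes back to retrieval.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    top_k: int
    confidence: float
    chunks: list[str]


def retrieve_with_retry(
    query: str,
    retrieve: Callable[[str, int], list[tuple[str, float]]],
    threshold: float = 0.5,
    top_k: int = 4,
    max_attempts: int = 3,
) -> Attempt:
    """Widen top_k and retry until the best score clears the threshold.

    Returns the first confident attempt, or the best attempt seen if none
    clears the threshold within max_attempts.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best: Optional[Attempt] = None
    for _ in range(max_attempts):
        scored = retrieve(query, top_k)
        # Confidence = highest score among the retrieved chunks (0.0 if none).
        confidence = max((score for _, score in scored), default=0.0)
        attempt = Attempt(top_k, confidence, [chunk for chunk, _ in scored])
        if best is None or attempt.confidence > best.confidence:
            best = attempt
        if confidence >= threshold:
            break
        top_k *= 2  # widen the candidate pool before retrying
    assert best is not None  # guaranteed because max_attempts >= 1
    return best


def toy_retrieve(query: str, top_k: int) -> list[tuple[str, float]]:
    # Hypothetical retriever: a wider pool surfaces a better-scoring chunk.
    best_score = {4: 0.3, 8: 0.6, 16: 0.9}.get(top_k, 0.95)
    return [("chunk-a", best_score), ("chunk-b", best_score / 2)]
```

Returning the best attempt rather than the last one prevents a noisy final retry from replacing a better earlier result.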
@@ -96,15 +281,15 @@ cp .env.example .env
  ollama pull gemma4:e4b
  python -m scripts.ingest # place PDFs in docs/ first

- uvicorn src.api.main:app --reload # http://localhost:8000
- streamlit run src/ui/app.py # http://localhost:8501
  ```

- ## Docker

- Docker Compose handles Qdrant, the API, and the Streamlit UI together. The API container waits for Qdrant on startup and runs ingestion automatically if the collection is empty.

- **Local (Ollama + HuggingFace):**

  ```bash
  cp .env.example .env
@@ -118,26 +303,28 @@ docker compose --profile local up --build
  | Streamlit UI | http://localhost:8501 |
  | Qdrant dashboard | http://localhost:6333/dashboard |

- **Cloud (OpenAI / Anthropic / etc.):**

  ```bash
  cp .env.example .env
- # set LLM_PROVIDER, EMBEDDING_PROVIDER, and your API key
  docker compose up --build
  ```

- **Hugging Face Spaces:** a `Dockerfile` and supervisor config are included. The Space runs Qdrant, the API, and the UI behind nginx on port 7860.

- ## Project Structure

  ```
  src/
  config.py # env-based configuration
- provider.py # create_llm() / create_embeddings() factory
  models.py # shared dataclasses
  ingestion/
  pdf_parser.py # PyMuPDF extraction
- text_cleaner.py # Danish/English normalization
  chunker.py # fixed-size, recursive, semantic chunking
  pipeline.py # ingestion orchestration
  retrieval/
@@ -152,7 +339,7 @@ src/
  agent/
  intent_classifier.py
  router.py # pipeline mode (AGENT_MODE=pipeline)
- tools.py # 6 retrieval tools + ToolResultStore
  plan_and_execute.py # Plan-and-Execute agent (AGENT_MODE=react)
  memory.py # conversation memory for multi-turn
  evaluation/
 
8
  noindex: true
9
  ---
10
 
11
+ # Dokumentassistent
12
 
13
+ ## Live demo
14
+ Hosted on Hugging Face Spaces: [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space)
15
 
16
+ [Skip to English ↓](#english)
17
 
18
+ ## Dansk
19
 
20
En produktionsklar RAG-applikation, der gør det muligt at stille spørgsmål til dokumenter på dansk og få svar med kildehenvisninger. Systemet er bygget på open source-komponenter (LangChain, LangGraph, Qdrant, Ollama) og kan køre helt lokalt uden eksterne API-kald. Det implementerer hybrid søgning med reranking, en Plan-and-Execute-agent med samtalehukommelse og RAGAS-baseret evaluering af svarkvaliteten.

### Funktioner

| Område | Implementering |
|---|---|
| Ustruktureret data | PyMuPDF-parser, dansk og engelsk tekstrensning, tre opdelingsstrategier (fast størrelse, rekursiv, semantisk) |
| Hybrid søgning | Qdrant til vektorsøgning kombineret med BM25, flettet med reciprocal rank fusion |
| Reranking | Cross-encoder `mmarco-mMiniLMv2-L12-H384-v1` |
| Agent-flows | Plan-and-Execute med seks værktøjer, ReAct-subagent og samtalehukommelse |
| Evaluering | RAGAS-metrikker (faithfulness, answer relevancy, context precision) |
| Sporbarhed | Hvert svar har kildehenvisninger med chunk-ID og sidenummer, samt struktureret logning |
| Provider-abstraktion | Factory-mønster, der gør det muligt at skifte mellem Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI og Groq uden at ændre forretningskoden |
| Deployment | Docker Compose til lokal kørsel, Hugging Face Spaces til den offentlige demo |

### Sådan fungerer det

PDF-filer bliver læst med PyMuPDF, renset, opdelt i tekststykker (fast størrelse, rekursivt eller semantisk), indlejret med en flersproget sentence-transformer og gemt i Qdrant. Et BM25-indeks bygges på de samme tekststykker til nøgleordssøgning.

Når en bruger stiller et spørgsmål, kører systemet både den semantiske og den leksikale søgning, fletter resultaterne sammen med reciprocal rank fusion og lader en cross-encoder rescore kandidaterne. De øverste tekststykker bliver sendt til en LLM, og svaret streames tilbage via SSE og vises i Streamlit-grænsefladen sammen med kilderne.

### To agent-tilstande

Systemet kan køre i to forskellige tilstande, der vælges via miljøvariablen `AGENT_MODE`.

**Pipeline** (`AGENT_MODE=pipeline`) bygger på en fast LangGraph-graf med sprogdetektion, valgfri oversættelse, hybrid søgning, reranking og generering. Tilstanden har en confidence-baseret retry-loop og fungerer fint med lette lokale modeller.

**Plan-and-Execute Agent** (`AGENT_MODE=react`, standard) er en flertrinsagent, hvor en planner først nedbryder spørgsmålet i delopgaver, en executor kører hver delopgave gennem en ReAct-subagent med adgang til et sæt værktøjer, og en synthesizer producerer det endelige svar med kildehenvisninger. Tilstanden indeholder samtalehukommelse til opfølgende spørgsmål og kræver en model, der understøtter tool calling.

| Værktøj | Formål |
|---|---|
| `hybrid_search(query, top_k)` | Henter relevante tekststykker via hybrid søgning og reranking |
| `multi_query_search(question, top_k)` | Nedbryder komplekse spørgsmål i delspørgsmål, søger på hver og fletter resultaterne |
| `search_within_document(document_id, query, top_k)` | Finder bestemte afsnit i et kendt dokument |
| `summarize_document(document_id)` | Laver et struktureret resumé af et dokument |
| `list_documents()` | Viser, hvilke dokumenter der ligger i vidensbasen |
| `fetch_document(document_id)` | Henter et helt dokument |

### Produktionshensyn

- **Sporbarhed.** Hvert genereret svar har kildehenvisninger på chunk-niveau med dokument-ID, sidenummer og tekststykke, så det kan revideres bagudrettet.
- **Governance.** RAGAS-evalueringspipelinen i `src/evaluation/` gør det muligt at måle faithfulness og context precision, før ændringer slippes løs i produktion.
- **Konfigurerbarhed.** Ingen hardkodede stier, modelnavne eller API-nøgler. Alt styres via miljøvariabler gennem `src/config.py`.
- **Provider-neutralitet.** Forretningskoden importerer aldrig en provider-SDK direkte. LLM- og embedding-backends skiftes via factory-funktionerne `create_llm()` og `create_embeddings()`, hvilket undgår vendor lock-in.
- **Lokal som standard.** Standardkonfigurationen kører helt uden eksterne API-kald og passer til miljøer med strenge krav til datahjemsted.
- **Pakket i containere.** Docker Compose til lokal kørsel og Hugging Face Spaces til den offentlige demo.

### Teknologivalg

| Kategori | Teknologi |
|---|---|
| Framework | FastAPI, uvicorn |
| Orkestrering | LangChain, LangGraph |
| Vektorlager | Qdrant (lokal tilstand) |
| Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
| LLM | `gemma4:e4b` via Ollama som standard |
| Sparse-søgning | rank_bm25 |
| Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
| PDF-parsing | PyMuPDF |
| Evaluering | RAGAS |
| Grænseflade | Streamlit |

### Provider-understøttelse

LLM- og embedding-backends konfigureres via miljøvariabler. De understøttede providers er Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI og Groq. Standardopsætningen (Ollama og HuggingFace) kører helt lokalt uden API-nøgler.

Se `.env.example` for konfiguration pr. provider.

### Prøv den live

Demoen ligger på [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space).

Prøv for eksempel disse spørgsmål på dansk:

- "Hvad er KU's politik for brug af AI-værktøjer?"
- "Hvilke regler gælder for brug af generativ AI i eksamen?"
- "Sammenlign reglerne for AI-brug i forskning og undervisning."

Det sidste spørgsmål udløser Plan-and-Execute-agenten, så man kan se den nedbryde spørgsmålet i delopgaver i realtid.

### Kom i gang

Kræver Python 3.11+ og [Ollama](https://ollama.com/).

```bash
git clone https://github.com/Xiiqiing/Dokumentassistent.git
cd Dokumentassistent
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

ollama pull gemma4:e4b
python -m scripts.ingest           # læg PDF-filer i docs/ først

uvicorn src.api.main:app --reload  # http://localhost:8000
streamlit run src/ui/app.py        # http://localhost:8501
```

### Docker

Docker Compose håndterer Qdrant, API'et og Streamlit-grænsefladen samlet. API-containeren venter på, at Qdrant er oppe, og kører ingestion automatisk, hvis samlingen er tom.

#### Lokalt setup med Ollama og HuggingFace

```bash
cp .env.example .env
docker compose --profile local up --build
```

| Service | URL |
|---|---|
| API | http://localhost:8000 |
| API-dokumentation | http://localhost:8000/docs |
| Streamlit-grænseflade | http://localhost:8501 |
| Qdrant-dashboard | http://localhost:6333/dashboard |

#### Cloud-setup med OpenAI, Anthropic eller andre

```bash
cp .env.example .env
# sæt LLM_PROVIDER, EMBEDDING_PROVIDER og din API-nøgle
docker compose up --build
```

#### Hugging Face Spaces

Et `Dockerfile` og en supervisor-konfiguration er inkluderet. Spacet kører Qdrant, API'et og grænsefladen bag nginx på port 7860.

### Projektstruktur

```
src/
  config.py             # konfiguration via miljøvariabler
  provider.py           # create_llm() og create_embeddings() factory
  models.py             # delte dataklasser
  ingestion/
    pdf_parser.py       # PyMuPDF-udtræk
    text_cleaner.py     # dansk og engelsk normalisering
    chunker.py          # fast størrelse, rekursiv og semantisk opdeling
    pipeline.py         # ingestion-orkestrering
  retrieval/
    embedder.py
    vector_store.py     # Qdrant
    bm25_search.py
    hybrid.py           # reciprocal rank fusion
    reranker.py         # cross-encoder
  api/
    main.py
    routes.py           # /query, /ingest, /health
  agent/
    intent_classifier.py
    router.py           # pipeline-tilstand (AGENT_MODE=pipeline)
    tools.py            # seks retrieval-værktøjer og ToolResultStore
    plan_and_execute.py # Plan-and-Execute-agent (AGENT_MODE=react)
    memory.py           # samtalehukommelse til flere spørgsmål
  evaluation/
    evaluator.py        # RAGAS-metrikker
  ui/
    app.py              # Streamlit-frontend
scripts/
  ingest.py
  e2e_test.py
tests/
docs/                   # eksempel-PDF'er eller tekster (KU AI-dokumenter)
```

---

## English

A production-ready RAG application that lets users ask questions about documents in Danish and receive answers with source citations. The system is built on open-source components (LangChain, LangGraph, Qdrant, Ollama) and can run fully locally without any external API calls. It implements hybrid search with reranking, a Plan-and-Execute agent with conversation memory, and RAGAS-based evaluation of answer quality.

### Capabilities

| Area | Implementation |
|---|---|
| Unstructured data | PyMuPDF parser, Danish and English text cleaning, three chunking strategies (fixed-size, recursive, semantic) |
| Hybrid retrieval | Qdrant dense vectors combined with BM25, fused via reciprocal rank fusion |
| Reranking | Cross-encoder `mmarco-mMiniLMv2-L12-H384-v1` |
| Agent flows | Plan-and-Execute with six tools, a ReAct sub-agent and conversation memory |
| Evaluation | RAGAS metrics (faithfulness, answer relevancy, context precision) |
| Traceability | Each answer carries source references with chunk ID and page number, plus structured logging |
| Provider abstraction | Factory pattern that allows swapping between Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI and Groq without touching business code |
| Deployment | Docker Compose for local setup, Hugging Face Spaces for the public demo |

### How it works

PDFs are parsed with PyMuPDF, cleaned, split into chunks (fixed-size, recursive, or semantic), embedded with a multilingual sentence-transformer, and stored in Qdrant. A BM25 index is built from the same chunks for keyword search.
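
To make the simplest of the three strategies concrete, here is a minimal character-based fixed-size chunker with overlap. It is a sketch under assumptions: the function name, the 500-character default size and the 50-character overlap are illustrative, and the project's `chunker.py` may measure size in tokens or treat boundaries differently.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` characters.

    Consecutive windows share `overlap` characters, so a sentence cut at one
    boundary still appears whole in at least one neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Recursive and semantic chunking replace the fixed window with separator-aware or embedding-similarity-based split points, but produce the same kind of list of chunks for embedding.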

At query time, both indexes are searched and the results are merged with reciprocal rank fusion. A cross-encoder then rescores the fused candidates, and the top chunks are passed to the LLM. The API streams the response over SSE, and the Streamlit UI displays it together with the sources.
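
Reciprocal rank fusion uses only each chunk's rank in every result list, not the raw scores, which is why it can merge Qdrant similarities and BM25 scores that live on different scales. A minimal sketch follows; `k = 60` is a widely used default, and whether `hybrid.py` uses the same constant is an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs (each ordered best-first).

    A chunk's fused score is the sum of 1 / (k + rank) over every list it
    appears in; chunks missing from a list simply get no contribution from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["c1", "c2", "c3"]   # e.g. vector search results, best first
sparse = ["c3", "c1", "c4"]  # e.g. BM25 results, best first
fused = reciprocal_rank_fusion([dense, sparse])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c2', 'c4']
```

Chunks ranked well by both retrievers (`c1`, `c3`) rise to the top. In the flow described above, the first few fused IDs would then be rescored by the cross-encoder.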

### Two agent modes

The system can run in one of two modes, selected with the `AGENT_MODE` environment variable.

**Pipeline** (`AGENT_MODE=pipeline`) is built on a fixed LangGraph graph with language detection, optional translation, hybrid retrieval, reranking, and generation. It has a confidence-based retry loop and works well with lightweight local models.
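
Stripped of LangGraph, a confidence-based retry loop can be pictured as: retrieve, generate, and try again with a wider candidate pool while the answer's confidence stays below a threshold. Everything in this sketch is hypothetical: the `Answer` type, the 0.6 threshold, widening `top_k` as the retry strategy, and the injected `retrieve`/`generate` callables stand in for whatever the graph in `router.py` actually does.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    text: str
    confidence: float  # hypothetical score in [0, 1]


def answer_with_retry(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str, list[str]], Answer],
    threshold: float = 0.6,
    max_attempts: int = 3,
    top_k: int = 5,
) -> Answer:
    """Return the first answer that reaches `threshold`.

    If no attempt is confident enough, return the most confident answer seen.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best: Optional[Answer] = None
    for attempt in range(max_attempts):
        k = top_k * (attempt + 1)  # 5, 10, 15, ... candidates per attempt
        answer = generate(question, retrieve(question, k))
        if best is None or answer.confidence > best.confidence:
            best = answer
        if answer.confidence >= threshold:
            return answer
    return best
```

Passing `retrieve` and `generate` in as callables keeps the loop testable with stubs and independent of the LLM provider.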

**Plan-and-Execute Agent** (`AGENT_MODE=react`, default) is a multi-step agent: a planner first decomposes the query into sub-tasks, an executor runs each sub-task through a ReAct sub-agent with access to the tools below, and a synthesizer produces the final answer with citations. This mode includes conversation memory for follow-up questions and requires a model that supports tool calling.

| Tool | Purpose |
|---|---|
| `hybrid_search(query, top_k)` | Retrieves relevant passages via hybrid search and reranking |
| `multi_query_search(question, top_k)` | Decomposes complex questions into sub-queries, searches each, and merges the results |
| `search_within_document(document_id, query, top_k)` | Finds specific sections inside a known document |
| `summarize_document(document_id)` | Generates a structured summary of a document |
| `list_documents()` | Lists the documents in the knowledge base |
| `fetch_document(document_id)` | Retrieves a full document |
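
Without the LLM and LangGraph, the planner → executor → synthesizer flow reduces to the control flow below. This is a simplified, hypothetical sketch: the `Step` type, the plan format and the stub tool are invented for illustration, and here each step names its tool directly, whereas in the agent described above a ReAct sub-agent chooses tools itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    description: str
    tool: str
    args: dict[str, Any] = field(default_factory=dict)


def plan_and_execute(
    question: str,
    planner: Callable[[str], list[Step]],
    tools: dict[str, Callable[..., str]],
    synthesizer: Callable[[str, list[tuple[Step, str]]], str],
) -> str:
    """Plan once, execute every step in order, then synthesize one answer."""
    results: list[tuple[Step, str]] = []
    for step in planner(question):
        tool = tools.get(step.tool)
        if tool is None:
            # Record the failure instead of aborting, so the synthesizer can still answer.
            results.append((step, f"error: unknown tool {step.tool!r}"))
            continue
        results.append((step, tool(**step.args)))
    return synthesizer(question, results)


# Toy usage: a hard-coded plan for the comparison question and a stub search tool.
def toy_planner(question: str) -> list[Step]:
    return [
        Step("rules for research", "hybrid_search", {"query": "AI i forskning", "top_k": 3}),
        Step("rules for teaching", "hybrid_search", {"query": "AI i undervisning", "top_k": 3}),
    ]


toy_tools = {"hybrid_search": lambda query, top_k: f"<{top_k} passages about {query}>"}
answer = plan_and_execute(
    "Sammenlign reglerne for AI-brug i forskning og undervisning.",
    toy_planner,
    toy_tools,
    lambda q, results: " | ".join(output for _, output in results),
)
print(answer)  # <3 passages about AI i forskning> | <3 passages about AI i undervisning>
```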

### Production considerations

- **Traceability.** Every generated answer carries chunk-level source references with document ID, page number and text span, so it can be audited afterwards.
- **Governance.** The RAGAS evaluation pipeline in `src/evaluation/` lets you measure faithfulness and context precision before promoting changes to production.
- **Configurability.** No hardcoded paths, model names or API keys; everything is controlled via environment variables through `src/config.py`.
- **Provider neutrality.** Business code never imports a provider SDK directly. LLM and embedding backends are swapped via the `create_llm()` and `create_embeddings()` factory functions, which avoids vendor lock-in.
- **Local-first.** The default configuration runs entirely without external API calls, which suits environments with strict data-residency requirements.
- **Containerized.** Docker Compose for local runs and Hugging Face Spaces for the public demo.
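
The provider-neutrality point can be sketched as a registry that maps the `LLM_PROVIDER` value to a builder function, so calling code depends only on the factory. This is a hypothetical illustration, not the contents of `src/provider.py`: `FakeChatModel`, `LLM_BUILDERS`, `build_llm` and the `LLM_MODEL` variable are invented for the example, and a real builder would construct the provider's chat model client instead.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class FakeChatModel:
    """Hypothetical stand-in for a provider-specific chat model client."""
    provider: str
    model: str


# Registry: provider name -> builder. Supporting a new provider means adding one
# entry here; callers never import a provider SDK themselves.
LLM_BUILDERS: dict[str, Callable[[str], FakeChatModel]] = {
    "ollama": lambda model: FakeChatModel("ollama", model),
    "openai": lambda model: FakeChatModel("openai", model),
    "anthropic": lambda model: FakeChatModel("anthropic", model),
}


def build_llm(env: Mapping[str, str] = os.environ) -> FakeChatModel:
    """Pick an LLM backend from environment variables (LLM_MODEL is a hypothetical name)."""
    provider = env.get("LLM_PROVIDER", "ollama").lower()
    model = env.get("LLM_MODEL", "gemma4:e4b")
    builder = LLM_BUILDERS.get(provider)
    if builder is None:
        supported = ", ".join(sorted(LLM_BUILDERS))
        raise ValueError(f"unsupported LLM_PROVIDER {provider!r}; expected one of: {supported}")
    return builder(model)


print(build_llm({"LLM_PROVIDER": "OpenAI", "LLM_MODEL": "some-model"}))
# FakeChatModel(provider='openai', model='some-model')
```

Failing with a clear list of supported providers at startup is usually friendlier than a late import error deep inside a request.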

### Tech stack

| Category | Technology |
|---|---|
| Framework | FastAPI, uvicorn |
| Orchestration | LangChain, LangGraph |
| Vector store | Qdrant (local mode) |
| Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
| LLM | `gemma4:e4b` via Ollama (default) |
| Sparse search | rank_bm25 |
| Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
| PDF parsing | PyMuPDF |
| Evaluation | RAGAS |
| UI | Streamlit |

### Provider support

LLM and embedding backends are configured through environment variables. Supported providers are Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI and Groq. The default setup (Ollama for the LLM, HuggingFace for embeddings) runs entirely locally without any API keys.

See `.env.example` for per-provider configuration.
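
A minimal sketch of reading this configuration from the environment. `AGENT_MODE`, `LLM_PROVIDER` and `EMBEDDING_PROVIDER` appear in this README; `LLM_MODEL`, `EMBEDDING_MODEL`, the lowercase provider values and the `Settings` type are assumptions, with defaults chosen to mirror the documented local setup. `.env.example` remains the authority on the real names.

```python
import os
from dataclasses import dataclass
from typing import Mapping

AGENT_MODES = {"react", "pipeline"}


@dataclass(frozen=True)
class Settings:
    agent_mode: str
    llm_provider: str
    llm_model: str
    embedding_provider: str
    embedding_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables, falling back to the local defaults."""
    settings = Settings(
        agent_mode=env.get("AGENT_MODE", "react"),
        llm_provider=env.get("LLM_PROVIDER", "ollama"),
        llm_model=env.get("LLM_MODEL", "gemma4:e4b"),  # hypothetical variable name
        embedding_provider=env.get("EMBEDDING_PROVIDER", "huggingface"),
        embedding_model=env.get(  # hypothetical variable name
            "EMBEDDING_MODEL", "paraphrase-multilingual-MiniLM-L12-v2"
        ),
    )
    if settings.agent_mode not in AGENT_MODES:
        # Fail fast at startup instead of at the first query.
        raise ValueError(
            f"AGENT_MODE must be one of {sorted(AGENT_MODES)}, got {settings.agent_mode!r}"
        )
    return settings


print(load_settings({"AGENT_MODE": "pipeline"}).agent_mode)  # pipeline
```

Taking the environment as a `Mapping` argument, rather than reading `os.environ` inside the function, lets tests pass a plain dict instead of mutating the process environment.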

### Try it live

The demo is hosted on Hugging Face Spaces at [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space).

For example, try asking these questions in Danish:

- "Hvad er KU's politik for brug af AI-værktøjer?" (What is KU's policy on the use of AI tools?)
- "Hvilke regler gælder for brug af generativ AI i eksamen?" (Which rules apply to the use of generative AI in exams?)
- "Sammenlign reglerne for AI-brug i forskning og undervisning." (Compare the rules for AI use in research and teaching.)

The third question is a comparison that exercises the Plan-and-Execute agent: you can watch it decompose the query into sub-tasks in real time.

### Quick start

Requires Python 3.11+ and [Ollama](https://ollama.com/).

```bash
git clone https://github.com/Xiiqiing/Dokumentassistent.git
cd Dokumentassistent
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

ollama pull gemma4:e4b
python -m scripts.ingest           # place PDFs in docs/ first

uvicorn src.api.main:app --reload  # http://localhost:8000
streamlit run src/ui/app.py        # http://localhost:8501
```
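
With the API running, the answer can also be consumed without the UI. The sketch below reads a server-sent-event stream using only the standard library. The POST method, the JSON body `{"question": ...}` and the assumption that each `data:` payload is a text fragment are hypothetical; check the interactive API docs at http://localhost:8000/docs for the actual request and event schema of `/query`.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event.

    An event ends at a blank line; several `data:` lines in one event are joined
    with newlines. Other fields (event:, id:, retry:) and comment lines starting
    with ':' are ignored in this sketch.
    """
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    # An event not terminated by a blank line is discarded, as the SSE format specifies.


def stream_query(question: str, url: str = "http://localhost:8000/query") -> Iterator[str]:
    """POST a question and yield the streamed payloads (request shape is hypothetical)."""
    body = json.dumps({"question": question}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
    )
    with urllib.request.urlopen(request) as response:
        # Iterating the response yields raw byte lines as they arrive.
        yield from parse_sse(raw.decode("utf-8") for raw in response)
```

Usage, assuming the hypothetical request shape above: `for part in stream_query("Hvad er KU's politik for brug af AI-værktøjer?"): print(part, end="", flush=True)`.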

### Docker

Docker Compose runs Qdrant, the API and the Streamlit UI together. On startup, the API container waits for Qdrant to become available and runs ingestion automatically if the collection is empty.
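
The startup behaviour described above (wait for the vector store, then ingest only into an empty collection) boils down to the loop below. It is a hypothetical sketch: the three callables stand in for a Qdrant readiness check, a point count on the collection and the ingestion pipeline, and the 60 s timeout and 1 s polling interval are arbitrary choices.

```python
import time
from typing import Callable


def wait_then_ingest(
    is_ready: Callable[[], bool],
    collection_size: Callable[[], int],
    ingest: Callable[[], None],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Wait until the vector store is ready, then ingest only into an empty collection.

    Returns True if ingestion ran and False if the collection already held data.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while not is_ready():
        if clock() >= deadline:
            raise TimeoutError(f"vector store not ready after {timeout} s")
        sleep(interval)
    if collection_size() == 0:
        ingest()
        return True
    return False
```

Checking the collection size first makes restarts cheap: an already populated collection is left untouched instead of being re-ingested.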

#### Local setup with Ollama and HuggingFace

```bash
cp .env.example .env
docker compose --profile local up --build
```

| Service | URL |
|---|---|
| API | http://localhost:8000 |
| API docs | http://localhost:8000/docs |
| Streamlit UI | http://localhost:8501 |
| Qdrant dashboard | http://localhost:6333/dashboard |

#### Cloud setup with OpenAI, Anthropic or others

```bash
cp .env.example .env
# set LLM_PROVIDER, EMBEDDING_PROVIDER and your API key
docker compose up --build
```

#### Hugging Face Spaces

A `Dockerfile` and a supervisor configuration are included. The Space runs Qdrant, the API and the UI behind nginx on port 7860.

### Project structure

```
src/
  config.py             # env-based configuration
  provider.py           # create_llm() and create_embeddings() factory
  models.py             # shared dataclasses
  ingestion/
    pdf_parser.py       # PyMuPDF extraction
    text_cleaner.py     # Danish and English normalization
    chunker.py          # fixed-size, recursive, semantic chunking
    pipeline.py         # ingestion orchestration
  retrieval/
    embedder.py
    vector_store.py     # Qdrant
    bm25_search.py
    hybrid.py           # reciprocal rank fusion
    reranker.py         # cross-encoder
  api/
    main.py
    routes.py           # /query, /ingest, /health
  agent/
    intent_classifier.py
    router.py           # pipeline mode (AGENT_MODE=pipeline)
    tools.py            # six retrieval tools and ToolResultStore
    plan_and_execute.py # Plan-and-Execute agent (AGENT_MODE=react)
    memory.py           # conversation memory for multi-turn
  evaluation/