XQ committed
Commit 04082c4 · Parent(s): bb91c88
Update README
Files changed:
- .github/README.md +244 -67
- README.md +245 -58
.github/README.md
CHANGED
|
@@ -1,88 +1,263 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
## Live
|
|
|
|
| 4 |
|
| 5 |
-
[
|
| 6 |
|
| 7 |
-
|
| 8 |
|
| 9 |
-
|
| 10 |
|
| 11 |
-
|
| 12 |
|
| 13 |
-
|
| 14 |
|
| 15 |
-
|
| 16 |
|
| 17 |
-
-
|
| 18 |
|
| 19 |
-
|
| 20 |
-
|------|---------|
|
| 21 |
-
| `hybrid_search(query, top_k)` | Retrieve relevant passages via hybrid search + reranking |
|
| 22 |
-
| `multi_query_search(question, top_k)` | Decompose complex questions into sub-queries, search each, merge results |
|
| 23 |
-
| `search_within_document(document_id, query, top_k)` | Find specific sections inside a known document |
|
| 24 |
-
| `summarize_document(document_id)` | Generate a structured summary of a document |
|
| 25 |
-
| `list_documents()` | See what's in the knowledge base |
|
| 26 |
-
| `fetch_document(document_id)` | Read a full document |
|
| 27 |
|
| 28 |
-
|
| 29 |
|
| 30 |
-
|
| 31 |
|
| 32 |
-
|
| 33 |
|---|---|
|
| 34 |
| Framework | FastAPI, uvicorn |
|
| 35 |
-
|
|
| 36 |
-
|
|
| 37 |
| Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
|
| 38 |
-
| LLM | `gemma4` via Ollama
|
| 39 |
-
| Sparse
|
| 40 |
| Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
|
| 41 |
-
| PDF
|
| 42 |
-
|
|
| 43 |
-
|
|
| 44 |
|
| 45 |
-
## Provider
|
| 46 |
|
| 47 |
-
LLM
|
| 48 |
|
| 49 |
-
|
| 50 |
|
| 51 |
-
##
|
| 52 |
|
| 53 |
-
|
| 54 |
-
|------|-------------|-------|
|
| 55 |
-
| Plan-and-Execute | `react` (default) | Structured multi-step agent with conversation memory |
|
| 56 |
-
| Pipeline | `pipeline` | Predefined graph, works with lightweight models that lack tool calling |
|
| 57 |
|
| 58 |
-
|
| 59 |
|
| 60 |
-
|
| 61 |
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
LLM_PROVIDER=ollama
|
| 65 |
-
OLLAMA_MODEL=gemma4:e4b
|
| 66 |
```
|
| 67 |
|
| 68 |
-
|
| 69 |
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
```
|
| 75 |
|
| 76 |
-
|
| 77 |
|
| 78 |
-
``
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
```
|
| 84 |
|
| 85 |
-
|
| 86 |
|
| 87 |
Requires Python 3.11+ and [Ollama](https://ollama.com/).
|
| 88 |
|
|
@@ -96,15 +271,15 @@ cp .env.example .env
|
|
| 96 |
ollama pull gemma4:e4b
|
| 97 |
python -m scripts.ingest # place PDFs in docs/ first
|
| 98 |
|
| 99 |
-
uvicorn src.api.main:app --reload #
|
| 100 |
-
streamlit run src/ui/app.py #
|
| 101 |
```
|
| 102 |
|
| 103 |
-
## Docker
|
| 104 |
|
| 105 |
-
Docker Compose handles Qdrant, the API
|
| 106 |
|
| 107 |
-
|
| 108 |
|
| 109 |
```bash
|
| 110 |
cp .env.example .env
|
|
@@ -118,26 +293,28 @@ docker compose --profile local up --build
|
|
| 118 |
| Streamlit UI | http://localhost:8501 |
|
| 119 |
| Qdrant dashboard | http://localhost:6333/dashboard |
|
| 120 |
|
| 121 |
-
|
| 122 |
|
| 123 |
```bash
|
| 124 |
cp .env.example .env
|
| 125 |
-
# set LLM_PROVIDER, EMBEDDING_PROVIDER
|
| 126 |
docker compose up --build
|
| 127 |
```
|
| 128 |
|
| 129 |
-
|
|
|
|
|
|
|
| 130 |
|
| 131 |
-
## Project
|
| 132 |
|
| 133 |
```
|
| 134 |
src/
|
| 135 |
config.py # env-based configuration
|
| 136 |
-
provider.py # create_llm()
|
| 137 |
models.py # shared dataclasses
|
| 138 |
ingestion/
|
| 139 |
pdf_parser.py # PyMuPDF extraction
|
| 140 |
-
text_cleaner.py # Danish
|
| 141 |
chunker.py # fixed-size, recursive, semantic chunking
|
| 142 |
pipeline.py # ingestion orchestration
|
| 143 |
retrieval/
|
|
@@ -148,11 +325,11 @@ src/
|
|
| 148 |
reranker.py # cross-encoder
|
| 149 |
api/
|
| 150 |
main.py
|
| 151 |
-
routes.py # /query, /
|
| 152 |
agent/
|
| 153 |
intent_classifier.py
|
| 154 |
router.py # pipeline mode (AGENT_MODE=pipeline)
|
| 155 |
-
tools.py #
|
| 156 |
plan_and_execute.py # Plan-and-Execute agent (AGENT_MODE=react)
|
| 157 |
memory.py # conversation memory for multi-turn
|
| 158 |
evaluation/
|
|
@@ -163,5 +340,5 @@ scripts/
|
|
| 163 |
ingest.py
|
| 164 |
e2e_test.py
|
| 165 |
tests/
|
| 166 |
-
docs/ # example PDFs
|
| 167 |
```
|
|
|
|
| 1 |
+
# Dokumentassistent
|
| 2 |
|
| 3 |
+
## Live demo
|
| 4 |
+
Hosted on Hugging Face Spaces: [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space)
|
| 5 |
|
| 6 |
+
[Skip to English ↓](#english)
|
| 7 |
|
| 8 |
+
## Dansk
|
| 9 |
|
| 10 |
+
En produktionsklar RAG-applikation, der gør det muligt at stille spørgsmål til dokumenter på dansk og få svar med kildehenvisninger. Systemet er bygget på open source-komponenter (LangChain, LangGraph, Qdrant, Ollama) og kan køre helt lokalt uden eksterne API-kald. Det implementerer hybrid søgning med reranking, en Plan-and-Execute agent med samtalehukommelse, og RAGAS-baseret evaluering af svarkvaliteten.
|
| 11 |
|
| 12 |
+
### Funktioner
|
| 13 |
|
| 14 |
+
| Område | Implementering |
|
| 15 |
+
|---|---|
|
| 16 |
+
| Ustruktureret data | PyMuPDF-parser, dansk og engelsk tekstrensning, tre opdelingsstrategier (fast størrelse, rekursiv, semantisk) |
|
| 17 |
+
| Hybrid søgning | Qdrant til vektorsøgning kombineret med BM25, flettet med reciprocal rank fusion |
|
| 18 |
+
| Reranking | Cross-encoder `mmarco-mMiniLMv2-L12-H384-v1` |
|
| 19 |
+
| Agent-flows | Plan-and-Execute med seks værktøjer, ReAct-subagent og samtalehukommelse |
|
| 20 |
+
| Evaluering | RAGAS-metrikker (faithfulness, answer relevancy, context precision) |
|
| 21 |
+
| Sporbarhed | Hvert svar har kildehenvisninger med chunk-ID og sidenummer, samt struktureret logning |
|
| 22 |
+
| Provider-abstraktion | Factory-mønster, der gør det muligt at skifte mellem Ollama, OpenAI, Azure OpenAI, Anthropic og Google GenAI uden at ændre forretningskoden |
|
| 23 |
+
| Deployment | Docker Compose til lokal kørsel, Hugging Face Spaces til den offentlige demo |
|
| 24 |
|
| 25 |
+
### Sådan fungerer det
|
| 26 |
|
| 27 |
+
PDF-filer bliver læst med PyMuPDF, renset, opdelt i tekststykker (fast størrelse, rekursivt eller semantisk), indlejret med en flersproget sentence-transformer og gemt i Qdrant. Et BM25-indeks bygges på de samme tekststykker til nøgleordssøgning.
|
| 28 |
|
| 29 |
+
Når en bruger stiller et spørgsmål, kører systemet både den semantiske og den leksikale søgning, fletter resultaterne sammen med reciprocal rank fusion, og lader en cross-encoder rescore kandidaterne. De øverste tekststykker bliver sendt til en LLM, og svaret streames tilbage via SSE og vises i Streamlit-grænsefladen sammen med kilderne.
|
| 30 |
|
| 31 |
+
### To agent-tilstande
|
| 32 |
|
| 33 |
+
Systemet kan køre i to forskellige tilstande, der vælges via miljøvariablen `AGENT_MODE`.
|
| 34 |
|
| 35 |
+
**Pipeline** (`AGENT_MODE=pipeline`) bygger på en fast LangGraph-graf med sprogdetektion, valgfri oversættelse, hybrid søgning, reranking og generering. Tilstanden har en confidence-baseret retry-loop og fungerer fint med lette lokale modeller.
|
| 36 |
+
|
| 37 |
+
**Plan-and-Execute Agent** (`AGENT_MODE=react`, standard) er en flertrinsagent, hvor en planner først nedbryder spørgsmålet i delopgaver, en executor kører hver delopgave gennem en ReAct-subagent med adgang til et sæt værktøjer, og en synthesizer producerer det endelige svar med kildehenvisninger. Tilstanden indeholder samtalehukommelse til opfølgende spørgsmål og kræver en model, der understøtter tool calling.
|
| 38 |
+
|
| 39 |
+
| Værktøj | Formål |
|
| 40 |
+
|---|---|
|
| 41 |
+
| `hybrid_search(query, top_k)` | Henter relevante tekststykker via hybrid søgning og reranking |
|
| 42 |
+
| `multi_query_search(question, top_k)` | Nedbryder komplekse spørgsmål i delspørgsmål, søger på hver og fletter resultaterne |
|
| 43 |
+
| `search_within_document(document_id, query, top_k)` | Finder bestemte afsnit i et kendt dokument |
|
| 44 |
+
| `summarize_document(document_id)` | Laver et struktureret resumé af et dokument |
|
| 45 |
+
| `list_documents()` | Viser hvilke dokumenter, der ligger i vidensbasen |
|
| 46 |
+
| `fetch_document(document_id)` | Henter et helt dokument |
|
| 47 |
+
|
| 48 |
+
### Produktionshensyn
|
| 49 |
+
|
| 50 |
+
- **Sporbarhed.** Hvert genereret svar har kildehenvisninger på chunk-niveau med dokument-ID, sidenummer og tekststykke, så det kan revideres bagudrettet.
|
| 51 |
+
- **Governance.** RAGAS-evalueringspipelinen i `src/evaluation/` gør det muligt at måle faithfulness og context precision, før ændringer slippes løs i produktion.
|
| 52 |
+
- **Konfigurerbarhed.** Ingen hardkodede stier, modelnavne eller API-nøgler. Alt styres via miljøvariabler gennem `src/config.py`.
|
| 53 |
+
- **Provider-neutralitet.** Forretningskoden importerer aldrig en provider-SDK direkte. LLM- og embedding-backends skiftes via factory-funktionerne `create_llm()` og `create_embeddings()`, hvilket undgår vendor lock-in.
|
| 54 |
+
- **Lokal som standard.** Standardkonfigurationen kører helt uden eksterne API-kald og passer til miljøer med strenge krav til datahjemsted.
|
| 55 |
+
- **Pakket i containere.** Docker Compose til lokal kørsel og Hugging Face Spaces til den offentlige demo.
|
| 56 |
+
|
| 57 |
+
### Teknologivalg
|
| 58 |
+
|
| 59 |
+
| Kategori | Teknologi |
|
| 60 |
|---|---|
|
| 61 |
| Framework | FastAPI, uvicorn |
|
| 62 |
+
| Orkestrering | LangChain, LangGraph |
|
| 63 |
+
| Vektorlager | Qdrant (lokal tilstand) |
|
| 64 |
| Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
|
| 65 |
+
| LLM | `gemma4:e4b` via Ollama som standard |
|
| 66 |
+
| Sparse-søgning | rank_bm25 |
|
| 67 |
| Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
|
| 68 |
+
| PDF-parsing | PyMuPDF |
|
| 69 |
+
| Evaluering | RAGAS |
|
| 70 |
+
| Grænseflade | Streamlit |
|
| 71 |
|
| 72 |
+
### Provider-understøttelse
|
| 73 |
|
| 74 |
+
LLM- og embedding-backends konfigureres via miljøvariabler. De understøttede providers er Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI og Groq. Standardopsætningen (Ollama og HuggingFace) kører helt lokalt uden API-nøgler.
|
| 75 |
|
| 76 |
+
Se `.env.example` for konfiguration pr. provider.
|
| 77 |
|
| 78 |
+
### Prøv den live
|
| 79 |
|
| 80 |
+
Demoen ligger på [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space).
|
|
|
|
|
|
|
|
|
|
| 81 |
|
| 82 |
+
Prøv for eksempel disse spørgsmål på dansk.
|
| 83 |
|
| 84 |
+
- "Hvad er KU's politik for brug af AI-værktøjer?"
|
| 85 |
+
- "Hvilke regler gælder for brug af generativ AI i eksamen?"
|
| 86 |
+
- "Sammenlign reglerne for AI-brug i forskning og undervisning."
|
| 87 |
+
|
| 88 |
+
Det sidste spørgsmål udløser Plan-and-Execute-agenten, så man kan se den nedbryde spørgsmålet i delopgaver i realtid.
|
| 89 |
+
|
| 90 |
+
### Kom i gang
|
| 91 |
+
|
| 92 |
+
Kræver Python 3.11+ og [Ollama](https://ollama.com/).
|
| 93 |
+
|
| 94 |
+
```bash
|
| 95 |
+
git clone https://github.com/Xiiqiing/Dokumentassistent.git
|
| 96 |
+
cd Dokumentassistent
|
| 97 |
+
python -m venv .venv && source .venv/bin/activate
|
| 98 |
+
pip install -r requirements.txt
|
| 99 |
+
cp .env.example .env
|
| 100 |
+
|
| 101 |
+
ollama pull gemma4:e4b
|
| 102 |
+
python -m scripts.ingest # læg PDF-filer i docs/ først
|
| 103 |
|
| 104 |
+
uvicorn src.api.main:app --reload # http://localhost:8000
|
| 105 |
+
streamlit run src/ui/app.py # http://localhost:8501
|
|
|
|
|
|
|
| 106 |
```
|
| 107 |
|
| 108 |
+
### Docker
|
| 109 |
|
| 110 |
+
Docker Compose håndterer Qdrant, API'et og Streamlit-grænsefladen samlet. API-containeren venter på, at Qdrant er oppe, og kører ingestion automatisk, hvis samlingen er tom.
|
| 111 |
+
|
| 112 |
+
#### Lokalt setup med Ollama og HuggingFace
|
| 113 |
+
|
| 114 |
+
```bash
|
| 115 |
+
cp .env.example .env
|
| 116 |
+
docker compose --profile local up --build
|
| 117 |
+
```
|
| 118 |
+
|
| 119 |
+
| Service | URL |
|
| 120 |
+
|---|---|
|
| 121 |
+
| API | http://localhost:8000 |
|
| 122 |
+
| API-dokumentation | http://localhost:8000/docs |
|
| 123 |
+
| Streamlit-grænseflade | http://localhost:8501 |
|
| 124 |
+
| Qdrant-dashboard | http://localhost:6333/dashboard |
|
| 125 |
+
|
| 126 |
+
#### Cloud-setup med OpenAI, Anthropic eller andre
|
| 127 |
+
|
| 128 |
+
```bash
|
| 129 |
+
cp .env.example .env
|
| 130 |
+
# sæt LLM_PROVIDER, EMBEDDING_PROVIDER og din API-nøgle
|
| 131 |
+
docker compose up --build
|
| 132 |
```
|
| 133 |
|
| 134 |
+
#### Hugging Face Spaces
|
| 135 |
|
| 136 |
+
Et `Dockerfile` og en supervisor-konfiguration er inkluderet. Spacet kører Qdrant, API'et og grænsefladen bag nginx på port 7860.
|
| 137 |
+
|
| 138 |
+
### Projektstruktur
|
| 139 |
+
|
| 140 |
+
```
|
| 141 |
+
src/
|
| 142 |
+
config.py # konfiguration via miljøvariabler
|
| 143 |
+
provider.py # create_llm() og create_embeddings() factory
|
| 144 |
+
models.py # delte dataklasser
|
| 145 |
+
ingestion/
|
| 146 |
+
pdf_parser.py # PyMuPDF-udtræk
|
| 147 |
+
text_cleaner.py # dansk og engelsk normalisering
|
| 148 |
+
chunker.py # fast størrelse, rekursiv og semantisk opdeling
|
| 149 |
+
pipeline.py # ingestion-orkestrering
|
| 150 |
+
retrieval/
|
| 151 |
+
embedder.py
|
| 152 |
+
vector_store.py # Qdrant
|
| 153 |
+
bm25_search.py
|
| 154 |
+
hybrid.py # reciprocal rank fusion
|
| 155 |
+
reranker.py # cross-encoder
|
| 156 |
+
api/
|
| 157 |
+
main.py
|
| 158 |
+
routes.py # /query, /ingest, /health
|
| 159 |
+
agent/
|
| 160 |
+
intent_classifier.py
|
| 161 |
+
router.py # pipeline-tilstand (AGENT_MODE=pipeline)
|
| 162 |
+
tools.py # seks retrieval-værktøjer og ToolResultStore
|
| 163 |
+
plan_and_execute.py # Plan-and-Execute-agent (AGENT_MODE=react)
|
| 164 |
+
memory.py # samtalehukommelse til flere spørgsmål
|
| 165 |
+
evaluation/
|
| 166 |
+
evaluator.py # RAGAS-metrikker
|
| 167 |
+
ui/
|
| 168 |
+
app.py # Streamlit-frontend
|
| 169 |
+
scripts/
|
| 170 |
+
ingest.py
|
| 171 |
+
e2e_test.py
|
| 172 |
+
tests/
|
| 173 |
+
docs/ # eksempel-PDF'er eller tekster (KU AI-dokumenter)
|
| 174 |
```
|
| 175 |
|
| 176 |
+
---
|
| 177 |
+
|
| 178 |
+
## English
|
| 179 |
+
|
| 180 |
+
A production-ready RAG application that lets users ask questions about documents in Danish and receive answers with source citations. The system is built on open-source components (LangChain, LangGraph, Qdrant, Ollama) and can run fully locally without any external API calls. It implements hybrid search with reranking, a Plan-and-Execute agent with conversation memory, and RAGAS-based evaluation of answer quality.
|
| 181 |
+
|
| 182 |
+
### Capabilities
|
| 183 |
+
|
| 184 |
+
| Area | Implementation |
|
| 185 |
+
|---|---|
|
| 186 |
+
| Unstructured data | PyMuPDF parser, Danish and English text cleaning, three chunking strategies (fixed-size, recursive, semantic) |
|
| 187 |
+
| Hybrid retrieval | Qdrant dense vectors combined with BM25, fused via reciprocal rank fusion |
|
| 188 |
+
| Reranking | Cross-encoder `mmarco-mMiniLMv2-L12-H384-v1` |
|
| 189 |
+
| Agent flows | Plan-and-Execute with six tools, ReAct sub-agent and conversation memory |
|
| 190 |
+
| Evaluation | RAGAS metrics (faithfulness, answer relevancy, context precision) |
|
| 191 |
+
| Traceability | Each answer carries source references with chunk ID and page number, plus structured logging |
|
| 192 |
+
| Provider abstraction | Factory pattern that allows swapping between Ollama, OpenAI, Azure OpenAI, Anthropic and Google GenAI without touching business code |
|
| 193 |
+
| Deployment | Docker Compose for local setup, Hugging Face Spaces for the public demo |
|
| 194 |
+
|
| 195 |
+
### How it works
|
| 196 |
+
|
| 197 |
+
PDFs are parsed with PyMuPDF, cleaned, split into chunks (fixed-size, recursive, or semantic), embedded with a multilingual sentence-transformer, and stored in Qdrant. A BM25 index is built from the same chunks for keyword search.
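
To make the fixed-size strategy mentioned above concrete, here is a minimal sketch of character-based chunking with overlap. It is an illustration only: the function name `fixed_size_chunks`, the `size` and `overlap` parameters and their defaults are assumptions and are not taken from `src/ingestion/chunker.py`.

```python
def fixed_size_chunks(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`.

    Hypothetical helper for illustration; the project's chunker may differ.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each iteration
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    print(fixed_size_chunks("abcdefghij", size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text in the index.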
|
| 198 |
+
|
| 199 |
+
At query time, both indexes are searched and the results merged with reciprocal rank fusion. A cross-encoder then rescores the candidates before the top chunks are passed to the LLM. The API streams the response over SSE and the Streamlit UI displays it together with the sources.
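
The fusion step described above can be sketched as follows. This is a generic reciprocal rank fusion over ranked lists of chunk IDs, not the code in `src/retrieval/hybrid.py`; the function name, the tie-breaking rule and the constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one list.

    Each chunk gets 1 / (k + rank) from every list it appears in
    (rank starts at 1); chunks are returned by descending total score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


if __name__ == "__main__":
    dense = ["a", "b", "c"]   # hypothetical vector-search ranking
    sparse = ["b", "d", "a"]  # hypothetical BM25 ranking
    print(reciprocal_rank_fusion([dense, sparse]))
```

Because only ranks are used, the dense similarity scores and the BM25 scores never have to be normalised against each other. The fused list would then be handed to the cross-encoder for rescoring.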
|
| 200 |
+
|
| 201 |
+
### Two agent modes
|
| 202 |
+
|
| 203 |
+
The system can run in two different modes, switchable via the `AGENT_MODE` environment variable.
|
| 204 |
+
|
| 205 |
+
**Pipeline** (`AGENT_MODE=pipeline`) is built on a fixed LangGraph graph with language detection, optional translation, hybrid retrieval, reranking, and generation. The mode has a confidence-based retry loop and works well with lightweight local models.
|
| 206 |
+
|
| 207 |
+
**Plan-and-Execute Agent** (`AGENT_MODE=react`, default) is a multi-step agent where a planner first decomposes the query into sub-tasks, an executor runs each sub-task through a ReAct sub-agent with access to a set of tools, and a synthesizer produces the final answer with citations. The mode includes conversation memory for follow-up questions and requires a model that supports tool calling.
|
| 208 |
+
|
| 209 |
+
| Tool | Purpose |
|
| 210 |
+
|---|---|
|
| 211 |
+
| `hybrid_search(query, top_k)` | Retrieves relevant passages via hybrid search and reranking |
|
| 212 |
+
| `multi_query_search(question, top_k)` | Decomposes complex questions into sub-queries, searches each, and merges the results |
|
| 213 |
+
| `search_within_document(document_id, query, top_k)` | Finds specific sections inside a known document |
|
| 214 |
+
| `summarize_document(document_id)` | Generates a structured summary of a document |
|
| 215 |
+
| `list_documents()` | Shows what is in the knowledge base |
|
| 216 |
+
| `fetch_document(document_id)` | Reads a full document |
|
| 217 |
+
|
| 218 |
+
### Production considerations
|
| 219 |
+
|
| 220 |
+
- **Traceability.** Every generated answer carries chunk-level source references with document ID, page number and span, so it can be audited and reviewed afterwards.
|
| 221 |
+
- **Governance.** The RAGAS evaluation pipeline in `src/evaluation/` lets you measure faithfulness and context precision before promoting changes to production.
|
| 222 |
+
- **Configurability.** No hardcoded paths, model names or API keys. Everything is controlled via environment variables through `src/config.py`.
|
| 223 |
+
- **Provider neutrality.** Business code never imports a provider SDK directly. LLM and embedding backends swap via the `create_llm()` and `create_embeddings()` factory functions, which avoids vendor lock-in.
|
| 224 |
+
- **Local-first.** The default configuration runs entirely without external API calls and fits environments with strict data residency requirements.
|
| 225 |
+
- **Containerized.** Docker Compose for local runs and Hugging Face Spaces for the public demo.
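
The provider-neutrality point in the list above rests on a factory pattern. A minimal sketch of the idea, using stand-in classes instead of real provider SDKs, might look like this. Everything here (`ChatModel`, `EchoModel`, `create_llm_sketch`, the registry) is hypothetical and only illustrates the shape; the actual `create_llm()` in `src/provider.py` may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class ChatModel(Protocol):
    """The only interface business code depends on (assumed for this sketch)."""

    def invoke(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in backend; a real factory would wrap a provider client here."""

    name: str

    def invoke(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Provider name -> constructor. Adding a backend means adding one entry.
_REGISTRY: dict[str, Callable[[], ChatModel]] = {
    "ollama": lambda: EchoModel("ollama"),
    "openai": lambda: EchoModel("openai"),
}


def create_llm_sketch(provider: str) -> ChatModel:
    """Look up a backend by name; callers never import a provider SDK."""
    try:
        return _REGISTRY[provider.lower()]()
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None


if __name__ == "__main__":
    print(create_llm_sketch("ollama").invoke("hello"))
```

With this shape, switching backends is a configuration change (for example the value of `LLM_PROVIDER`) rather than a code change in the retrieval or agent modules.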
|
| 226 |
+
|
| 227 |
+
### Tech stack
|
| 228 |
+
|
| 229 |
+
| Category | Technology |
|
| 230 |
+
|---|---|
|
| 231 |
+
| Framework | FastAPI, uvicorn |
|
| 232 |
+
| Orchestration | LangChain, LangGraph |
|
| 233 |
+
| Vector store | Qdrant (local mode) |
|
| 234 |
+
| Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
|
| 235 |
+
| LLM | `gemma4:e4b` via Ollama (default) |
|
| 236 |
+
| Sparse search | rank_bm25 |
|
| 237 |
+
| Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
|
| 238 |
+
| PDF parsing | PyMuPDF |
|
| 239 |
+
| Evaluation | RAGAS |
|
| 240 |
+
| UI | Streamlit |
|
| 241 |
+
|
| 242 |
+
### Provider support
|
| 243 |
+
|
| 244 |
+
LLM and embedding backends are configured through environment variables. Supported providers are Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI and Groq. The default setup (Ollama and HuggingFace) runs entirely locally without any API keys.
|
| 245 |
+
|
| 246 |
+
See `.env.example` for per-provider configuration.
|
| 247 |
+
|
| 248 |
+
### Try it live
|
| 249 |
+
|
| 250 |
+
The demo lives at [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space).
|
| 251 |
+
|
| 252 |
+
Try asking these questions in Danish.
|
| 253 |
+
|
| 254 |
+
- "Hvad er KU's politik for brug af AI-værktøjer?"
|
| 255 |
+
- "Hvilke regler gælder for brug af generativ AI i eksamen?"
|
| 256 |
+
- "Sammenlign reglerne for AI-brug i forskning og undervisning."
|
| 257 |
+
|
| 258 |
+
The third question triggers the Plan-and-Execute agent, so you can watch it decompose the query into sub-tasks in real time.
|
| 259 |
+
|
| 260 |
+
### Quick start
|
| 261 |
|
| 262 |
Requires Python 3.11+ and [Ollama](https://ollama.com/).
|
| 263 |
|
|
|
|
| 271 |
ollama pull gemma4:e4b
|
| 272 |
python -m scripts.ingest # place PDFs in docs/ first
|
| 273 |
|
| 274 |
+
uvicorn src.api.main:app --reload # http://localhost:8000
|
| 275 |
+
streamlit run src/ui/app.py # http://localhost:8501
|
| 276 |
```
|
| 277 |
|
| 278 |
+
### Docker
|
| 279 |
|
| 280 |
+
Docker Compose handles Qdrant, the API and the Streamlit UI together. The API container waits for Qdrant on startup and runs ingestion automatically if the collection is empty.
|
| 281 |
|
| 282 |
+
#### Local setup with Ollama and HuggingFace
|
| 283 |
|
| 284 |
```bash
|
| 285 |
cp .env.example .env
|
|
|
|
| 293 |
| Streamlit UI | http://localhost:8501 |
|
| 294 |
| Qdrant dashboard | http://localhost:6333/dashboard |
|
| 295 |
|
| 296 |
+
#### Cloud setup with OpenAI, Anthropic or others
|
| 297 |
|
| 298 |
```bash
|
| 299 |
cp .env.example .env
|
| 300 |
+
# set LLM_PROVIDER, EMBEDDING_PROVIDER and your API key
|
| 301 |
docker compose up --build
|
| 302 |
```
|
| 303 |
|
| 304 |
+
#### Hugging Face Spaces
|
| 305 |
+
|
| 306 |
+
A `Dockerfile` and supervisor configuration are included. The Space runs Qdrant, the API and the UI behind nginx on port 7860.
|
| 307 |
|
| 308 |
+
### Project structure
|
| 309 |
|
| 310 |
```
|
| 311 |
src/
|
| 312 |
config.py # env-based configuration
|
| 313 |
+
provider.py # create_llm() and create_embeddings() factory
|
| 314 |
models.py # shared dataclasses
|
| 315 |
ingestion/
|
| 316 |
pdf_parser.py # PyMuPDF extraction
|
| 317 |
+
text_cleaner.py # Danish and English normalization
|
| 318 |
chunker.py # fixed-size, recursive, semantic chunking
|
| 319 |
pipeline.py # ingestion orchestration
|
| 320 |
retrieval/
|
|
|
|
| 325 |
reranker.py # cross-encoder
|
| 326 |
api/
|
| 327 |
main.py
|
| 328 |
+
routes.py # /query, /ingest, /health
|
| 329 |
agent/
|
| 330 |
intent_classifier.py
|
| 331 |
router.py # pipeline mode (AGENT_MODE=pipeline)
|
| 332 |
+
tools.py # six retrieval tools and ToolResultStore
|
| 333 |
plan_and_execute.py # Plan-and-Execute agent (AGENT_MODE=react)
|
| 334 |
memory.py # conversation memory for multi-turn
|
| 335 |
evaluation/
|
|
|
|
| 340 |
ingest.py
|
| 341 |
e2e_test.py
|
| 342 |
tests/
|
| 343 |
+
docs/ # example PDFs or texts (KU AI public documents)
|
| 344 |
```
|
README.md
CHANGED
|
@@ -8,81 +8,266 @@ app_port: 7860
|
|
| 8 |
noindex: true
|
| 9 |
---
|
| 10 |
|
| 11 |
-
#
|
| 12 |
|
| 13 |
-
|
|
|
|
| 14 |
|
| 15 |
-
|
| 16 |
|
| 17 |
-
##
|
| 18 |
|
| 19 |
-
|
| 20 |
|
| 21 |
-
|
| 22 |
|
| 23 |
-
|
|
| 24 |
|
| 25 |
-
-
|
| 26 |
|
| 27 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 28 |
|
| 29 |
-
|
| 30 |
-
|------|---------|
|
| 31 |
-
| `hybrid_search(query, top_k)` | Retrieve relevant passages via hybrid search + reranking |
|
| 32 |
-
| `multi_query_search(question, top_k)` | Decompose complex questions into sub-queries, search each, merge results |
|
| 33 |
-
| `search_within_document(document_id, query, top_k)` | Find specific sections inside a known document |
|
| 34 |
-
| `summarize_document(document_id)` | Generate a structured summary of a document |
|
| 35 |
-
| `list_documents()` | See what's in the knowledge base |
|
| 36 |
-
| `fetch_document(document_id)` | Read a full document |
|
| 37 |
|
| 38 |
-
|
| 39 |
|
| 40 |
| Category | Technology |
|
| 41 |
|---|---|
|
| 42 |
| Framework | FastAPI, uvicorn |
|
| 43 |
| Orchestration | LangChain, LangGraph |
|
| 44 |
-
| Vector
|
| 45 |
| Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
|
| 46 |
-
| LLM | `gemma4` via Ollama (default) |
|
| 47 |
-
| Sparse
|
| 48 |
| Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
|
| 49 |
-
| PDF
|
| 50 |
| Evaluation | RAGAS |
|
| 51 |
| UI | Streamlit |
|
| 52 |
|
| 53 |
-
## Provider
|
| 54 |
|
| 55 |
-
LLM and embedding backends are configured through environment variables. Supported providers
|
| 56 |
|
| 57 |
See `.env.example` for per-provider configuration.
|
| 58 |
|
| 59 |
-
##
|
| 60 |
-
|
| 61 |
-
| Mode | `AGENT_MODE` | Notes |
|
| 62 |
-
|------|-------------|-------|
|
| 63 |
-
| Pipeline | `pipeline` | Predefined graph, works with lightweight models |
|
| 64 |
-
| Plan-and-Execute (default) | `react` | Structured multi-step agent with conversation memory |
|
| 65 |
-
|
| 66 |
-
Tool-calling is supported by OpenAI, Anthropic, Google GenAI, Azure OpenAI, Groq, and some Ollama models (`llama3.1`, `qwen2.5`, `mistral-nemo`).
|
| 67 |
|
| 68 |
-
|
| 69 |
|
| 70 |
-
|
| 71 |
-
AGENT_MODE=react
|
| 72 |
-
LLM_PROVIDER=openai
|
| 73 |
-
OPENAI_API_KEY=sk-...
|
| 74 |
-
OPENAI_MODEL=gpt-4o-mini
|
| 75 |
-
```
|
| 76 |
|
| 77 |
-
|
|
|
|
|
|
|
| 78 |
|
| 79 |
-
|
| 80 |
-
AGENT_MODE=pipeline
|
| 81 |
-
LLM_PROVIDER=ollama
|
| 82 |
-
OLLAMA_MODEL=gemma3
|
| 83 |
-
```
|
| 84 |
|
| 85 |
-
## Quick
|
| 86 |
|
| 87 |
Requires Python 3.11+ and [Ollama](https://ollama.com/).
|
| 88 |
|
|
@@ -96,15 +281,15 @@ cp .env.example .env
|
|
| 96 |
ollama pull gemma4:e4b
|
| 97 |
python -m scripts.ingest # place PDFs in docs/ first
|
| 98 |
|
| 99 |
-
uvicorn src.api.main:app --reload #
|
| 100 |
-
streamlit run src/ui/app.py #
|
| 101 |
```
|
| 102 |
|
| 103 |
-
## Docker
|
| 104 |
|
| 105 |
-
Docker Compose handles Qdrant, the API
|
| 106 |
|
| 107 |
-
|
| 108 |
|
| 109 |
```bash
|
| 110 |
cp .env.example .env
|
|
@@ -118,26 +303,28 @@ docker compose --profile local up --build
|
|
| 118 |
| Streamlit UI | http://localhost:8501 |
|
| 119 |
| Qdrant dashboard | http://localhost:6333/dashboard |
|
| 120 |
|
| 121 |
-
|
| 122 |
|
| 123 |
```bash
|
| 124 |
cp .env.example .env
|
| 125 |
-
# set LLM_PROVIDER, EMBEDDING_PROVIDER
|
| 126 |
docker compose up --build
|
| 127 |
```
|
| 128 |
|
| 129 |
-
|
|
|
|
|
|
|
| 130 |
|
| 131 |
-
## Project
|
| 132 |
|
| 133 |
```
|
| 134 |
src/
|
| 135 |
config.py # env-based configuration
|
| 136 |
-
provider.py # create_llm()
|
| 137 |
models.py # shared dataclasses
|
| 138 |
ingestion/
|
| 139 |
pdf_parser.py # PyMuPDF extraction
|
| 140 |
-
text_cleaner.py # Danish
|
| 141 |
chunker.py # fixed-size, recursive, semantic chunking
|
| 142 |
pipeline.py # ingestion orchestration
|
| 143 |
retrieval/
|
|
@@ -152,7 +339,7 @@ src/
|
|
| 152 |
agent/
|
| 153 |
intent_classifier.py
|
| 154 |
router.py # pipeline mode (AGENT_MODE=pipeline)
|
| 155 |
-
tools.py #
|
| 156 |
plan_and_execute.py # Plan-and-Execute agent (AGENT_MODE=react)
|
| 157 |
memory.py # conversation memory for multi-turn
|
| 158 |
evaluation/
|
|
|
|
| 8 |
noindex: true
|
| 9 |
---
|
| 10 |
|
| 11 |
+
# Dokumentassistent
|
| 12 |
|
| 13 |
+
## Live demo
|
| 14 |
+
Hosted on Hugging Face Spaces: [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space)
|
| 15 |
|
| 16 |
+
[Skip to English ↓](#english)
|
| 17 |
|
| 18 |
+
## Dansk
|
| 19 |
|
| 20 |
+
En produktionsklar RAG-applikation, der gør det muligt at stille spørgsmål til dokumenter på dansk og få svar med kildehenvisninger. Systemet er bygget på open source-komponenter (LangChain, LangGraph, Qdrant, Ollama) og kan køre helt lokalt uden eksterne API-kald. Det implementerer hybrid søgning med reranking, en Plan-and-Execute agent med samtalehukommelse, og RAGAS-baseret evaluering af svarkvaliteten.
|
| 21 |
|
| 22 |
+
### Funktioner
|
| 23 |
|
| 24 |
+
| Område | Implementering |
|
| 25 |
+
|---|---|
|
| 26 |
+
| Ustruktureret data | PyMuPDF-parser, dansk og engelsk tekstrensning, tre opdelingsstrategier (fast størrelse, rekursiv, semantisk) |
|
| 27 |
+
| Hybrid søgning | Qdrant til vektorsøgning kombineret med BM25, flettet med reciprocal rank fusion |
|
| 28 |
+
| Reranking | Cross-encoder `mmarco-mMiniLMv2-L12-H384-v1` |
|
| 29 |
+
| Agent-flows | Plan-and-Execute med seks værktøjer, ReAct-subagent og samtalehukommelse |
|
| 30 |
+
| Evaluering | RAGAS-metrikker (faithfulness, answer relevancy, context precision) |
|
| 31 |
+
| Sporbarhed | Hvert svar har kildehenvisninger med chunk-ID og sidenummer, samt struktureret logning |
|
| 32 |
+
| Provider-abstraktion | Factory-mønster, der gør det muligt at skifte mellem Ollama, OpenAI, Azure OpenAI, Anthropic og Google GenAI uden at ændre forretningskoden |
|
| 33 |
+
| Deployment | Docker Compose til lokal kørsel, Hugging Face Spaces til den offentlige demo |
|
| 34 |
+
|
| 35 |
+
### Sådan fungerer det
|
| 36 |
+
|
| 37 |
+
PDF-filer bliver læst med PyMuPDF, renset, opdelt i tekststykker (fast størrelse, rekursivt eller semantisk), indlejret med en flersproget sentence-transformer og gemt i Qdrant. Et BM25-indeks bygges på de samme tekststykker til nøgleordssøgning.
|
| 38 |
+
|
| 39 |
+
Når en bruger stiller et spørgsmål, kører systemet både den semantiske og den leksikale søgning, fletter resultaterne sammen med reciprocal rank fusion, og lader en cross-encoder rescore kandidaterne. De øverste tekststykker bliver sendt til en LLM, og svaret streames tilbage via SSE og vises i Streamlit-grænsefladen sammen med kilderne.

### To agent-tilstande

Systemet kan køre i to forskellige tilstande, der vælges via miljøvariablen `AGENT_MODE`.

**Pipeline** (`AGENT_MODE=pipeline`) bygger på en fast LangGraph-graf med sprogdetektion, valgfri oversættelse, hybrid søgning, reranking og generering. Tilstanden har en confidence-baseret retry-loop og fungerer fint med lette lokale modeller.

**Plan-and-Execute Agent** (`AGENT_MODE=react`, standard) er en flertrinsagent, hvor en planner først nedbryder spørgsmålet i delopgaver, en executor kører hver delopgave gennem en ReAct-subagent med adgang til et sæt værktøjer, og en synthesizer producerer det endelige svar med kildehenvisninger. Tilstanden indeholder samtalehukommelse til opfølgende spørgsmål og kræver en model, der understøtter tool calling.

| Værktøj | Formål |
|---|---|
| `hybrid_search(query, top_k)` | Henter relevante tekststykker via hybrid søgning og reranking |
| `multi_query_search(question, top_k)` | Nedbryder komplekse spørgsmål i delspørgsmål, søger på hver og fletter resultaterne |
| `search_within_document(document_id, query, top_k)` | Finder bestemte afsnit i et kendt dokument |
| `summarize_document(document_id)` | Laver et struktureret resumé af et dokument |
| `list_documents()` | Viser hvilke dokumenter, der ligger i vidensbasen |
| `fetch_document(document_id)` | Henter et helt dokument |

### Produktionshensyn

- **Sporbarhed.** Hvert genereret svar har kildehenvisninger på chunk-niveau med dokument-ID, sidenummer og tekststykke, så det kan revideres bagudrettet.
- **Governance.** RAGAS-evalueringspipelinen i `src/evaluation/` gør det muligt at måle faithfulness og context precision, før ændringer slippes løs i produktion.
- **Konfigurerbarhed.** Ingen hardkodede stier, modelnavne eller API-nøgler. Alt styres via miljøvariabler gennem `src/config.py`.
- **Provider-neutralitet.** Forretningskoden importerer aldrig en provider-SDK direkte. LLM- og embedding-backends skiftes via factory-funktionerne `create_llm()` og `create_embeddings()`, hvilket undgår vendor lock-in.
- **Lokal som standard.** Standardkonfigurationen kører helt uden eksterne API-kald og passer til miljøer med strenge krav til datahjemsted.
- **Pakket i containere.** Docker Compose til lokal kørsel og Hugging Face Spaces til den offentlige demo.

### Teknologivalg

| Kategori | Teknologi |
|---|---|
| Framework | FastAPI, uvicorn |
| Orkestrering | LangChain, LangGraph |
| Vektorlager | Qdrant (lokal tilstand) |
| Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
| LLM | `gemma4:e4b` via Ollama som standard |
| Sparse-søgning | rank_bm25 |
| Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
| PDF-parsing | PyMuPDF |
| Evaluering | RAGAS |
| Grænseflade | Streamlit |

### Provider-understøttelse

LLM- og embedding-backends konfigureres via miljøvariabler. De understøttede providers er Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI og Groq. Standardopsætningen (Ollama og HuggingFace) kører helt lokalt uden API-nøgler.

Se `.env.example` for konfiguration pr. provider.

### Prøv den live

Demoen ligger på [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space).

Prøv for eksempel disse spørgsmål på dansk.

- "Hvad er KU's politik for brug af AI-værktøjer?"
- "Hvilke regler gælder for brug af generativ AI i eksamen?"
- "Sammenlign reglerne for AI-brug i forskning og undervisning."

Det sidste spørgsmål udløser Plan-and-Execute-agenten, så man kan se den nedbryde spørgsmålet i delopgaver i realtid.

### Kom i gang

Kræver Python 3.11+ og [Ollama](https://ollama.com/).

```bash
git clone https://github.com/Xiiqiing/Dokumentassistent.git
cd Dokumentassistent
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

ollama pull gemma4:e4b
python -m scripts.ingest   # læg PDF-filer i docs/ først

uvicorn src.api.main:app --reload   # http://localhost:8000
streamlit run src/ui/app.py         # http://localhost:8501
```

### Docker

Docker Compose håndterer Qdrant, API'et og Streamlit-grænsefladen samlet. API-containeren venter på, at Qdrant er oppe, og kører ingestion automatisk, hvis samlingen er tom.

#### Lokalt setup med Ollama og HuggingFace

```bash
cp .env.example .env
docker compose --profile local up --build
```

| Service | URL |
|---|---|
| API | http://localhost:8000 |
| API-dokumentation | http://localhost:8000/docs |
| Streamlit-grænseflade | http://localhost:8501 |
| Qdrant-dashboard | http://localhost:6333/dashboard |

#### Cloud-setup med OpenAI, Anthropic eller andre

```bash
cp .env.example .env
# sæt LLM_PROVIDER, EMBEDDING_PROVIDER og din API-nøgle
docker compose up --build
```

#### Hugging Face Spaces

Et `Dockerfile` og en supervisor-konfiguration er inkluderet. Spacet kører Qdrant, API'et og grænsefladen bag nginx på port 7860.

### Projektstruktur

```
src/
  config.py               # konfiguration via miljøvariabler
  provider.py             # create_llm() og create_embeddings() factory
  models.py               # delte dataklasser
  ingestion/
    pdf_parser.py         # PyMuPDF-udtræk
    text_cleaner.py       # dansk og engelsk normalisering
    chunker.py            # fast størrelse, rekursiv og semantisk opdeling
    pipeline.py           # ingestion-orkestrering
  retrieval/
    embedder.py
    vector_store.py       # Qdrant
    bm25_search.py
    hybrid.py             # reciprocal rank fusion
    reranker.py           # cross-encoder
  api/
    main.py
    routes.py             # /query, /ingest, /health
  agent/
    intent_classifier.py
    router.py             # pipeline-tilstand (AGENT_MODE=pipeline)
    tools.py              # seks retrieval-værktøjer og ToolResultStore
    plan_and_execute.py   # Plan-and-Execute-agent (AGENT_MODE=react)
    memory.py             # samtalehukommelse til flere spørgsmål
  evaluation/
    evaluator.py          # RAGAS-metrikker
  ui/
    app.py                # Streamlit-frontend
scripts/
  ingest.py
  e2e_test.py
tests/
docs/                     # eksempel-PDF'er eller tekster (KU AI-dokumenter)
```

---

## English

A production-ready RAG application that lets users ask questions about documents in Danish and receive answers with source citations. The system is built on open-source components (LangChain, LangGraph, Qdrant, Ollama) and can run fully locally without any external API calls. It implements hybrid search with reranking, a Plan-and-Execute agent with conversation memory, and RAGAS-based evaluation of answer quality.

### Capabilities

| Area | Implementation |
|---|---|
| Unstructured data | PyMuPDF parser, Danish and English text cleaning, three chunking strategies (fixed-size, recursive, semantic) |
| Hybrid retrieval | Qdrant dense vectors combined with BM25, fused via reciprocal rank fusion |
| Reranking | Cross-encoder `mmarco-mMiniLMv2-L12-H384-v1` |
| Agent flows | Plan-and-Execute with six tools, ReAct sub-agent and conversation memory |
| Evaluation | RAGAS metrics (faithfulness, answer relevancy, context precision) |
| Traceability | Each answer carries source references with chunk ID and page number, plus structured logging |
| Provider abstraction | Factory pattern that allows swapping between Ollama, OpenAI, Azure OpenAI, Anthropic and Google GenAI without touching business code |
| Deployment | Docker Compose for local setup, Hugging Face Spaces for the public demo |

### How it works

PDFs are parsed with PyMuPDF, cleaned, split into chunks (fixed-size, recursive, or semantic), embedded with a multilingual sentence-transformer, and stored in Qdrant. A BM25 index is built from the same chunks for keyword search.
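
As a rough illustration of the fixed-size strategy mentioned above, the sketch below splits text into character windows with a configurable overlap. The function name `chunk_fixed_size`, its parameters and the sizes used are assumptions made for this example; the repository's `chunker.py` may count tokens or sentences instead and is not reproduced here.

```python
def chunk_fixed_size(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Hypothetical helper for illustration only, not the project's chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_fixed_size("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.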

At query time, both indexes are searched and the results merged with reciprocal rank fusion. A cross-encoder then rescores the candidates before the top chunks are passed to the LLM. The API streams the response over SSE and the Streamlit UI displays it together with the sources.
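
The fusion step can be sketched in a few lines. The example below is a minimal version of reciprocal rank fusion over two ranked lists of chunk IDs; the function name, the sample IDs and the constant `k = 60` (a commonly used value) are assumptions, since the values used in the repository's `hybrid.py` are not stated here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense_hits = ["c3", "c1", "c7"]   # hypothetical vector-search ranking
    sparse_hits = ["c1", "c9", "c3"]  # hypothetical BM25 ranking
    for chunk_id, score in reciprocal_rank_fusion([dense_hits, sparse_hits]):
        print(f"{chunk_id}: {score:.5f}")
```

Because only ranks are used, the dense and sparse scores never need to be normalised against each other, which is the main reason this kind of fusion is popular for hybrid search.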

### Two agent modes

The system can run in two different modes, switchable via the `AGENT_MODE` environment variable.

**Pipeline** (`AGENT_MODE=pipeline`) is built on a fixed LangGraph graph with language detection, optional translation, hybrid retrieval, reranking, and generation. The mode has a confidence-based retry loop and works well with lightweight local models.

**Plan-and-Execute Agent** (`AGENT_MODE=react`, default) is a multi-step agent where a planner first decomposes the query into sub-tasks, an executor runs each sub-task through a ReAct sub-agent with access to a set of tools, and a synthesizer produces the final answer with citations. The mode includes conversation memory for follow-up questions and requires a model that supports tool calling.

| Tool | Purpose |
|---|---|
| `hybrid_search(query, top_k)` | Retrieves relevant passages via hybrid search and reranking |
| `multi_query_search(question, top_k)` | Decomposes complex questions into sub-queries, searches each, and merges the results |
| `search_within_document(document_id, query, top_k)` | Finds specific sections inside a known document |
| `summarize_document(document_id)` | Generates a structured summary of a document |
| `list_documents()` | Shows what is in the knowledge base |
| `fetch_document(document_id)` | Reads a full document |
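
The planner, executor and synthesizer roles described above can be pictured as three plain callables chained together. The sketch below is only a skeleton under that assumption: `StepResult`, `plan_and_execute` and the three `toy_*` functions are hypothetical stand-ins, whereas in the project each role is backed by an LLM and the executor has access to the tools in the table.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    """Outcome of one sub-task, including the sources it relied on."""
    sub_task: str
    answer: str
    sources: list[str] = field(default_factory=list)


def plan_and_execute(
    question: str,
    planner: Callable[[str], list[str]],
    executor: Callable[[str], StepResult],
    synthesizer: Callable[[str, list[StepResult]], str],
) -> str:
    sub_tasks = planner(question)                  # 1. decompose the question
    results = [executor(t) for t in sub_tasks]     # 2. solve each sub-task
    return synthesizer(question, results)          # 3. merge into one answer


# Toy stand-ins so the skeleton runs without a model (hypothetical).
def toy_planner(question: str) -> list[str]:
    return [part.strip() for part in question.split(" and ") if part.strip()]


def toy_executor(sub_task: str) -> StepResult:
    return StepResult(sub_task, f"answer to '{sub_task}'", [f"chunk:{sub_task.split()[-1]}"])


def toy_synthesizer(question: str, results: list[StepResult]) -> str:
    body = "; ".join(r.answer for r in results)
    cited = sorted({s for r in results for s in r.sources})
    return f"{body} [sources: {', '.join(cited)}]"


if __name__ == "__main__":
    print(plan_and_execute(
        "rules for research and rules for teaching",
        toy_planner, toy_executor, toy_synthesizer,
    ))
```

Keeping the three roles as separate callables makes each one replaceable and testable on its own, which is one plausible reason to prefer this layout over a single free-running agent loop.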

### Production considerations

- **Traceability.** Every generated answer carries chunk-level source references with document ID, page number and span, so it can be audited and reviewed afterwards.
- **Governance.** The RAGAS evaluation pipeline in `src/evaluation/` lets you measure faithfulness and context precision before promoting changes to production.
- **Configurability.** No hardcoded paths, model names or API keys. Everything is controlled via environment variables through `src/config.py`.
- **Provider neutrality.** Business code never imports a provider SDK directly. LLM and embedding backends swap via the `create_llm()` and `create_embeddings()` factory functions, which avoids vendor lock-in.
- **Local-first.** The default configuration runs entirely without external API calls and fits environments with strict data residency requirements.
- **Containerized.** Docker Compose for local runs and Hugging Face Spaces for the public demo.

### Tech stack

| Category | Technology |
|---|---|
| Framework | FastAPI, uvicorn |
| Orchestration | LangChain, LangGraph |
| Vector store | Qdrant (local mode) |
| Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
| LLM | `gemma4:e4b` via Ollama (default) |
| Sparse search | rank_bm25 |
| Reranking | `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
| PDF parsing | PyMuPDF |
| Evaluation | RAGAS |
| UI | Streamlit |

### Provider support

LLM and embedding backends are configured through environment variables. Supported providers are Ollama, OpenAI, Azure OpenAI, Anthropic, Google GenAI and Groq. The default setup (Ollama and HuggingFace) runs entirely locally without any API keys.

See `.env.example` for per-provider configuration.

### Try it live

The demo lives at [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space).

Try asking these questions in Danish.

- "Hvad er KU's politik for brug af AI-værktøjer?"
- "Hvilke regler gælder for brug af generativ AI i eksamen?"
- "Sammenlign reglerne for AI-brug i forskning og undervisning."

The third question triggers the Plan-and-Execute agent, so you can watch it decompose the query into sub-tasks in real time.

### Quick start

Requires Python 3.11+ and [Ollama](https://ollama.com/).

```bash
git clone https://github.com/Xiiqiing/Dokumentassistent.git
cd Dokumentassistent
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

ollama pull gemma4:e4b
python -m scripts.ingest   # place PDFs in docs/ first

uvicorn src.api.main:app --reload   # http://localhost:8000
streamlit run src/ui/app.py         # http://localhost:8501
```

### Docker

Docker Compose handles Qdrant, the API and the Streamlit UI together. The API container waits for Qdrant on startup and runs ingestion automatically if the collection is empty.

#### Local setup with Ollama and HuggingFace

```bash
cp .env.example .env
docker compose --profile local up --build
```

| Service | URL |
|---|---|
| API | http://localhost:8000 |
| API documentation | http://localhost:8000/docs |
| Streamlit UI | http://localhost:8501 |
| Qdrant dashboard | http://localhost:6333/dashboard |

#### Cloud setup with OpenAI, Anthropic or others

```bash
cp .env.example .env
# set LLM_PROVIDER, EMBEDDING_PROVIDER and your API key
docker compose up --build
```

#### Hugging Face Spaces

A `Dockerfile` and supervisor configuration are included. The Space runs Qdrant, the API and the UI behind nginx on port 7860.

### Project structure

```
src/
  config.py               # env-based configuration
  provider.py             # create_llm() and create_embeddings() factory
  models.py               # shared dataclasses
  ingestion/
    pdf_parser.py         # PyMuPDF extraction
    text_cleaner.py       # Danish and English normalization
    chunker.py            # fixed-size, recursive, semantic chunking
    pipeline.py           # ingestion orchestration
  retrieval/
    embedder.py
    vector_store.py       # Qdrant
    bm25_search.py
    hybrid.py             # reciprocal rank fusion
    reranker.py           # cross-encoder
  api/
    main.py
    routes.py             # /query, /ingest, /health
  agent/
    intent_classifier.py
    router.py             # pipeline mode (AGENT_MODE=pipeline)
    tools.py              # six retrieval tools and ToolResultStore
    plan_and_execute.py   # Plan-and-Execute agent (AGENT_MODE=react)
    memory.py             # conversation memory for multi-turn
  evaluation/
|