XQ committed on
Commit 6fd2f67 · 1 parent: 6ce81cf

Add agent flow

README.md CHANGED
````diff
@@ -14,13 +14,27 @@ A RAG-based document assistant for Danish-language PDFs, featuring hybrid search
 
 ## Architecture
 
-The system follows a three-stage RAG pipeline:
+The system follows a three-stage RAG pipeline with an optional Agent Flows mode:
 
 **Ingestion:** PDF documents are parsed with PyMuPDF, cleaned, and split into chunks using one of three strategies (fixed-size, recursive, or semantic). Each chunk is embedded via a multilingual sentence-transformer and stored in a Qdrant vector collection. A parallel BM25 index is built from the same chunks for sparse keyword matching.
 
-**Retrieval:** User queries run through both dense (Qdrant cosine similarity) and sparse (BM25) search paths. Results are merged via reciprocal rank fusion, then a cross-encoder reranker scores each candidate for final ordering. An intent classifier routes queries to the appropriate retrieval strategy.
+**Retrieval:** User queries run through both dense (Qdrant cosine similarity) and sparse (BM25) search paths. Results are merged via reciprocal rank fusion, then a cross-encoder reranker scores each candidate for final ordering.
 
-**Generation:** Top-ranked chunks are assembled into a prompt context and passed to the LLM. The routing pipeline is orchestrated as a stateful LangGraph graph — each step (language detection, translation, retrieval, reranking, generation) runs as a node with full intermediate state preserved. The response is returned via a FastAPI endpoint and displayed in a Streamlit UI. Retrieval quality can be measured offline using RAGAS metrics.
+**Generation:** Top-ranked chunks are assembled into a prompt context and passed to the LLM. The response is returned via a FastAPI endpoint with full SSE streaming and displayed in a Streamlit UI. Retrieval quality can be measured offline using RAGAS metrics.
+
+**Routing — two modes (switchable via `AGENT_MODE`):**
+
+- **Pipeline mode** (default, `AGENT_MODE=pipeline`): Fixed LangGraph DAG — language detection → optional translation → hybrid retrieval → cross-encoder reranking → intent-specific generation. Works with any LLM, including local Ollama models.
+
+- **ReAct Agent mode** (`AGENT_MODE=react`): Replaces the fixed DAG with a multi-step reasoning loop. The LLM decides which tools to call and how many times, then produces a grounded answer citing source documents. Supports multi-hop questions, comparisons across documents, and procedural queries that benefit from iterative retrieval. Requires an LLM with tool-calling support (OpenAI, Anthropic, Google GenAI, or compatible Ollama models such as `llama3.1` / `qwen2.5`).
+
+Available tools in ReAct mode:
+
+| Tool | When the LLM uses it |
+|------|----------------------|
+| `hybrid_search(query, top_k)` | Find relevant passages — called once or multiple times with refined queries |
+| `list_documents()` | Discover which documents are in the knowledge base |
+| `fetch_document(document_id)` | Read the full text of a named document (e.g. for summaries) |
 
 ## Tech Stack
 
@@ -37,6 +51,7 @@ The system follows a three-stage RAG pipeline:
 | Evaluation | RAGAS |
 | UI | Streamlit |
 | Config | python-dotenv |
+| Agent Flows | LangGraph `create_react_agent` + LangChain `@tool` |
 
 ## Provider Support
 
@@ -51,6 +66,45 @@ Both LLM and embedding backends are swappable through environment variables —
 
 Switch providers by editing `LLM_PROVIDER` and `EMBEDDING_PROVIDER` in your `.env` file. See `.env.example` for per-provider configuration details.
 
+## Agent Mode
+
+The system supports two routing modes, controlled by `AGENT_MODE` in `.env`:
+
+| Mode | Value | Description |
+|------|-------|-------------|
+| Pipeline (default) | `AGENT_MODE=pipeline` | Fixed LangGraph DAG. Works with any LLM including local Ollama models such as `gemma3:4b`. |
+| ReAct Agent | `AGENT_MODE=react` | Multi-step reasoning loop. The LLM calls tools as many times as needed — `hybrid_search` for targeted passages, `list_documents` to navigate the knowledge base, `fetch_document` for full document reads — then cites sources in the final answer. |
+
+**LLM compatibility for ReAct mode:**
+
+`AGENT_MODE=react` requires a model with native tool-calling support. Use `AGENT_MODE=pipeline` (the default) if your model does not support it.
+
+| Provider | Tool-calling support |
+|----------|---------------------|
+| OpenAI (`gpt-4o-mini`, `gpt-4o`) | Yes |
+| Anthropic (`claude-*`) | Yes |
+| Google GenAI (`gemini-*`) | Yes |
+| Azure OpenAI | Yes |
+| Ollama — `llama3.1`, `qwen2.5`, `mistral-nemo` | Yes (model-dependent) |
+| Ollama — `gemma3:4b` (default) | No → use `pipeline` mode |
+
+Example `.env` for ReAct mode with OpenAI:
+
+```dotenv
+AGENT_MODE=react
+LLM_PROVIDER=openai
+OPENAI_API_KEY=sk-...
+OPENAI_MODEL=gpt-4o-mini
+```
+
+Example `.env` for pipeline mode with local Ollama (default, no API key needed):
+
+```dotenv
+AGENT_MODE=pipeline
+LLM_PROVIDER=ollama
+OLLAMA_MODEL=gemma3:4b
+```
+
 ## Quick Start
 
 Prerequisites: Python 3.11+ and [Ollama](https://ollama.com/) installed.
@@ -146,7 +200,9 @@ src/
     routes.py # REST endpoints (query, ingest, health)
   agent/
     intent_classifier.py # Query intent detection
-    router.py # Strategy routing based on intent
+    router.py # Fixed-DAG pipeline router (AGENT_MODE=pipeline)
+    tools.py # @tool-decorated hybrid_search, list_documents, fetch_document + ToolResultStore
+    react_router.py # ReAct agent router with tool-calling loop (AGENT_MODE=react)
   evaluation/
     evaluator.py # RAGAS-based retrieval quality metrics
   ui/
````

src/agent/react_router.py ADDED
```python
"""ReAct agent router using a LangGraph tool-calling loop.

Replaces the fixed detect→translate→retrieve→rerank→generate DAG with a
multi-step reasoning loop where the LLM decides which tools to call and
when it has gathered enough information to produce a final answer.

Requires an LLM that supports bind_tools (OpenAI, Anthropic, Google GenAI,
and compatible Ollama models such as llama3.1 / qwen2.5). Set
AGENT_MODE=react in .env to activate; otherwise the app uses QueryRouter.
"""

import logging
from collections.abc import Generator

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
from langchain_core.runnables import Runnable
from langgraph.prebuilt import create_react_agent

from src.models import GenerationResponse, IntentType, PipelineDetails, QueryResult
from src.agent.tools import ToolResultStore, make_retrieval_tools
from src.retrieval.hybrid import HybridRetriever
from src.retrieval.reranker import Reranker
from src.retrieval.vector_store import VectorStore

logger = logging.getLogger(__name__)

_SYSTEM_PROMPT = (
    "You are a helpful assistant for administrative staff at the University of Copenhagen (KU).\n\n"
    "You have access to tools over the KU policy documents stored in the knowledge base: "
    "hybrid_search (find relevant passages), list_documents (see which documents exist) "
    "and fetch_document (read a whole document).\n\n"
    "Guidelines:\n"
    "- Always call hybrid_search before answering questions about KU rules, policies, exams, "
    "employment conditions, or administrative procedures.\n"
    "- If the first search does not return sufficient information, call hybrid_search again "
    "with a refined or more specific query.\n"
    "- For comparison questions, search for each item separately.\n"
    "- Cite the document sources ([1], [2], …) in your answer.\n"
    "- Answer in the same language as the user's question."
)


def _ser_sources(sources: list[QueryResult]) -> list[dict]:
    """Serialise QueryResult list to a JSON-safe list of dicts."""
    return [
        {
            "chunk_id": r.chunk.chunk_id,
            "document_id": r.chunk.document_id,
            "text": r.chunk.text,
            "score": r.score,
            "source": r.source,
        }
        for r in sources
    ]


class ReActRouter:
    """Routes queries through a multi-step ReAct agent with tool-calling LLM.

    The agent runs in a loop: the LLM reasons about the query, calls its
    retrieval tools as many times as needed, observes results, and finally
    produces a grounded answer. Results from every hybrid_search call are
    merged into a single ranked source list that is returned alongside the
    answer.
    """

    def __init__(
        self,
        llm: Runnable,
        hybrid_retriever: HybridRetriever,
        reranker: Reranker,
        vector_store: VectorStore,
        default_top_k: int = 5,
    ) -> None:
        """Initialise the ReAct router.

        Args:
            llm: LLM with tool-calling support (must implement bind_tools).
            hybrid_retriever: HybridRetriever instance.
            reranker: Reranker instance.
            vector_store: VectorStore instance for document-level tool access.
            default_top_k: Default number of results returned per tool call.
        """
        self._llm = llm
        self._hybrid_retriever = hybrid_retriever
        self._reranker = reranker
        self._vector_store = vector_store
        self._default_top_k = default_top_k

    # ------------------------------------------------------------------
    # Internal helpers
    # ------------------------------------------------------------------

    def _make_graph(self, store: ToolResultStore) -> object:
        """Build a fresh ReAct graph bound to *store* for one request."""
        tools = make_retrieval_tools(
            self._hybrid_retriever,
            self._reranker,
            self._vector_store,
            store,
            self._default_top_k,
        )
        return create_react_agent(self._llm, tools)

    @staticmethod
    def _extract_answer(messages: list) -> str:
        """Return the last non-tool-call AIMessage content as the final answer."""
        for msg in reversed(messages):
            if (
                isinstance(msg, AIMessage)
                and msg.content
                and not getattr(msg, "tool_calls", None)
            ):
                return str(msg.content)
        return ""

    # ------------------------------------------------------------------
    # Public interface (mirrors QueryRouter)
    # ------------------------------------------------------------------

    def route(self, query: str, top_k: int) -> GenerationResponse:
        """Route a query through the ReAct agent pipeline.

        Args:
            query: The user's natural language query.
            top_k: Maximum number of merged sources to return. Each individual
                tool call retrieves ``default_top_k`` results unless the LLM
                passes its own ``top_k`` argument.

        Returns:
            GenerationResponse with answer, sources, intent, and confidence.
        """
        logger.info("ReAct routing query: %s", query)
        store = ToolResultStore()
        graph = self._make_graph(store)

        result = graph.invoke(
            {
                "messages": [
                    SystemMessage(content=_SYSTEM_PROMPT),
                    HumanMessage(content=query),
                ]
            }
        )

        messages = result.get("messages", [])
        answer = self._extract_answer(messages)

        sources = store.retrieved[:top_k]
        confidence = max((r.score for r in sources), default=0.0)

        logger.info(
            "ReAct answer ready (confidence=%.4f, sources=%d, tool_calls=%d)",
            confidence,
            len(sources),
            len(store.tool_calls),
        )

        return GenerationResponse(
            answer=answer,
            sources=sources,
            intent=IntentType.RAG if sources else IntentType.FACTUAL,
            confidence=confidence,
            pipeline_details=PipelineDetails(
                original_query=query,
                # Skip empty args (list_documents logs "") so the joined string has no blanks.
                retrieval_query=", ".join(q for _, q in store.tool_calls if q) or query,
                reranked_results=sources,
            ),
        )

    def route_stream(self, query: str, top_k: int) -> Generator[dict, None, None]:
        """Stream ReAct agent events step by step.

        Yields event dicts with the following step types (in addition to the
        existing pipeline steps understood by the UI):

        - ``tool_call`` — LLM decided to call a tool; carries ``tool`` and ``query``.
        - ``tool_result`` — Tool returned; carries ``tool``, ``result_count``.
        - ``generate`` — LLM is writing the final answer.
        - ``done`` — Final event with the full result payload.

        Args:
            query: User query.
            top_k: Maximum number of merged sources to include in the result.

        Yields:
            Step event dicts.
        """
        store = ToolResultStore()
        graph = self._make_graph(store)

        all_messages: list = []

        for chunk in graph.stream(
            {
                "messages": [
                    SystemMessage(content=_SYSTEM_PROMPT),
                    HumanMessage(content=query),
                ]
            },
            stream_mode="updates",
        ):
            for _node_name, update in chunk.items():
                if update is None:
                    continue
                node_messages = update.get("messages", [])
                all_messages.extend(node_messages)

                for msg in node_messages:
                    if isinstance(msg, AIMessage):
                        for tc in getattr(msg, "tool_calls", None) or []:
                            args = tc.get("args", {})
                            yield {
                                "step": "tool_call",
                                "tool": tc.get("name", ""),
                                # hybrid_search takes "query"; fetch_document takes "document_id".
                                "query": args.get("query") or args.get("document_id", ""),
                            }
                        if msg.content and not getattr(msg, "tool_calls", None):
                            yield {"step": "generate"}

                    elif isinstance(msg, ToolMessage):
                        yield {
                            "step": "tool_result",
                            "tool": getattr(msg, "name", None) or "",
                            # Running total of unique chunks merged so far,
                            # not the number returned by this single call.
                            "result_count": len(store.retrieved),
                        }

        answer = self._extract_answer(all_messages)
        sources = store.retrieved[:top_k]
        confidence = max((r.score for r in sources), default=0.0)

        yield {
            "step": "done",
            "result": {
                "answer": answer,
                "sources": _ser_sources(sources),
                "intent": (IntentType.RAG if sources else IntentType.FACTUAL).value,
                "confidence": confidence,
                "pipeline_details": {
                    "original_query": query,
                    "retrieval_query": ", ".join(q for _, q in store.tool_calls if q) or query,
                    "detected_language": "unknown",
                    "translated": False,
                    "dense_results": [],
                    "sparse_results": [],
                    "fused_results": [],
                    "reranked_results": [
                        {
                            "document_id": r.chunk.document_id,
                            "chunk_id": r.chunk.chunk_id,
                            "score": r.score,
                            "source": r.source,
                        }
                        for r in sources
                    ],
                },
            },
        }
```

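`create_react_agent` hides the loop that `ReActRouter` depends on. The framework-free sketch below spells out that control flow under simplifying assumptions: the model either returns tool calls or a final reply, every tool result is appended to the history as an observation, and the loop stops at the first reply without tool calls. `Message`, `scripted_model`, `exam_rules.pdf`, and the `max_steps` guard are hypothetical stand-ins, not LangGraph or LangChain APIs (LangGraph bounds the real loop with its `recursion_limit` setting). `extract_answer` applies the same rule as `_extract_answer` above.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Message:
    """Hypothetical stand-in for LangChain's message classes."""

    role: str  # "user", "assistant" or "tool"
    content: str = ""
    tool_calls: list[dict[str, Any]] = field(default_factory=list)  # [{"name": ..., "args": {...}}]
    name: str = ""  # tool name when role == "tool"


Model = Callable[[list[Message]], Message]
Tool = Callable[..., str]


def run_react_loop(
    model: Model,
    tools: dict[str, Tool],
    messages: list[Message],
    max_steps: int = 8,
) -> list[Message]:
    """Alternate model turns and tool executions until the model stops calling tools."""
    history = list(messages)
    for _ in range(max_steps):
        reply = model(history)
        history.append(reply)
        if not reply.tool_calls:
            return history  # no tool calls: this reply is the final answer
        for call in reply.tool_calls:
            tool = tools.get(call["name"])
            observation = (
                tool(**call.get("args", {}))
                if tool is not None
                else f"Unknown tool: {call['name']}"
            )
            # Observations go back into the history so the next model turn sees them.
            history.append(Message(role="tool", content=observation, name=call["name"]))
    raise RuntimeError(f"no final answer after {max_steps} model turns")


def extract_answer(history: list[Message]) -> str:
    """Last assistant message that has text and no tool calls."""
    for msg in reversed(history):
        if msg.role == "assistant" and msg.content and not msg.tool_calls:
            return msg.content
    return ""


def scripted_model(history: list[Message]) -> Message:
    """Hypothetical fake LLM: search once, then answer citing the result."""
    if not any(m.role == "tool" for m in history):
        return Message(role="assistant", tool_calls=[{"name": "hybrid_search", "args": {"query": "eksamen"}}])
    return Message(role="assistant", content="Svar baseret på [1].")


tools: dict[str, Tool] = {
    "hybrid_search": lambda query, top_k=5: f"[1] exam_rules.pdf\nPassage matching {query!r}",
}
history = run_react_loop(scripted_model, tools, [Message(role="user", content="Hvornår er eksamen?")])
print(extract_answer(history))  # Svar baseret på [1].
```

A real tool-calling LLM replaces `scripted_model`; the point of the sketch is that the number of searches is decided inside the loop rather than fixed by a DAG, which is what lets ReAct mode handle multi-hop and comparison questions.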
src/agent/tools.py ADDED
```python
"""LangChain tools for the ReAct agent."""

import logging
from dataclasses import dataclass, field

from langchain_core.tools import tool

from src.models import QueryResult
from src.retrieval.hybrid import HybridRetriever
from src.retrieval.reranker import Reranker
from src.retrieval.vector_store import VectorStore

logger = logging.getLogger(__name__)


@dataclass
class ToolResultStore:
    """Captures structured retrieval results produced during tool invocations.

    Attributes:
        retrieved: Accumulated QueryResult list across all hybrid_search calls,
            merged by chunk_id and sorted by descending score.
        tool_calls: Log of (tool_name, query_or_arg) tuples in invocation order.
    """

    retrieved: list[QueryResult] = field(default_factory=list)
    tool_calls: list[tuple[str, str]] = field(default_factory=list)


def make_retrieval_tools(
    hybrid_retriever: HybridRetriever,
    reranker: Reranker,
    vector_store: VectorStore,
    store: ToolResultStore,
    default_top_k: int = 5,
) -> list:
    """Create retrieval tools bound to the given components and result store.

    hybrid_search writes structured QueryResult objects into *store* on each
    invocation so the calling router can surface them as sources without having
    to re-parse the tool's text output. Every tool also logs its call in
    *store.tool_calls*.

    Args:
        hybrid_retriever: HybridRetriever instance.
        reranker: Reranker instance.
        vector_store: VectorStore instance for document-level access.
        store: Shared ToolResultStore that captures structured results.
        default_top_k: Default number of results to return per call.

    Returns:
        List of LangChain tool callables ready for bind_tools / ToolNode.
    """

    @tool
    def hybrid_search(query: str, top_k: int = default_top_k) -> str:
        """Search the KU document knowledge base using hybrid retrieval.

        Combines dense semantic search (Qdrant) and sparse keyword search (BM25),
        then re-ranks results with a cross-encoder. Use this tool to find relevant
        passages from ingested KU policy documents about rules, regulations, exam
        procedures, employment conditions, and administrative guidelines.

        Call this tool before answering any question that requires factual
        information from KU documents. You may call it multiple times with
        different queries if the first result is insufficient.

        Args:
            query: Search query. Danish gives the best recall against KU documents.
            top_k: Number of top results to return (1–20). Defaults to the configured top_k.

        Returns:
            Formatted string of ranked document passages with source references
            and relevance scores.
        """
        top_k = max(1, min(top_k, 20))  # clamp LLM-supplied values to the documented range
        logger.info("Tool hybrid_search: query=%r top_k=%d", query, top_k)
        store.tool_calls.append(("hybrid_search", query))

        hybrid_result = hybrid_retriever.search_detailed(query, top_k=top_k)
        results = reranker.rerank(query, hybrid_result.fused_results, top_k=top_k)

        # Accumulate results across multiple calls (union by chunk_id, keep highest score)
        existing = {r.chunk.chunk_id: r for r in store.retrieved}
        for r in results:
            cid = r.chunk.chunk_id
            if cid not in existing or r.score > existing[cid].score:
                existing[cid] = r
        store.retrieved = sorted(existing.values(), key=lambda r: r.score, reverse=True)

        if not results:
            return "Ingen relevante dokumenter fundet. (No relevant documents found.)"

        parts: list[str] = []
        for i, r in enumerate(results, 1):
            parts.append(
                f"[{i}] {r.chunk.document_id} (relevance: {r.score:.3f})\n{r.chunk.text}"
            )
        return "\n\n---\n\n".join(parts)

    @tool
    def list_documents() -> str:
        """List all documents currently available in the KU knowledge base.

        Use this tool when the user asks which documents are available, wants to
        know what topics are covered, or before fetching a specific document by ID.

        Returns:
            Newline-separated list of document IDs, or a message if the
            knowledge base is empty.
        """
        logger.info("Tool list_documents called")
        store.tool_calls.append(("list_documents", ""))

        ids = vector_store.list_document_ids()
        if not ids:
            return "Ingen dokumenter i videnbasen. (Knowledge base is empty.)"
        lines = "\n".join(f"- {doc_id}" for doc_id in ids)
        return f"Dokumenter i videnbasen ({len(ids)} i alt):\n{lines}"

    @tool
    def fetch_document(document_id: str) -> str:
        """Fetch the full text of a specific document from the knowledge base.

        Use this tool when the user asks for a summary or overview of a named
        document, or when hybrid_search results reference a document that
        warrants deeper reading. Prefer hybrid_search for targeted questions.

        Args:
            document_id: The exact document ID as returned by list_documents or
                seen in hybrid_search results (e.g. 'ku_ai_policy.pdf').

        Returns:
            The concatenated text of all chunks belonging to the document, or
            an error message if the document ID is not found.
        """
        logger.info("Tool fetch_document: document_id=%r", document_id)
        store.tool_calls.append(("fetch_document", document_id))

        chunks = vector_store.get_chunks_by_document_id(document_id)
        if not chunks:
            return (
                f"Dokumentet '{document_id}' blev ikke fundet i videnbasen. "
                f"(Document not found. Use list_documents to see available IDs.)"
            )

        # Sort by chunk_id (plain string order) to approximate document order;
        # this assumes chunk IDs sort the same way the chunks appear in the document.
        chunks.sort(key=lambda c: c.chunk_id)
        full_text = "\n\n".join(c.text for c in chunks)
        return (
            f"Dokument: {document_id} ({len(chunks)} afsnit)\n\n"
            f"{full_text}"
        )

    return [hybrid_search, list_documents, fetch_document]
```

src/api/main.py CHANGED
```diff
@@ -16,6 +16,7 @@ from src.retrieval.hybrid import HybridRetriever
 from src.retrieval.reranker import Reranker
 from src.agent.intent_classifier import IntentClassifier
 from src.agent.router import QueryRouter
+from src.agent.react_router import ReActRouter
 from src.ingestion.pipeline import IngestionPipeline
 from src.api.routes import router, set_dependencies
 
@@ -69,15 +70,27 @@ def create_app() -> FastAPI:
         bm25_weight=settings.bm25_weight,
     )
     reranker = Reranker(model=create_reranker(settings.reranker_model))
-    intent_classifier = IntentClassifier(llm=llm, model_name=settings.generation_model)
-    generator = llm | StrOutputParser()
-    query_router = QueryRouter(
-        intent_classifier=intent_classifier,
-        hybrid_retriever=hybrid_retriever,
-        reranker=reranker,
-        generator=generator,
-        translate_query=settings.translate_query,
-    )
+
+    if settings.agent_mode == "react":
+        logger.info("Agent mode: ReAct (tool-calling loop)")
+        query_router: QueryRouter | ReActRouter = ReActRouter(
+            llm=llm,
+            hybrid_retriever=hybrid_retriever,
+            reranker=reranker,
+            vector_store=vector_store,
+            default_top_k=settings.top_k,
+        )
+    else:
+        logger.info("Agent mode: pipeline (fixed DAG)")
+        intent_classifier = IntentClassifier(llm=llm, model_name=settings.generation_model)
+        generator = llm | StrOutputParser()
+        query_router = QueryRouter(
+            intent_classifier=intent_classifier,
+            hybrid_retriever=hybrid_retriever,
+            reranker=reranker,
+            generator=generator,
+            translate_query=settings.translate_query,
+        )
 
     set_dependencies(
         query_router=query_router,
```

src/api/routes.py CHANGED
```diff
@@ -14,6 +14,7 @@ from pydantic import BaseModel
 
 if TYPE_CHECKING:
     from src.agent.router import QueryRouter
+    from src.agent.react_router import ReActRouter
     from src.config import Settings
     from src.ingestion.pipeline import IngestionPipeline
     from src.retrieval.bm25_search import BM25Search
@@ -24,7 +25,7 @@ logger = logging.getLogger(__name__)
 
 router = APIRouter()
 
-_query_router: "QueryRouter | None" = None
+_query_router: "QueryRouter | ReActRouter | None" = None
 _ingestion_pipeline: "IngestionPipeline | None" = None
 _embedder: "Embedder | None" = None
 _vector_store: "VectorStore | None" = None
@@ -33,7 +34,7 @@ _settings: "Settings | None" = None
 
 
 def set_dependencies(
-    query_router: "QueryRouter",
+    query_router: "QueryRouter | ReActRouter",
     ingestion_pipeline: "IngestionPipeline",
     embedder: "Embedder",
     vector_store: "VectorStore",
```

src/config.py CHANGED
```diff
@@ -64,6 +64,9 @@ class Settings:
     # Query translation
     translate_query: bool
 
+    # Agent mode: "pipeline" (fixed DAG) or "react" (tool-calling ReAct loop)
+    agent_mode: str
+
 
 def _parse_bool(value: str, *, default: bool) -> bool:
     """Parse a boolean environment variable string.
@@ -141,4 +144,8 @@ def load_settings() -> Settings:
             os.environ.get("TRANSLATE_QUERY", ""),
             default=os.environ.get("LLM_PROVIDER", "ollama") == "ollama",
         ),
+
+        # Agent mode: "pipeline" keeps the existing fixed DAG; "react" enables
+        # the multi-step ReAct loop (requires an LLM with tool-calling support).
+        agent_mode=os.environ.get("AGENT_MODE", "pipeline"),
     )
```

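`load_settings` passes `AGENT_MODE` through verbatim and `create_app` only tests `settings.agent_mode == "react"`, so `AGENT_MODE=React` or a typo such as `reakt` silently falls back to pipeline mode. A minimal sketch of stricter parsing, assuming a fail-fast startup is preferred; `parse_agent_mode` and `VALID_AGENT_MODES` are hypothetical names, not part of src/config.py.

```python
import os
from collections.abc import Mapping

VALID_AGENT_MODES = ("pipeline", "react")


def parse_agent_mode(env: Mapping[str, str] = os.environ) -> str:
    """Read AGENT_MODE, normalise case and whitespace, and reject unknown values.

    An unset or blank variable falls back to "pipeline", matching the current default.
    """
    raw = env.get("AGENT_MODE", "")
    mode = raw.strip().lower() or "pipeline"
    if mode not in VALID_AGENT_MODES:
        raise ValueError(
            f"AGENT_MODE={raw!r} is invalid; expected one of: {', '.join(VALID_AGENT_MODES)}"
        )
    return mode


print(parse_agent_mode({"AGENT_MODE": " React "}))  # react
```

Inside `load_settings` this would replace the raw lookup with `agent_mode=parse_agent_mode(),`; passing a plain dict keeps the function testable without touching the real environment.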
src/retrieval/vector_store.py CHANGED
```diff
@@ -9,7 +9,7 @@ from langchain_core.documents import Document
 from langchain_core.retrievers import BaseRetriever
 from pydantic import ConfigDict
 from qdrant_client import QdrantClient
-from qdrant_client.models import Distance, PointStruct, VectorParams
+from qdrant_client.models import Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams
 
 from src.models import ChunkStrategy, DocumentChunk, QueryResult
 
@@ -144,6 +144,55 @@ class VectorStore:
         logger.info("Loaded %d chunks from collection '%s'", len(chunks), self._collection_name)
         return chunks
 
+    def list_document_ids(self) -> list[str]:
+        """Return a sorted list of unique document IDs in the collection.
+
+        Returns:
+            Sorted list of document ID strings.
+        """
+        all_chunks = self.get_all_chunks()
+        ids = sorted({chunk.document_id for chunk in all_chunks})
+        logger.debug("Found %d unique document IDs", len(ids))
+        return ids
+
+    def get_chunks_by_document_id(self, document_id: str) -> list[DocumentChunk]:
+        """Retrieve all chunks belonging to a specific document.
+
+        Uses a Qdrant payload filter to avoid loading the full collection.
+
+        Args:
+            document_id: The document identifier to filter by.
+
+        Returns:
+            List of DocumentChunk objects for that document, in storage order.
+        """
+        records, _offset = self._client.scroll(
+            collection_name=self._collection_name,
+            scroll_filter=Filter(
+                must=[FieldCondition(key="document_id", match=MatchValue(value=document_id))]
+            ),
+            limit=10_000,
+            with_payload=True,
+            with_vectors=False,
+        )
+
+        chunks: list[DocumentChunk] = []
+        for record in records:
+            payload = record.payload
+            chunks.append(
+                DocumentChunk(
+                    chunk_id=payload["chunk_id"],
+                    document_id=payload["document_id"],
+                    text=payload["text"],
+                    metadata=json.loads(payload["metadata"]),
+                    strategy=ChunkStrategy(payload["strategy"]),
+                )
+            )
+        logger.debug(
+            "Fetched %d chunks for document '%s'", len(chunks), document_id
+        )
+        return chunks
+
     def as_retriever(self, embedder: Any, top_k: int) -> BaseRetriever:
         """Return a LangChain BaseRetriever wrapping this vector store.
 
```

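`get_chunks_by_document_id` makes a single `scroll` call with `limit=10_000` and discards the returned offset, so a document with more points than that would be truncated without warning. Qdrant's `scroll` returns `(records, next_page_offset)`, with `None` as the offset once the last page has been served, so the usual remedy is to keep calling until the offset runs out. To stay runnable without a Qdrant server, the sketch abstracts the client call as a hypothetical `scroll_page(offset)` callable and uses an in-memory fake (`make_fake_fetcher`, also hypothetical) for the demo.

```python
from collections.abc import Callable, Iterator
from typing import Any

# A page fetcher takes the current offset (None for the first page) and returns
# (records, next_offset); next_offset is None when there are no more pages.
PageFetcher = Callable[[Any], tuple[list[Any], Any]]


def scroll_all(scroll_page: PageFetcher, max_pages: int = 100_000) -> Iterator[Any]:
    """Yield every record across pages until the fetcher reports no next offset."""
    offset: Any = None
    for _ in range(max_pages):
        records, offset = scroll_page(offset)
        yield from records
        if offset is None:
            return
    raise RuntimeError(f"scroll did not finish within {max_pages} pages")


def make_fake_fetcher(points: list[int], page_size: int) -> PageFetcher:
    """In-memory stand-in for a paginated scroll endpoint (offset = list index)."""

    def fetch(offset: Any) -> tuple[list[int], Any]:
        start = 0 if offset is None else offset
        end = start + page_size
        return points[start:end], (end if end < len(points) else None)

    return fetch


print(list(scroll_all(make_fake_fetcher(list(range(7)), page_size=3))))  # [0, 1, 2, 3, 4, 5, 6]
```

In `VectorStore`, the fetcher could be `lambda offset: self._client.scroll(collection_name=self._collection_name, scroll_filter=<the same Filter>, limit=1_000, offset=offset, with_payload=True, with_vectors=False)`, with the existing payload-to-`DocumentChunk` conversion applied to each yielded record.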
src/ui/app.py CHANGED
```diff
@@ -54,8 +54,10 @@ TEXTS: dict[str, dict[str, str]] = {
             "- **LLM-integration** — provider-agnostisk, prompt-styret "
             "svargenerering\n"
             "- **Evaluering** — RAGAS-baseret kvalitetsmaaling\n"
-            "- **Agent-routing** — intent-klassifikation og "
-            "forespørgselsdirigering"
+            "- **Agent Flows** — valgfri ReAct-loop med vaerktoejskald: "
+            "LLM bestemmer selv hvor mange soegninger der behoeves og "
+            "stoetter flertrinsraesonnering paa tvaers af dokumenter "
+            "(`AGENT_MODE=react`)"
         ),
         "chunking_label": "Chunking-strategi",
         "chunking_help": "Vaelg hvordan dokumenterne opdeles i tekststykker.",
@@ -99,7 +101,7 @@ TEXTS: dict[str, dict[str, str]] = {
         "pipeline_original": "Original foresporgsel",
         "pipeline_translated": "Oversat til dansk",
         "pipeline_lang": "Sprog registreret",
-        "pipeline_no_translation": "Ingen oversaettelse (foresporgsel allerede paa dansk)",
+        "pipeline_no_translation": "Ingen oversaettelse nødvendig",
         "pipeline_bm25": "BM25-resultater (leksikalsk soegning)",
         "pipeline_dense": "Vektorsoegning (semantisk)",
         "pipeline_fused": "RRF-fusioneret raekkefoelge",
@@ -129,8 +131,10 @@ TEXTS: dict[str, dict[str, str]] = {
             "- **LLM integration** — provider-agnostic, prompt-driven "
             "answer generation\n"
             "- **Evaluation** — RAGAS-based quality measurement\n"
-            "- **Agent routing** — intent classification and query "
-            "dispatch"
+            "- **Agent Flows** — optional ReAct loop with tool calling: "
+            "the LLM decides how many searches are needed and supports "
+            "multi-step reasoning across documents "
+            "(`AGENT_MODE=react`)"
         ),
         "chunking_label": "Chunking strategy",
         "chunking_help": "Choose how documents are split into text chunks.",
@@ -174,7 +178,7 @@ TEXTS: dict[str, dict[str, str]] = {
         "pipeline_original": "Original query",
         "pipeline_translated": "Translated to Danish",
         "pipeline_lang": "Detected language",
-        "pipeline_no_translation": "No translation (query already in Danish)",
+        "pipeline_no_translation": "No need for translation",
         "pipeline_bm25": "BM25 Results (lexical search)",
         "pipeline_dense": "Vector Search (semantic)",
         "pipeline_fused": "RRF Fused Ranking",
@@ -487,9 +491,9 @@ if search_clicked and question.strip():
                         )
                     else:
                         st.write(
-                            "Forespørgsel allerede dansk"
+                            "Ingen oversættelse nødvendig for forespørgslen"
                             if lang == "da"
-                            else "Query already in Danish"
+                            else "No translation needed for the query"
                         )
 
                 elif _step == "retrieve":
@@ -510,6 +514,23 @@ if search_clicked and question.strip():
                         else (f"Reranked to **{_rc}** results · confidence **{_cf:.0%}**")
                     )
 
+                elif _step == "tool_call":
+                    _tool_name = _event.get("tool", "")
+                    _tool_query = _event.get("query", "")
+                    st.write(
+                        (f"Vaerktoej **{_tool_name}** kaldt: _{_tool_query}_")
+                        if lang == "da"
+                        else (f"Tool **{_tool_name}** called: _{_tool_query}_")
+                    )
+
+                elif _step == "tool_result":
+                    _rc = _event.get("result_count", 0)
+                    st.write(
+                        (f"Hentet **{_rc}** dokumenter")
+                        if lang == "da"
+                        else (f"Retrieved **{_rc}** documents")
+                    )
+
                 elif _step == "generate":
                     st.write(
                         "Svar genereret"
```