Spaces:

XQ
/

Dokumentassistent

Running

App Files Files

XQ commited on Apr 5

Commit

6fd2f67

1 Parent(s): 6ce81cf

Add agent flow

Browse files

Files changed (8) hide show

README.md +60 -4
src/agent/react_router.py +253 -0
src/agent/tools.py +153 -0
src/api/main.py +22 -9
src/api/routes.py +3 -2
src/config.py +7 -0
src/retrieval/vector_store.py +50 -1
src/ui/app.py +29 -8

README.md CHANGED Viewed

@@ -14,13 +14,27 @@ A RAG-based document assistant for Danish-language PDFs, featuring hybrid search
 ## Architecture
-The system follows a three-stage RAG pipeline:
 **Ingestion:** PDF documents are parsed with PyMuPDF, cleaned, and split into chunks using one of three strategies (fixed-size, recursive, or semantic). Each chunk is embedded via a multilingual sentence-transformer and stored in a Qdrant vector collection. A parallel BM25 index is built from the same chunks for sparse keyword matching.
-**Retrieval:** User queries run through both dense (Qdrant cosine similarity) and sparse (BM25) search paths. Results are merged via reciprocal rank fusion, then a cross-encoder reranker scores each candidate for final ordering. An intent classifier routes queries to the appropriate retrieval strategy.
-**Generation:** Top-ranked chunks are assembled into a prompt context and passed to the LLM. The routing pipeline is orchestrated as a stateful LangGraph graph — each step (language detection, translation, retrieval, reranking, generation) runs as a node with full intermediate state preserved. The response is returned via a FastAPI endpoint and displayed in a Streamlit UI. Retrieval quality can be measured offline using RAGAS metrics.
 ## Tech Stack
@@ -37,6 +51,7 @@ The system follows a three-stage RAG pipeline:
 | Evaluation | RAGAS |
 | UI | Streamlit |
 | Config | python-dotenv |
 ## Provider Support
@@ -51,6 +66,45 @@ Both LLM and embedding backends are swappable through environment variables —
 Switch providers by editing `LLM_PROVIDER` and `EMBEDDING_PROVIDER` in your `.env` file. See `.env.example` for per-provider configuration details.
 ## Quick Start
 Prerequisites: Python 3.11+ and [Ollama](https://ollama.com/) installed.
@@ -146,7 +200,9 @@ src/
     routes.py              # REST endpoints (query, ingest, health)
   agent/
     intent_classifier.py   # Query intent detection
-    router.py              # Strategy routing based on intent
   evaluation/
     evaluator.py           # RAGAS-based retrieval quality metrics
   ui/

 ## Architecture
+The system follows a three-stage RAG pipeline with an optional Agent Flows mode:
 **Ingestion:** PDF documents are parsed with PyMuPDF, cleaned, and split into chunks using one of three strategies (fixed-size, recursive, or semantic). Each chunk is embedded via a multilingual sentence-transformer and stored in a Qdrant vector collection. A parallel BM25 index is built from the same chunks for sparse keyword matching.
+**Retrieval:** User queries run through both dense (Qdrant cosine similarity) and sparse (BM25) search paths. Results are merged via reciprocal rank fusion, then a cross-encoder reranker scores each candidate for final ordering.
+**Generation:** Top-ranked chunks are assembled into a prompt context and passed to the LLM. The response is returned via a FastAPI endpoint with full SSE streaming and displayed in a Streamlit UI. Retrieval quality can be measured offline using RAGAS metrics.
+**Routing — two modes (switchable via `AGENT_MODE`):**
+- **Pipeline mode** (default, `AGENT_MODE=pipeline`): Fixed LangGraph DAG — language detection → optional translation → hybrid retrieval → cross-encoder reranking → intent-specific generation. Robust on any LLM including local Ollama models.
+- **ReAct Agent mode** (`AGENT_MODE=react`): Replaces the fixed DAG with a multi-step reasoning loop. The LLM decides which tools to call and how many times, then produces a grounded answer citing source documents. Supports multi-hop questions, comparisons across documents, and procedural queries that benefit from iterative retrieval. Requires an LLM with tool-calling support (OpenAI, Anthropic, Google GenAI, or compatible Ollama models such as `llama3.1` / `qwen2.5`).
+  Available tools in ReAct mode:
+  | Tool | When the LLM uses it |
+  |------|----------------------|
+  | `hybrid_search(query, top_k)` | Find relevant passages — called once or multiple times with refined queries |
+  | `list_documents()` | Discover which documents are in the knowledge base |
+  | `fetch_document(document_id)` | Read the full text of a named document (e.g. for summaries) |
 ## Tech Stack
 | Evaluation | RAGAS |
 | UI | Streamlit |
 | Config | python-dotenv |
+| Agent Flows | LangGraph `create_react_agent` + LangChain `@tool` |
 ## Provider Support
 Switch providers by editing `LLM_PROVIDER` and `EMBEDDING_PROVIDER` in your `.env` file. See `.env.example` for per-provider configuration details.
+## Agent Mode
+The system supports two routing modes, controlled by `AGENT_MODE` in `.env`:
+| Mode | Value | Description |
+|------|-------|-------------|
+| Pipeline (default) | `AGENT_MODE=pipeline` | Fixed LangGraph DAG. Works with any LLM including local Ollama models such as `gemma3:4b`. |
+| ReAct Agent | `AGENT_MODE=react` | Multi-step reasoning loop. The LLM calls tools as many times as needed — `hybrid_search` for targeted passages, `list_documents` to navigate the knowledge base, `fetch_document` for full document reads — then cites sources in the final answer. |
+**LLM compatibility for ReAct mode:**
+`AGENT_MODE=react` requires a model with native tool-calling support. Use `AGENT_MODE=pipeline` (the default) if your model does not support it.
+| Provider | Tool-calling support |
+|----------|---------------------|
+| OpenAI (`gpt-4o-mini`, `gpt-4o`) | Yes |
+| Anthropic (`claude-*`) | Yes |
+| Google GenAI (`gemini-*`) | Yes |
+| Azure OpenAI | Yes |
+| Ollama — `llama3.1`, `qwen2.5`, `mistral-nemo` | Yes (model-dependent) |
+| Ollama — `gemma3:4b` (default) | No → use `pipeline` mode |
+Example `.env` for ReAct mode with OpenAI:
+```dotenv
+AGENT_MODE=react
+LLM_PROVIDER=openai
+OPENAI_API_KEY=sk-...
+OPENAI_MODEL=gpt-4o-mini
+```
+Example `.env` for pipeline mode with local Ollama (default, no API key needed):
+```dotenv
+AGENT_MODE=pipeline
+LLM_PROVIDER=ollama
+OLLAMA_MODEL=gemma3:4b
+```
 ## Quick Start
 Prerequisites: Python 3.11+ and [Ollama](https://ollama.com/) installed.
     routes.py              # REST endpoints (query, ingest, health)
   agent/
     intent_classifier.py   # Query intent detection
+    router.py              # Fixed-DAG pipeline router (AGENT_MODE=pipeline)
+    tools.py               # @tool-decorated hybrid_search + ToolResultStore
+    react_router.py        # ReAct agent router with tool-calling loop (AGENT_MODE=react)
   evaluation/
     evaluator.py           # RAGAS-based retrieval quality metrics
   ui/

src/agent/react_router.py ADDED Viewed

	@@ -0,0 +1,253 @@

+"""ReAct agent router using a LangGraph tool-calling loop.
+Replaces the fixed detect→translate→retrieve→rerank→generate DAG with a
+multi-step reasoning loop where the LLM decides which tools to call and
+when it has gathered enough information to produce a final answer.
+Requires an LLM that supports bind_tools (OpenAI, Anthropic, Google GenAI,
+and compatible Ollama models such as llama3.1 / qwen2.5). Set
+AGENT_MODE=react in .env to activate; falls back to QueryRouter otherwise.
+"""
+import logging
+from collections.abc import Generator
+from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
+from langchain_core.runnables import Runnable
+from langgraph.prebuilt import create_react_agent
+from src.models import GenerationResponse, IntentType, PipelineDetails, QueryResult
+from src.agent.tools import ToolResultStore, make_retrieval_tools
+from src.retrieval.hybrid import HybridRetriever
+from src.retrieval.reranker import Reranker
+from src.retrieval.vector_store import VectorStore
+logger = logging.getLogger(__name__)
+_SYSTEM_PROMPT = (
+    "You are a helpful assistant for administrative staff at the University of Copenhagen (KU).\n\n"
+    "You have access to a hybrid_search tool that searches KU policy documents stored in the "
+    "knowledge base.\n\n"
+    "Guidelines:\n"
+    "- Always call hybrid_search before answering questions about KU rules, policies, exams, "
+    "employment conditions, or administrative procedures.\n"
+    "- If the first search does not return sufficient information, call hybrid_search again "
+    "with a refined or more specific query.\n"
+    "- For comparison questions, search for each item separately.\n"
+    "- Cite the document sources ([1], [2], …) in your answer.\n"
+    "- Answer in the same language as the user's question."
+)
+def _ser_sources(sources: list[QueryResult]) -> list[dict]:
+    """Serialise QueryResult list to a JSON-safe list of dicts."""
+    return [
+        {
+            "chunk_id": r.chunk.chunk_id,
+            "document_id": r.chunk.document_id,
+            "text": r.chunk.text,
+            "score": r.score,
+            "source": r.source,
+        }
+        for r in sources
+    ]
+class ReActRouter:
+    """Routes queries through a multi-step ReAct agent with tool-calling LLM.
+    The agent runs in a loop: the LLM reasons about the query, calls
+    hybrid_search as many times as needed, observes results, and finally
+    produces a grounded answer.  Results from every tool call are merged into
+    a single ranked source list that is returned alongside the answer.
+    """
+    def __init__(
+        self,
+        llm: Runnable,
+        hybrid_retriever: HybridRetriever,
+        reranker: Reranker,
+        vector_store: VectorStore,
+        default_top_k: int = 5,
+    ) -> None:
+        """Initialise the ReAct router.
+        Args:
+            llm: LLM with tool-calling support (must implement bind_tools).
+            hybrid_retriever: HybridRetriever instance.
+            reranker: Reranker instance.
+            vector_store: VectorStore instance for document-level tool access.
+            default_top_k: Default number of results returned per tool call.
+        """
+        self._llm = llm
+        self._hybrid_retriever = hybrid_retriever
+        self._reranker = reranker
+        self._vector_store = vector_store
+        self._default_top_k = default_top_k
+    # ------------------------------------------------------------------
+    # Internal helpers
+    # ------------------------------------------------------------------
+    def _make_graph(self, store: ToolResultStore) -> object:
+        """Build a fresh ReAct graph bound to *store* for one request."""
+        tools = make_retrieval_tools(
+            self._hybrid_retriever,
+            self._reranker,
+            self._vector_store,
+            store,
+            self._default_top_k,
+        )
+        return create_react_agent(self._llm, tools)
+    @staticmethod
+    def _extract_answer(messages: list) -> str:
+        """Return the last non-tool-call AIMessage content as the final answer."""
+        for msg in reversed(messages):
+            if (
+                isinstance(msg, AIMessage)
+                and msg.content
+                and not getattr(msg, "tool_calls", None)
+            ):
+                return str(msg.content)
+        return ""
+    # ------------------------------------------------------------------
+    # Public interface (mirrors QueryRouter)
+    # ------------------------------------------------------------------
+    def route(self, query: str, top_k: int) -> GenerationResponse:
+        """Route a query through the ReAct agent pipeline.
+        Args:
+            query: The user's natural language query.
+            top_k: Number of top documents to retrieve per tool call.
+        Returns:
+            GenerationResponse with answer, sources, intent, and confidence.
+        """
+        logger.info("ReAct routing query: %s", query)
+        store = ToolResultStore()
+        graph = self._make_graph(store)
+        result = graph.invoke(
+            {
+                "messages": [
+                    SystemMessage(content=_SYSTEM_PROMPT),
+                    HumanMessage(content=query),
+                ]
+            }
+        )
+        messages = result.get("messages", [])
+        answer = self._extract_answer(messages)
+        sources = store.retrieved[:top_k]
+        confidence = max((r.score for r in sources), default=0.0)
+        logger.info(
+            "ReAct answer ready (confidence=%.4f, sources=%d, tool_calls=%d)",
+            confidence,
+            len(sources),
+            len(store.tool_calls),
+        )
+        return GenerationResponse(
+            answer=answer,
+            sources=sources,
+            intent=IntentType.RAG if sources else IntentType.FACTUAL,
+            confidence=confidence,
+            pipeline_details=PipelineDetails(
+                original_query=query,
+                retrieval_query=", ".join(q for _, q in store.tool_calls) or query,
+                reranked_results=sources,
+            ),
+        )
+    def route_stream(self, query: str, top_k: int) -> Generator[dict, None, None]:
+        """Stream ReAct agent events step by step.
+        Yields event dicts with the following step types (in addition to the
+        existing pipeline steps understood by the UI):
+        - ``tool_call``   — LLM decided to call a tool; carries ``tool`` and ``query``.
+        - ``tool_result`` — Tool returned; carries ``tool``, ``result_count``.
+        - ``generate``    — LLM is writing the final answer.
+        - ``done``        — Final event with the full result payload.
+        Args:
+            query: User query.
+            top_k: Number of results to retrieve per tool call.
+        Yields:
+            Step event dicts.
+        """
+        store = ToolResultStore()
+        graph = self._make_graph(store)
+        all_messages: list = []
+        for chunk in graph.stream(
+            {
+                "messages": [
+                    SystemMessage(content=_SYSTEM_PROMPT),
+                    HumanMessage(content=query),
+                ]
+            },
+            stream_mode="updates",
+        ):
+            for _node_name, update in chunk.items():
+                if update is None:
+                    continue
+                node_messages = update.get("messages", [])
+                all_messages.extend(node_messages)
+                for msg in node_messages:
+                    if isinstance(msg, AIMessage):
+                        for tc in getattr(msg, "tool_calls", []):
+                            yield {
+                                "step": "tool_call",
+                                "tool": tc.get("name", ""),
+                                "query": tc.get("args", {}).get("query", ""),
+                            }
+                        if msg.content and not getattr(msg, "tool_calls", None):
+                            yield {"step": "generate"}
+                    elif isinstance(msg, ToolMessage):
+                        yield {
+                            "step": "tool_result",
+                            "tool": getattr(msg, "name", ""),
+                            "result_count": len(store.retrieved),
+                        }
+        answer = self._extract_answer(all_messages)
+        sources = store.retrieved[:top_k]
+        confidence = max((r.score for r in sources), default=0.0)
+        yield {
+            "step": "done",
+            "result": {
+                "answer": answer,
+                "sources": _ser_sources(sources),
+                "intent": (IntentType.RAG if sources else IntentType.FACTUAL).value,
+                "confidence": confidence,
+                "pipeline_details": {
+                    "original_query": query,
+                    "retrieval_query": ", ".join(q for _, q in store.tool_calls) or query,
+                    "detected_language": "unknown",
+                    "translated": False,
+                    "dense_results": [],
+                    "sparse_results": [],
+                    "fused_results": [],
+                    "reranked_results": [
+                        {
+                            "document_id": r.chunk.document_id,
+                            "chunk_id": r.chunk.chunk_id,
+                            "score": r.score,
+                            "source": r.source,
+                        }
+                        for r in sources
+                    ],
+                },
+            },
+        }

src/agent/tools.py ADDED Viewed

	@@ -0,0 +1,153 @@

+"""LangChain tools for the ReAct agent."""
+import logging
+from dataclasses import dataclass, field
+from langchain_core.tools import tool
+from src.models import QueryResult
+from src.retrieval.hybrid import HybridRetriever
+from src.retrieval.reranker import Reranker
+from src.retrieval.vector_store import VectorStore
+logger = logging.getLogger(__name__)
+@dataclass
+class ToolResultStore:
+    """Captures structured retrieval results produced during tool invocations.
+    Attributes:
+        retrieved: Accumulated QueryResult list across all hybrid_search calls,
+            merged by chunk_id and sorted by descending score.
+        tool_calls: Log of (tool_name, query_or_arg) tuples in invocation order.
+    """
+    retrieved: list[QueryResult] = field(default_factory=list)
+    tool_calls: list[tuple[str, str]] = field(default_factory=list)
+def make_retrieval_tools(
+    hybrid_retriever: HybridRetriever,
+    reranker: Reranker,
+    vector_store: VectorStore,
+    store: ToolResultStore,
+    default_top_k: int = 5,
+) -> list:
+    """Create retrieval tools bound to the given components and result store.
+    The returned tools write structured QueryResult objects into *store* on each
+    invocation so the calling router can surface them as sources without having
+    to re-parse the tool's text output.
+    Args:
+        hybrid_retriever: HybridRetriever instance.
+        reranker: Reranker instance.
+        vector_store: VectorStore instance for document-level access.
+        store: Shared ToolResultStore that captures structured results.
+        default_top_k: Default number of results to return per call.
+    Returns:
+        List of LangChain tool callables ready for bind_tools / ToolNode.
+    """
+    @tool
+    def hybrid_search(query: str, top_k: int = default_top_k) -> str:
+        """Search the KU document knowledge base using hybrid retrieval.
+        Combines dense semantic search (Qdrant) and sparse keyword search (BM25),
+        then re-ranks results with a cross-encoder. Use this tool to find relevant
+        passages from ingested KU policy documents about rules, regulations, exam
+        procedures, employment conditions, and administrative guidelines.
+        Call this tool before answering any question that requires factual
+        information from KU documents. You may call it multiple times with
+        different queries if the first result is insufficient.
+        Args:
+            query: Search query. Danish gives the best recall against KU documents.
+            top_k: Number of top results to return (1–20). Default is 5.
+        Returns:
+            Formatted string of ranked document passages with source references
+            and relevance scores.
+        """
+        logger.info("Tool hybrid_search: query=%r top_k=%d", query, top_k)
+        store.tool_calls.append(("hybrid_search", query))
+        hybrid_result = hybrid_retriever.search_detailed(query, top_k=top_k)
+        results = reranker.rerank(query, hybrid_result.fused_results, top_k=top_k)
+        # Accumulate results across multiple calls (union by chunk_id, keep highest score)
+        existing = {r.chunk.chunk_id: r for r in store.retrieved}
+        for r in results:
+            cid = r.chunk.chunk_id
+            if cid not in existing or r.score > existing[cid].score:
+                existing[cid] = r
+        store.retrieved = sorted(existing.values(), key=lambda r: r.score, reverse=True)
+        if not results:
+            return "Ingen relevante dokumenter fundet. (No relevant documents found.)"
+        parts: list[str] = []
+        for i, r in enumerate(results, 1):
+            parts.append(
+                f"[{i}] {r.chunk.document_id}  (relevance: {r.score:.3f})\n{r.chunk.text}"
+            )
+        return "\n\n---\n\n".join(parts)
+    @tool
+    def list_documents() -> str:
+        """List all documents currently available in the KU knowledge base.
+        Use this tool when the user asks which documents are available, wants to
+        know what topics are covered, or before fetching a specific document by ID.
+        Returns:
+            Newline-separated list of document IDs, or a message if the
+            knowledge base is empty.
+        """
+        logger.info("Tool list_documents called")
+        store.tool_calls.append(("list_documents", ""))
+        ids = vector_store.list_document_ids()
+        if not ids:
+            return "Ingen dokumenter i videnbasen. (Knowledge base is empty.)"
+        lines = "\n".join(f"- {doc_id}" for doc_id in ids)
+        return f"Dokumenter i videnbasen ({len(ids)} i alt):\n{lines}"
+    @tool
+    def fetch_document(document_id: str) -> str:
+        """Fetch the full text of a specific document from the knowledge base.
+        Use this tool when the user asks for a summary or overview of a named
+        document, or when hybrid_search results reference a document that
+        warrants deeper reading. Prefer hybrid_search for targeted questions.
+        Args:
+            document_id: The exact document ID as returned by list_documents or
+                seen in hybrid_search results (e.g. 'ku_ai_policy.pdf').
+        Returns:
+            The concatenated text of all chunks belonging to the document, or
+            an error message if the document ID is not found.
+        """
+        logger.info("Tool fetch_document: document_id=%r", document_id)
+        store.tool_calls.append(("fetch_document", document_id))
+        chunks = vector_store.get_chunks_by_document_id(document_id)
+        if not chunks:
+            return (
+                f"Dokumentet '{document_id}' blev ikke fundet i videnbasen. "
+                f"(Document not found. Use list_documents to see available IDs.)"
+            )
+        # Sort chunks by chunk_id to preserve document order
+        chunks.sort(key=lambda c: c.chunk_id)
+        full_text = "\n\n".join(c.text for c in chunks)
+        return (
+            f"Dokument: {document_id}  ({len(chunks)} afsnit)\n\n"
+            f"{full_text}"
+        )
+    return [hybrid_search, list_documents, fetch_document]

src/api/main.py CHANGED Viewed

@@ -16,6 +16,7 @@ from src.retrieval.hybrid import HybridRetriever
 from src.retrieval.reranker import Reranker
 from src.agent.intent_classifier import IntentClassifier
 from src.agent.router import QueryRouter
 from src.ingestion.pipeline import IngestionPipeline
 from src.api.routes import router, set_dependencies
@@ -69,15 +70,27 @@ def create_app() -> FastAPI:
         bm25_weight=settings.bm25_weight,
     )
     reranker = Reranker(model=create_reranker(settings.reranker_model))
-    intent_classifier = IntentClassifier(llm=llm, model_name=settings.generation_model)
-    generator = llm | StrOutputParser()
-    query_router = QueryRouter(
-        intent_classifier=intent_classifier,
-        hybrid_retriever=hybrid_retriever,
-        reranker=reranker,
-        generator=generator,
-        translate_query=settings.translate_query,
-    )
     set_dependencies(
         query_router=query_router,

 from src.retrieval.reranker import Reranker
 from src.agent.intent_classifier import IntentClassifier
 from src.agent.router import QueryRouter
+from src.agent.react_router import ReActRouter
 from src.ingestion.pipeline import IngestionPipeline
 from src.api.routes import router, set_dependencies
         bm25_weight=settings.bm25_weight,
     )
     reranker = Reranker(model=create_reranker(settings.reranker_model))
+    if settings.agent_mode == "react":
+        logger.info("Agent mode: ReAct (tool-calling loop)")
+        query_router: QueryRouter | ReActRouter = ReActRouter(
+            llm=llm,
+            hybrid_retriever=hybrid_retriever,
+            reranker=reranker,
+            vector_store=vector_store,
+            default_top_k=settings.top_k,
+        )
+    else:
+        logger.info("Agent mode: pipeline (fixed DAG)")
+        intent_classifier = IntentClassifier(llm=llm, model_name=settings.generation_model)
+        generator = llm | StrOutputParser()
+        query_router = QueryRouter(
+            intent_classifier=intent_classifier,
+            hybrid_retriever=hybrid_retriever,
+            reranker=reranker,
+            generator=generator,
+            translate_query=settings.translate_query,
+        )
     set_dependencies(
         query_router=query_router,

src/api/routes.py CHANGED Viewed

@@ -14,6 +14,7 @@ from pydantic import BaseModel
 if TYPE_CHECKING:
     from src.agent.router import QueryRouter
     from src.config import Settings
     from src.ingestion.pipeline import IngestionPipeline
     from src.retrieval.bm25_search import BM25Search
@@ -24,7 +25,7 @@ logger = logging.getLogger(__name__)
 router = APIRouter()
-_query_router: "QueryRouter | None" = None
 _ingestion_pipeline: "IngestionPipeline | None" = None
 _embedder: "Embedder | None" = None
 _vector_store: "VectorStore | None" = None
@@ -33,7 +34,7 @@ _settings: "Settings | None" = None
 def set_dependencies(
-    query_router: "QueryRouter",
     ingestion_pipeline: "IngestionPipeline",
     embedder: "Embedder",
     vector_store: "VectorStore",

 if TYPE_CHECKING:
     from src.agent.router import QueryRouter
+    from src.agent.react_router import ReActRouter
     from src.config import Settings
     from src.ingestion.pipeline import IngestionPipeline
     from src.retrieval.bm25_search import BM25Search
 router = APIRouter()
+_query_router: "QueryRouter | ReActRouter | None" = None
 _ingestion_pipeline: "IngestionPipeline | None" = None
 _embedder: "Embedder | None" = None
 _vector_store: "VectorStore | None" = None
 def set_dependencies(
+    query_router: "QueryRouter | ReActRouter",
     ingestion_pipeline: "IngestionPipeline",
     embedder: "Embedder",
     vector_store: "VectorStore",

src/config.py CHANGED Viewed

@@ -64,6 +64,9 @@ class Settings:
     # Query translation
     translate_query: bool
 def _parse_bool(value: str, *, default: bool) -> bool:
     """Parse a boolean environment variable string.
@@ -141,4 +144,8 @@ def load_settings() -> Settings:
             os.environ.get("TRANSLATE_QUERY", ""),
             default=os.environ.get("LLM_PROVIDER", "ollama") == "ollama",
         ),
     )

     # Query translation
     translate_query: bool
+    # Agent mode: "pipeline" (fixed DAG) or "react" (tool-calling ReAct loop)
+    agent_mode: str
 def _parse_bool(value: str, *, default: bool) -> bool:
     """Parse a boolean environment variable string.
             os.environ.get("TRANSLATE_QUERY", ""),
             default=os.environ.get("LLM_PROVIDER", "ollama") == "ollama",
         ),
+        # Agent mode: "pipeline" keeps the existing fixed DAG; "react" enables
+        # the multi-step ReAct loop (requires an LLM with tool-calling support).
+        agent_mode=os.environ.get("AGENT_MODE", "pipeline"),
     )

src/retrieval/vector_store.py CHANGED Viewed

@@ -9,7 +9,7 @@ from langchain_core.documents import Document
 from langchain_core.retrievers import BaseRetriever
 from pydantic import ConfigDict
 from qdrant_client import QdrantClient
-from qdrant_client.models import Distance, PointStruct, VectorParams
 from src.models import ChunkStrategy, DocumentChunk, QueryResult
@@ -144,6 +144,55 @@ class VectorStore:
         logger.info("Loaded %d chunks from collection '%s'", len(chunks), self._collection_name)
         return chunks
     def as_retriever(self, embedder: Any, top_k: int) -> BaseRetriever:
         """Return a LangChain BaseRetriever wrapping this vector store.

 from langchain_core.retrievers import BaseRetriever
 from pydantic import ConfigDict
 from qdrant_client import QdrantClient
+from qdrant_client.models import Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams
 from src.models import ChunkStrategy, DocumentChunk, QueryResult
         logger.info("Loaded %d chunks from collection '%s'", len(chunks), self._collection_name)
         return chunks
+    def list_document_ids(self) -> list[str]:
+        """Return a sorted list of unique document IDs in the collection.
+        Returns:
+            Sorted list of document ID strings.
+        """
+        all_chunks = self.get_all_chunks()
+        ids = sorted({chunk.document_id for chunk in all_chunks})
+        logger.debug("Found %d unique document IDs", len(ids))
+        return ids
+    def get_chunks_by_document_id(self, document_id: str) -> list[DocumentChunk]:
+        """Retrieve all chunks belonging to a specific document.
+        Uses a Qdrant payload filter to avoid loading the full collection.
+        Args:
+            document_id: The document identifier to filter by.
+        Returns:
+            List of DocumentChunk objects for that document, in storage order.
+        """
+        records, _offset = self._client.scroll(
+            collection_name=self._collection_name,
+            scroll_filter=Filter(
+                must=[FieldCondition(key="document_id", match=MatchValue(value=document_id))]
+            ),
+            limit=10_000,
+            with_payload=True,
+            with_vectors=False,
+        )
+        chunks: list[DocumentChunk] = []
+        for record in records:
+            payload = record.payload
+            chunks.append(
+                DocumentChunk(
+                    chunk_id=payload["chunk_id"],
+                    document_id=payload["document_id"],
+                    text=payload["text"],
+                    metadata=json.loads(payload["metadata"]),
+                    strategy=ChunkStrategy(payload["strategy"]),
+                )
+            )
+        logger.debug(
+            "Fetched %d chunks for document '%s'", len(chunks), document_id
+        )
+        return chunks
     def as_retriever(self, embedder: Any, top_k: int) -> BaseRetriever:
         """Return a LangChain BaseRetriever wrapping this vector store.

src/ui/app.py CHANGED Viewed

@@ -54,8 +54,10 @@ TEXTS: dict[str, dict[str, str]] = {
             "- **LLM-integration** — provider-agnostisk, prompt-styret "
             "svargenerering\n"
             "- **Evaluering** — RAGAS-baseret kvalitetsmaaling\n"
-            "- **Agent-routing** — intent-klassifikation og "
-            "forespørgselsdirigering"
         ),
         "chunking_label": "Chunking-strategi",
         "chunking_help": "Vaelg hvordan dokumenterne opdeles i tekststykker.",
@@ -99,7 +101,7 @@ TEXTS: dict[str, dict[str, str]] = {
         "pipeline_original": "Original foresporgsel",
         "pipeline_translated": "Oversat til dansk",
         "pipeline_lang": "Sprog registreret",
-        "pipeline_no_translation": "Ingen oversaettelse (foresporgsel allerede paa dansk)",
         "pipeline_bm25": "BM25-resultater (leksikalsk soegning)",
         "pipeline_dense": "Vektorsoegning (semantisk)",
         "pipeline_fused": "RRF-fusioneret raekkefoelge",
@@ -129,8 +131,10 @@ TEXTS: dict[str, dict[str, str]] = {
             "- **LLM integration** — provider-agnostic, prompt-driven "
             "answer generation\n"
             "- **Evaluation** — RAGAS-based quality measurement\n"
-            "- **Agent routing** — intent classification and query "
-            "dispatch"
         ),
         "chunking_label": "Chunking strategy",
         "chunking_help": "Choose how documents are split into text chunks.",
@@ -174,7 +178,7 @@ TEXTS: dict[str, dict[str, str]] = {
         "pipeline_original": "Original query",
         "pipeline_translated": "Translated to Danish",
         "pipeline_lang": "Detected language",
-        "pipeline_no_translation": "No translation (query already in Danish)",
         "pipeline_bm25": "BM25 Results (lexical search)",
         "pipeline_dense": "Vector Search (semantic)",
         "pipeline_fused": "RRF Fused Ranking",
@@ -487,9 +491,9 @@ if search_clicked and question.strip():
                             )
                         else:
                             st.write(
-                                "Forespørgsel allerede på dansk"
                                 if lang == "da"
-                                else "Query already in Danish"
                             )
                     elif _step == "retrieve":
@@ -510,6 +514,23 @@ if search_clicked and question.strip():
                             else (f"Reranked to **{_rc}** results · confidence **{_cf:.0%}**")
                         )
                     elif _step == "generate":
                         st.write(
                             "Svar genereret"

             "- **LLM-integration** — provider-agnostisk, prompt-styret "
             "svargenerering\n"
             "- **Evaluering** — RAGAS-baseret kvalitetsmaaling\n"
+            "- **Agent Flows** — valgfri ReAct-loop med vaerktoejskald: "
+            "LLM bestemmer selv hvor mange soegninger der behoeves og "
+            "stoetter flertrinssraesonnering paa tvaers af dokumenter "
+            "(`AGENT_MODE=react`)"
         ),
         "chunking_label": "Chunking-strategi",
         "chunking_help": "Vaelg hvordan dokumenterne opdeles i tekststykker.",
         "pipeline_original": "Original foresporgsel",
         "pipeline_translated": "Oversat til dansk",
         "pipeline_lang": "Sprog registreret",
+        "pipeline_no_translation": "Ingen oversaettelse nødvendig",
         "pipeline_bm25": "BM25-resultater (leksikalsk soegning)",
         "pipeline_dense": "Vektorsoegning (semantisk)",
         "pipeline_fused": "RRF-fusioneret raekkefoelge",
             "- **LLM integration** — provider-agnostic, prompt-driven "
             "answer generation\n"
             "- **Evaluation** — RAGAS-based quality measurement\n"
+            "- **Agent Flows** — optional ReAct loop with tool calling: "
+            "the LLM decides how many searches are needed and supports "
+            "multi-step reasoning across documents "
+            "(`AGENT_MODE=react`)"
         ),
         "chunking_label": "Chunking strategy",
         "chunking_help": "Choose how documents are split into text chunks.",
         "pipeline_original": "Original query",
         "pipeline_translated": "Translated to Danish",
         "pipeline_lang": "Detected language",
+        "pipeline_no_translation": "No need for translation",
         "pipeline_bm25": "BM25 Results (lexical search)",
         "pipeline_dense": "Vector Search (semantic)",
         "pipeline_fused": "RRF Fused Ranking",
                             )
                         else:
                             st.write(
+                                "Ingen oversættelse nødvendig for forespørgslen"
                                 if lang == "da"
+                                else "No translation needed for the query"
                             )
                     elif _step == "retrieve":
                             else (f"Reranked to **{_rc}** results · confidence **{_cf:.0%}**")
                         )
+                    elif _step == "tool_call":
+                        _tool_name = _event.get("tool", "")
+                        _tool_query = _event.get("query", "")
+                        st.write(
+                            (f"Vaerktoej **{_tool_name}** kaldt: _{_tool_query}_")
+                            if lang == "da"
+                            else (f"Tool **{_tool_name}** called: _{_tool_query}_")
+                        )
+                    elif _step == "tool_result":
+                        _rc = _event.get("result_count", 0)
+                        st.write(
+                            (f"Hentet **{_rc}** dokumenter")
+                            if lang == "da"
+                            else (f"Retrieved **{_rc}** documents")
+                        )
                     elif _step == "generate":
                         st.write(
                             "Svar genereret"