XQ
/

Dokumentassistent

XQ committed on Apr 6

Commit

d7b3297

1 Parent(s): 4ba88df

Change default mode to ReAct agent

Files changed (4)

.github/README.md +22 -14
README.md +1 -1
src/agent/router.py +11 -60
src/ui/app.py +4 -4

.github/README.md CHANGED

@@ -2,7 +2,7 @@
 **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces
-A document intelligence system covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a fixed pipeline for lightweight models, a LangGraph ReAct agent for queries that need multiple retrieval steps. Retrieval quality is evaluated with RAGAS.
 ## How it works
@@ -12,9 +12,7 @@ At query time both indexes are searched and their results merged with reciprocal
 **Two routing modes, switchable via `AGENT_MODE`:**
-- **Pipeline** (default): a fixed LangGraph DAG — language detection → optional translation → hybrid retrieval → reranking → generation. Works with lightweight local models like `gemma4`.
-- **ReAct Agent** (`AGENT_MODE=react`): replaces the DAG with a reasoning loop where the LLM calls tools as many times as it needs before answering. Useful for multi-hop questions or comparisons across documents. Requires a model with tool-calling support.
   | Tool | Purpose |
   |------|---------|
@@ -22,6 +20,8 @@ At query time both indexes are searched and their results merged with reciprocal
   | `list_documents()` | See what's in the knowledge base |
   | `fetch_document(document_id)` | Read a full document |
 ## Tech Stack
 | Category | Technology |
@@ -47,26 +47,34 @@ See `.env.example` for per-provider configuration.
 | Mode | `AGENT_MODE` | Notes |
 |------|-------------|-------|
-| Pipeline | `pipeline` (default) | Fixed DAG, works with `gemma4` |
-| ReAct | `react` | Tool-calling loop, needs a model that supports tool use |
-Tool-calling is supported by OpenAI, Anthropic, Google GenAI, Azure OpenAI, Groq, and some Ollama models (`llama3.1`, `qwen2.5`, `mistral-nemo`). The default `gemma4` does not support it — use `pipeline` mode with Ollama.
-ReAct with OpenAI:
 ```dotenv
 AGENT_MODE=react
-LLM_PROVIDER=openai
-OPENAI_API_KEY=sk-...
-OPENAI_MODEL=gpt-4o-mini
 ```
-Pipeline with local Ollama:
 ```dotenv
 AGENT_MODE=pipeline
 LLM_PROVIDER=ollama
-OLLAMA_MODEL=gemma4
 ```
 ## Quick Start
@@ -135,7 +143,7 @@ src/
     reranker.py            # cross-encoder
   api/
     main.py
-    routes.py              # /query, /ingest, /health
   agent/
     intent_classifier.py
     router.py              # pipeline mode (AGENT_MODE=pipeline)

 **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces
+A document intelligence system covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a LangGraph ReAct agent (default) for queries that need multiple retrieval steps, and a pipeline for lightweight models without tool-calling support. Retrieval quality is evaluated with RAGAS.
 ## How it works
 **Two routing modes, switchable via `AGENT_MODE`:**
+- **ReAct Agent** (default): a reasoning loop where the LLM calls tools as many times as it needs before answering. Useful for multi-hop questions or comparisons across documents. Requires a model with tool-calling support.
   | Tool | Purpose |
   |------|---------|
   | `list_documents()` | See what's in the knowledge base |
   | `fetch_document(document_id)` | Read a full document |
+- **Pipeline** (`AGENT_MODE=pipeline`): a fixed LangGraph graph — language detection → optional translation → hybrid retrieval → reranking → generation. Works with lightweight local models that lack tool-calling support.
 ## Tech Stack
 | Category | Technology |
 | Mode | `AGENT_MODE` | Notes |
 |------|-------------|-------|
+| ReAct | `react` (default) | Tool-calling loop, needs a model that supports tool use |
+| Pipeline | `pipeline` | Fixed graph, works with lightweight models that lack tool calling |
+Tool-calling is supported by OpenAI, Anthropic, Google GenAI, Azure OpenAI, Groq, and some Ollama models (`gemma4`, `llama3.1`, `qwen2.5`, `mistral-nemo`).
+ReAct with local Ollama (default):
 ```dotenv
 AGENT_MODE=react
+LLM_PROVIDER=ollama
+OLLAMA_MODEL=gemma4:e4b
 ```
+Pipeline with a lightweight model:
 ```dotenv
 AGENT_MODE=pipeline
 LLM_PROVIDER=ollama
+OLLAMA_MODEL=gemma3
+```
+ReAct with OpenAI:
+```dotenv
+AGENT_MODE=react
+LLM_PROVIDER=openai
+OPENAI_API_KEY=sk-...
+OPENAI_MODEL=gpt-4o-mini
 ```
 ## Quick Start
     reranker.py            # cross-encoder
   api/
     main.py
+    routes.py              # /query, /query/stream, /ingest, /health
   agent/
     intent_classifier.py
     router.py              # pipeline mode (AGENT_MODE=pipeline)
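
The default switch documented in this README diff can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: `resolve_agent_mode`, `VALID_MODES`, and `DEFAULT_MODE` are hypothetical names, and only the `AGENT_MODE` variable with its `react` and `pipeline` values comes from the README.

```python
import os
from typing import Mapping, Optional

# Values documented in the README; "react" is the default after this commit.
VALID_MODES = ("react", "pipeline")
DEFAULT_MODE = "react"


def resolve_agent_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the routing mode from AGENT_MODE, falling back to the default.

    `env` is a hypothetical injection point for tests; os.environ is used
    when it is omitted.
    """
    source = os.environ if env is None else env
    raw = source.get("AGENT_MODE", "").strip().lower()
    if not raw:
        # Unset or empty: use the documented default.
        return DEFAULT_MODE
    if raw not in VALID_MODES:
        raise ValueError(f"AGENT_MODE must be one of {VALID_MODES}, got {raw!r}")
    return raw
```

Failing loudly on an unknown value is one possible design choice; a real implementation might instead log a warning and fall back to the default.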

README.md CHANGED

@@ -12,7 +12,7 @@ noindex: true
 **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces
-A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a fixed pipeline for lightweight models, a LangGraph ReAct agent for queries that need multiple retrieval steps. Retrieval quality is evaluated with RAGAS.
 ## How it works

 **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces
+A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a pipeline for lightweight models, a LangGraph ReAct agent for queries that need multiple retrieval steps. Retrieval quality is evaluated with RAGAS.
 ## How it works

src/agent/router.py CHANGED

@@ -1,4 +1,14 @@
-"""Query router that selects retrieval strategy based on intent."""
 import logging
 import unicodedata
@@ -443,65 +453,6 @@ class QueryRouter:
             pipeline_details=pipeline,
         )
-        # --- Old if/else routing (replaced by LangGraph above) ---
-        #
-        # user_language, intent = self._detect_language_and_intent(query)
-        # retrieval_query = self._translate_query(query, user_language)
-        # translated = retrieval_query != query
-        #
-        # logger.info("Classified intent: %s", intent.value)
-        # logger.debug("Intent classification result: %s for query='%s'", intent.value, query)
-        #
-        # should_retrieve = intent != IntentType.UNKNOWN
-        # logger.debug("Retrieval executed: %s (intent=%s)", should_retrieve, intent.value)
-        #
-        # pipeline = PipelineDetails(
-        #     original_query=query,
-        #     retrieval_query=retrieval_query,
-        #     detected_language=user_language,
-        #     translated=translated,
-        # )
-        #
-        # if should_retrieve:
-        #     hybrid_result = self._hybrid_retriever.search_detailed(retrieval_query, top_k=top_k)
-        #     pipeline.dense_results = hybrid_result.dense_results
-        #     pipeline.sparse_results = hybrid_result.sparse_results
-        #     pipeline.fused_results = hybrid_result.fused_results
-        #     results = hybrid_result.fused_results
-        # else:
-        #     results = []
-        #
-        # logger.info("Retrieved %d results from hybrid search", len(results))
-        # logger.debug("Retrieval returned %d results", len(results))
-        #
-        # reranked = self._reranker.rerank(retrieval_query, results, top_k=top_k) if results else []
-        # pipeline.reranked_results = reranked
-        # logger.info("Reranked to %d results", len(reranked))
-        #
-        # if reranked and intent == IntentType.FACTUAL:
-        #     intent = IntentType.RAG
-        #     logger.info("Overriding intent to RAG (sources retrieved)")
-        #
-        # context = "\n\n".join(r.chunk.text for r in reranked)
-        # prompt = self._build_prompt(query, intent, context, user_language)
-        #
-        # answer = self._llm_chain.invoke(prompt)
-        # logger.info("Generated answer for intent=%s", intent.value)
-        #
-        # if reranked:
-        #     confidence = max(r.score for r in reranked)
-        #     logger.info("Confidence: %.4f (sigmoid-normalized by reranker)", confidence)
-        # else:
-        #     confidence = 0.0
-        #
-        # return GenerationResponse(
-        #     answer=str(answer),
-        #     sources=reranked,
-        #     intent=intent,
-        #     confidence=confidence,
-        #     pipeline_details=pipeline,
-        # )
     def route_stream(self, query: str, top_k: int) -> Generator[dict, None, None]:
         """Stream pipeline events as each LangGraph node completes.

+"""Query router that selects retrieval strategy based on intent.
+--------------------------------------------------------------------
+This is to support lightweight local models (e.g. gemma3) that lack
+tool/function-calling capability. LangGraph moves all routing decisions
+(intent branching, confidence-based retry) into graph edges so the
+pipeline works identically regardless of the underlying model.
+This pipeline has a conditional retry loop (low confidence → broaden query → re-retrieve).
+LangGraph makes that cycle, the conditional skip, and per-node streaming
+explicit and testable without hand-rolled flags or callback plumbing.
+"""
 import logging
 import unicodedata
             pipeline_details=pipeline,
         )
     def route_stream(self, query: str, top_k: int) -> Generator[dict, None, None]:
         """Stream pipeline events as each LangGraph node completes.

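The retry cycle named in the new `router.py` docstring (low confidence → broaden query → re-retrieve) and the per-step streaming of `route_stream` can be illustrated with a plain-Python generator. This is a sketch only and does not use LangGraph: `retrieve_with_retry`, `retrieve`, `broaden`, `threshold`, and `max_retries` are hypothetical names, and the 0.5 threshold is an assumed value. Taking the maximum reranker score as confidence mirrors the commented-out code removed in this commit.

```python
from typing import Callable, Iterator, List, Tuple

Scored = Tuple[str, float]  # (chunk text, reranker score), hypothetical shape


def retrieve_with_retry(
    query: str,
    retrieve: Callable[[str], List[Scored]],
    broaden: Callable[[str], str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the repository
    max_retries: int = 1,
) -> Iterator[dict]:
    """Yield one event per step, retrying with a broader query on low confidence."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    current = query
    for attempt in range(max_retries + 1):
        results = retrieve(current)
        # Confidence is the best score; 0.0 when nothing was retrieved.
        confidence = max((score for _, score in results), default=0.0)
        yield {"node": "retrieve", "attempt": attempt,
               "query": current, "confidence": confidence}
        if confidence >= threshold or attempt == max_retries:
            break
        # Low confidence: broaden the query and go around the loop again.
        current = broaden(current)
        yield {"node": "broaden", "attempt": attempt, "query": current}
    yield {"node": "done", "query": current,
           "confidence": confidence, "results": results}
```

A caller can iterate over the generator and render each event as it arrives, much as a UI would consume events emitted when each graph node completes.
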
src/ui/app.py CHANGED

@@ -66,8 +66,8 @@ TEXTS: dict[str, dict[str, str]] = {
             "Et dokumentintelligens-system bygget på en RAG-arkitektur, dækkende PDF-indlæsning, semantisk chunking, "
             "hybrid søgning med reranking "
             "og LLM-genererede svar med kildehenvisninger. LLM-laget er provider-agnostisk. "
-            "To tilstande: en fast pipeline til lette modeller og en LangGraph ReAct-agent "
-            "til forespørgsler der kræver flere søgetrin. Søgekvaliteten evalueres med RAGAS."
         ),
         "search_label": "Stil et spørgsmål om ... ",
         "search_placeholder": "F.eks.: Hvad er reglerne for behandling af personoplysninger?",
@@ -144,8 +144,8 @@ TEXTS: dict[str, dict[str, str]] = {
             "A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, "
             "hybrid retrieval with reranking, "
             "and LLM-generated answers with source citations. The LLM layer is provider-agnostic. "
-            "Two modes: a fixed pipeline for lightweight models, a LangGraph ReAct agent "
-            "for queries that need multiple retrieval steps. "
             "Retrieval quality is evaluated with RAGAS."
         ),
         "search_label": "Ask a question ...",

             "Et dokumentintelligens-system bygget på en RAG-arkitektur, dækkende PDF-indlæsning, semantisk chunking, "
             "hybrid søgning med reranking "
             "og LLM-genererede svar med kildehenvisninger. LLM-laget er provider-agnostisk. "
+            "To tilstande: en LangGraph ReAct-agent (standard) til forespørgsler der kræver flere søgetrin, "
+            "og en pipeline til lette modeller uden værktøjskald. Søgekvaliteten evalueres med RAGAS."
         ),
         "search_label": "Stil et spørgsmål om ... ",
         "search_placeholder": "F.eks.: Hvad er reglerne for behandling af personoplysninger?",
             "A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, "
             "hybrid retrieval with reranking, "
             "and LLM-generated answers with source citations. The LLM layer is provider-agnostic. "
+            "Two modes: a LangGraph ReAct agent (default) for queries that need multiple retrieval steps, "
+            "and a pipeline for lightweight models without tool-calling support. "
             "Retrieval quality is evaluated with RAGAS."
         ),
         "search_label": "Ask a question ...",