XQ committed on
Commit
d7b3297
·
1 Parent(s): 4ba88df

Change default mode to ReAct agent

Files changed (4)
  1. .github/README.md +22 -14
  2. README.md +1 -1
  3. src/agent/router.py +11 -60
  4. src/ui/app.py +4 -4
.github/README.md CHANGED
@@ -2,7 +2,7 @@
2
 
3
  **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces
4
 
5
- A document intelligence system covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a fixed pipeline for lightweight models, a LangGraph ReAct agent for queries that need multiple retrieval steps. Retrieval quality is evaluated with RAGAS.
6
 
7
  ## How it works
8
 
@@ -12,9 +12,7 @@ At query time both indexes are searched and their results merged with reciprocal
12
 
13
  **Two routing modes, switchable via `AGENT_MODE`:**
14
 
15
- - **Pipeline** (default): a fixed LangGraph DAG: language detection → optional translation → hybrid retrieval → reranking → generation. Works with lightweight local models like `gemma4`.
16
-
17
- - **ReAct Agent** (`AGENT_MODE=react`): replaces the DAG with a reasoning loop where the LLM calls tools as many times as it needs before answering. Useful for multi-hop questions or comparisons across documents. Requires a model with tool-calling support.
18
 
19
  | Tool | Purpose |
20
  |------|---------|
@@ -22,6 +20,8 @@ At query time both indexes are searched and their results merged with reciprocal
22
  | `list_documents()` | See what's in the knowledge base |
23
  | `fetch_document(document_id)` | Read a full document |
24
 
 
 
25
  ## Tech Stack
26
 
27
  | Category | Technology |
@@ -47,26 +47,34 @@ See `.env.example` for per-provider configuration.
47
 
48
  | Mode | `AGENT_MODE` | Notes |
49
  |------|-------------|-------|
50
- | Pipeline | `pipeline` (default) | Fixed DAG, works with `gemma4` |
51
- | ReAct | `react` | Tool-calling loop, needs a model that supports tool use |
52
 
53
- Tool-calling is supported by OpenAI, Anthropic, Google GenAI, Azure OpenAI, Groq, and some Ollama models (`llama3.1`, `qwen2.5`, `mistral-nemo`). The default `gemma4` does not support it — use `pipeline` mode with Ollama.
54
 
55
- ReAct with OpenAI:
56
 
57
  ```dotenv
58
  AGENT_MODE=react
59
- LLM_PROVIDER=openai
60
- OPENAI_API_KEY=sk-...
61
- OPENAI_MODEL=gpt-4o-mini
62
  ```
63
 
64
- Pipeline with local Ollama:
65
 
66
  ```dotenv
67
  AGENT_MODE=pipeline
68
  LLM_PROVIDER=ollama
69
- OLLAMA_MODEL=gemma4
70
  ```
71
 
72
  ## Quick Start
@@ -135,7 +143,7 @@ src/
135
  reranker.py # cross-encoder
136
  api/
137
  main.py
138
- routes.py # /query, /ingest, /health
139
  agent/
140
  intent_classifier.py
141
  router.py # pipeline mode (AGENT_MODE=pipeline)
 
2
 
3
  **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces
4
 
5
+ A document intelligence system covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a LangGraph ReAct agent (default) for queries that need multiple retrieval steps, and a pipeline for lightweight models without tool-calling support. Retrieval quality is evaluated with RAGAS.
6
 
7
  ## How it works
8
 
 
12
 
13
  **Two routing modes, switchable via `AGENT_MODE`:**
14
 
15
+ - **ReAct Agent** (default): a reasoning loop where the LLM calls tools as many times as it needs before answering. Useful for multi-hop questions or comparisons across documents. Requires a model with tool-calling support.
 
 
16
 
17
  | Tool | Purpose |
18
  |------|---------|
 
20
  | `list_documents()` | See what's in the knowledge base |
21
  | `fetch_document(document_id)` | Read a full document |
22
 
23
+ - **Pipeline** (`AGENT_MODE=pipeline`): a fixed LangGraph graph — language detection → optional translation → hybrid retrieval → reranking → generation. Works with lightweight local models that lack tool-calling support.
24
+
25
  ## Tech Stack
26
 
27
  | Category | Technology |
 
47
 
48
  | Mode | `AGENT_MODE` | Notes |
49
  |------|-------------|-------|
50
+ | ReAct | `react` (default) | Tool-calling loop, needs a model that supports tool use |
51
+ | Pipeline | `pipeline` | Fixed graph, works with lightweight models that lack tool calling |
52
 
53
+ Tool calling is supported by OpenAI, Anthropic, Google GenAI, Azure OpenAI, Groq, and some Ollama models (`gemma4`, `llama3.1`, `qwen2.5`, `mistral-nemo`).
54
 
55
+ ReAct with local Ollama (default):
56
 
57
  ```dotenv
58
  AGENT_MODE=react
59
+ LLM_PROVIDER=ollama
60
+ OLLAMA_MODEL=gemma4:e4b
 
61
  ```
62
 
63
+ Pipeline with a lightweight model:
64
 
65
  ```dotenv
66
  AGENT_MODE=pipeline
67
  LLM_PROVIDER=ollama
68
+ OLLAMA_MODEL=gemma3
69
+ ```
70
+
71
+ ReAct with OpenAI:
72
+
73
+ ```dotenv
74
+ AGENT_MODE=react
75
+ LLM_PROVIDER=openai
76
+ OPENAI_API_KEY=sk-...
77
+ OPENAI_MODEL=gpt-4o-mini
78
  ```
79
 
80
  ## Quick Start
 
143
  reranker.py # cross-encoder
144
  api/
145
  main.py
146
+ routes.py # /query, /query/stream, /ingest, /health
147
  agent/
148
  intent_classifier.py
149
  router.py # pipeline mode (AGENT_MODE=pipeline)
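
Reviewer note: the README diff above makes `react` the value used when `AGENT_MODE` is unset. The commit does not show where the variable is read, so the following is only a minimal sketch of how such a default could be resolved. The function name `resolve_agent_mode` and the constants are hypothetical and are not taken from the repository.

```python
import os
from typing import Mapping, Optional

# Hypothetical constants; the two mode names come from the README table.
VALID_MODES = ("react", "pipeline")
DEFAULT_MODE = "react"  # the default after this commit, per the README diff


def resolve_agent_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the routing mode named by AGENT_MODE, falling back to ReAct."""
    source = os.environ if env is None else env
    # Treat an unset or empty variable as "use the default".
    raw = (source.get("AGENT_MODE") or DEFAULT_MODE).strip().lower()
    if raw not in VALID_MODES:
        raise ValueError(f"AGENT_MODE must be one of {VALID_MODES}, got {raw!r}")
    return raw
```

Rejecting unknown values, rather than silently falling back, is a design choice of this sketch: a typo such as `AGENT_MODE=piepline` would otherwise select the tool-calling mode on a model that may not support it.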
README.md CHANGED
@@ -12,7 +12,7 @@ noindex: true
12
 
13
  **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces
14
 
15
- A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a fixed pipeline for lightweight models, a LangGraph ReAct agent for queries that need multiple retrieval steps. Retrieval quality is evaluated with RAGAS.
16
 
17
  ## How it works
18
 
 
12
 
13
  **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces
14
 
15
+ A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a pipeline for lightweight models, a LangGraph ReAct agent for queries that need multiple retrieval steps. Retrieval quality is evaluated with RAGAS.
16
 
17
  ## How it works
18
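
Reviewer note: both READMEs describe the ReAct mode as a loop in which the model calls tools as often as it needs before answering. The sketch below illustrates that control flow in plain Python under stated assumptions. It does not use LangGraph or a real LLM: `decide` is a hypothetical stand-in for the model, and only the tool names `list_documents` and `fetch_document` are taken from the README's tool table.

```python
from typing import Callable, Dict, List, Tuple

# A step is either ("call", tool_name, argument) or ("answer", text, "").
Step = Tuple[str, str, str]


def run_react_loop(
    question: str,
    decide: Callable[[str, List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate between model decisions and tool calls until an answer appears."""
    observations: List[str] = []
    for _ in range(max_steps):
        kind, name_or_text, argument = decide(question, observations)
        if kind == "answer":
            return name_or_text
        if name_or_text not in tools:
            # Feed the mistake back so the next decision can correct it.
            observations.append(f"unknown tool: {name_or_text}")
            continue
        observations.append(tools[name_or_text](argument))
    return "No answer within the step budget."


# Stub tools and a scripted decision function, for illustration only.
stub_tools = {
    "list_documents": lambda _arg: "doc-1",
    "fetch_document": lambda doc_id: f"contents of {doc_id}",
}


def scripted_decide(question: str, observations: List[str]) -> Step:
    if not observations:
        return ("call", "list_documents", "")
    if len(observations) == 1:
        return ("call", "fetch_document", observations[0])
    return ("answer", f"Based on: {observations[-1]}", "")
```

The step budget (`max_steps`) is an assumption of this sketch. It caps the loop so that a model which never produces an answer cannot run indefinitely.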
 
src/agent/router.py CHANGED
@@ -1,4 +1,14 @@
1
- """Query router that selects retrieval strategy based on intent."""
2
 
3
  import logging
4
  import unicodedata
@@ -443,65 +453,6 @@ class QueryRouter:
443
  pipeline_details=pipeline,
444
  )
445
 
446
- # --- Old if/else routing (replaced by LangGraph above) ---
447
- #
448
- # user_language, intent = self._detect_language_and_intent(query)
449
- # retrieval_query = self._translate_query(query, user_language)
450
- # translated = retrieval_query != query
451
- #
452
- # logger.info("Classified intent: %s", intent.value)
453
- # logger.debug("Intent classification result: %s for query='%s'", intent.value, query)
454
- #
455
- # should_retrieve = intent != IntentType.UNKNOWN
456
- # logger.debug("Retrieval executed: %s (intent=%s)", should_retrieve, intent.value)
457
- #
458
- # pipeline = PipelineDetails(
459
- # original_query=query,
460
- # retrieval_query=retrieval_query,
461
- # detected_language=user_language,
462
- # translated=translated,
463
- # )
464
- #
465
- # if should_retrieve:
466
- # hybrid_result = self._hybrid_retriever.search_detailed(retrieval_query, top_k=top_k)
467
- # pipeline.dense_results = hybrid_result.dense_results
468
- # pipeline.sparse_results = hybrid_result.sparse_results
469
- # pipeline.fused_results = hybrid_result.fused_results
470
- # results = hybrid_result.fused_results
471
- # else:
472
- # results = []
473
- #
474
- # logger.info("Retrieved %d results from hybrid search", len(results))
475
- # logger.debug("Retrieval returned %d results", len(results))
476
- #
477
- # reranked = self._reranker.rerank(retrieval_query, results, top_k=top_k) if results else []
478
- # pipeline.reranked_results = reranked
479
- # logger.info("Reranked to %d results", len(reranked))
480
- #
481
- # if reranked and intent == IntentType.FACTUAL:
482
- # intent = IntentType.RAG
483
- # logger.info("Overriding intent to RAG (sources retrieved)")
484
- #
485
- # context = "\n\n".join(r.chunk.text for r in reranked)
486
- # prompt = self._build_prompt(query, intent, context, user_language)
487
- #
488
- # answer = self._llm_chain.invoke(prompt)
489
- # logger.info("Generated answer for intent=%s", intent.value)
490
- #
491
- # if reranked:
492
- # confidence = max(r.score for r in reranked)
493
- # logger.info("Confidence: %.4f (sigmoid-normalized by reranker)", confidence)
494
- # else:
495
- # confidence = 0.0
496
- #
497
- # return GenerationResponse(
498
- # answer=str(answer),
499
- # sources=reranked,
500
- # intent=intent,
501
- # confidence=confidence,
502
- # pipeline_details=pipeline,
503
- # )
504
-
505
  def route_stream(self, query: str, top_k: int) -> Generator[dict, None, None]:
506
  """Stream pipeline events as each LangGraph node completes.
507
 
 
1
+ """Query router that selects retrieval strategy based on intent.
2
+ --------------------------------------------------------------------
3
+ The pipeline exists to support lightweight local models (e.g. gemma3) that lack
4
+ tool/function-calling capability. LangGraph moves all routing decisions
5
+ (intent branching, confidence-based retry) into graph edges so the
6
+ pipeline works identically regardless of the underlying model.
7
+
8
+ This pipeline has a conditional retry loop (low confidence → broaden query → re-retrieve).
9
+ LangGraph makes that cycle, the conditional skip, and per-node streaming
10
+ explicit and testable without hand-rolled flags or callback plumbing.
11
+ """
12
 
13
  import logging
14
  import unicodedata
 
453
  pipeline_details=pipeline,
454
  )
455
 
456
  def route_stream(self, query: str, top_k: int) -> Generator[dict, None, None]:
457
  """Stream pipeline events as each LangGraph node completes.
458
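
Reviewer note: the new module docstring mentions a conditional retry loop (low confidence → broaden query → re-retrieve). The graph code itself is not part of this diff, so the following is a plain-Python sketch of that cycle, not the LangGraph implementation. `Hit`, `retrieve_with_retry`, the `broaden` callback, and the threshold values are hypothetical. Taking the highest score as the confidence mirrors the removed if/else routing code, which used `max(r.score for r in reranked)` and `0.0` when nothing was retrieved.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    text: str
    score: float  # assumed to be normalized to [0, 1]


def retrieve_with_retry(
    query: str,
    search: Callable[[str], List[Hit]],
    broaden: Callable[[str], str],
    min_confidence: float = 0.5,
    max_retries: int = 1,
) -> List[Hit]:
    """Retrieve, then re-retrieve with a broadened query while confidence is low."""
    hits = search(query)
    for _ in range(max_retries):
        # Confidence is the best score, or 0.0 when nothing was retrieved.
        confidence = max((h.score for h in hits), default=0.0)
        if confidence >= min_confidence:
            break
        query = broaden(query)
        hits = search(query)
    return hits
```

In a graph-based version the `if confidence >= min_confidence` check would correspond to a conditional edge, and the retry budget would live in the graph state rather than in a loop counter.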
 
src/ui/app.py CHANGED
@@ -66,8 +66,8 @@ TEXTS: dict[str, dict[str, str]] = {
66
  "Et dokumentintelligens-system bygget på en RAG-arkitektur, dækkende PDF-indlæsning, semantisk chunking, "
67
  "hybrid søgning med reranking "
68
  "og LLM-genererede svar med kildehenvisninger. LLM-laget er provider-agnostisk. "
69
- "To tilstande: en fast pipeline til lette modeller og en LangGraph ReAct-agent "
70
- "til forespørgsler der kræver flere søgetrin. Søgekvaliteten evalueres med RAGAS."
71
  ),
72
  "search_label": "Stil et spørgsmål om ... ",
73
  "search_placeholder": "F.eks.: Hvad er reglerne for behandling af personoplysninger?",
@@ -144,8 +144,8 @@ TEXTS: dict[str, dict[str, str]] = {
144
  "A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, "
145
  "hybrid retrieval with reranking, "
146
  "and LLM-generated answers with source citations. The LLM layer is provider-agnostic. "
147
- "Two modes: a fixed pipeline for lightweight models, a LangGraph ReAct agent "
148
- "for queries that need multiple retrieval steps. "
149
  "Retrieval quality is evaluated with RAGAS."
150
  ),
151
  "search_label": "Ask a question ...",
 
66
  "Et dokumentintelligens-system bygget på en RAG-arkitektur, dækkende PDF-indlæsning, semantisk chunking, "
67
  "hybrid søgning med reranking "
68
  "og LLM-genererede svar med kildehenvisninger. LLM-laget er provider-agnostisk. "
69
+ "To tilstande: en LangGraph ReAct-agent (standard) til forespørgsler der kræver flere søgetrin, "
70
+ "og en pipeline til lette modeller uden værktøjskald. Søgekvaliteten evalueres med RAGAS."
71
  ),
72
  "search_label": "Stil et spørgsmål om ... ",
73
  "search_placeholder": "F.eks.: Hvad er reglerne for behandling af personoplysninger?",
 
144
  "A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, "
145
  "hybrid retrieval with reranking, "
146
  "and LLM-generated answers with source citations. The LLM layer is provider-agnostic. "
147
+ "Two modes: a LangGraph ReAct agent (default) for queries that need multiple retrieval steps, "
148
+ "and a pipeline for lightweight models without tool-calling support. "
149
  "Retrieval quality is evaluated with RAGAS."
150
  ),
151
  "search_label": "Ask a question ...",