Commit d7b3297 by XQ · Parent(s): 4ba88df

Change default mode to ReAct agent

- .github/README.md +22 -14
- README.md +1 -1
- src/agent/router.py +11 -60
- src/ui/app.py +4 -4
.github/README.md CHANGED

@@ -2,7 +2,7 @@
 
 **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces
 
-A document intelligence system covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a
+A document intelligence system covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a LangGraph ReAct agent (default) for queries that need multiple retrieval steps, and a pipeline for lightweight models without tool-calling support. Retrieval quality is evaluated with RAGAS.
 
 ## How it works
 
@@ -12,9 +12,7 @@ At query time both indexes are searched and their results merged with reciprocal
 
 **Two routing modes, switchable via `AGENT_MODE`:**
 
-- **
-
-- **ReAct Agent** (`AGENT_MODE=react`): replaces the DAG with a reasoning loop where the LLM calls tools as many times as it needs before answering. Useful for multi-hop questions or comparisons across documents. Requires a model with tool-calling support.
+- **ReAct Agent** (default): a reasoning loop where the LLM calls tools as many times as it needs before answering. Useful for multi-hop questions or comparisons across documents. Requires a model with tool-calling support.
 
 | Tool | Purpose |
 |------|---------|
@@ -22,6 +20,8 @@ At query time both indexes are searched and their results merged with reciprocal
 | `list_documents()` | See what's in the knowledge base |
 | `fetch_document(document_id)` | Read a full document |
 
+- **Pipeline** (`AGENT_MODE=pipeline`): a fixed LangGraph graph — language detection → optional translation → hybrid retrieval → reranking → generation. Works with lightweight local models that lack tool-calling support.
+
 ## Tech Stack
 
 | Category | Technology |
@@ -47,26 +47,34 @@ See `.env.example` for per-provider configuration.
 
 | Mode | `AGENT_MODE` | Notes |
 |------|-------------|-------|
-
-
+| ReAct | `react` (default) | Tool-calling loop, needs a model that supports tool use |
+| Pipeline | `pipeline` | Fixed graph, works with lightweight models that lack tool calling |
 
-Tool-calling is supported by OpenAI, Anthropic, Google GenAI, Azure OpenAI, Groq, and some Ollama models (`llama3.1`, `qwen2.5`, `mistral-nemo`).
+Tool-calling is supported by OpenAI, Anthropic, Google GenAI, Azure OpenAI, Groq, and some Ollama models (`gemma4`, `llama3.1`, `qwen2.5`, `mistral-nemo`).
 
-ReAct with
+ReAct with local Ollama (default):
 
 ```dotenv
 AGENT_MODE=react
-LLM_PROVIDER=
-
-OPENAI_MODEL=gpt-4o-mini
+LLM_PROVIDER=ollama
+OLLAMA_MODEL=gemma4:e4b
 ```
 
-Pipeline with
+Pipeline with a lightweight model:
 
 ```dotenv
 AGENT_MODE=pipeline
 LLM_PROVIDER=ollama
-OLLAMA_MODEL=
+OLLAMA_MODEL=gemma3
+```
+
+ReAct with OpenAI:
+
+```dotenv
+AGENT_MODE=react
+LLM_PROVIDER=openai
+OPENAI_API_KEY=sk-...
+OPENAI_MODEL=gpt-4o-mini
 ```
 
 ## Quick Start
@@ -135,7 +143,7 @@ src/
     reranker.py # cross-encoder
   api/
     main.py
-    routes.py # /query, /ingest, /health
+    routes.py # /query, /query/stream, /ingest, /health
   agent/
     intent_classifier.py
     router.py # pipeline mode (AGENT_MODE=pipeline)
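The mode table and the three `.env` profiles above reduce to one rule: `AGENT_MODE` now defaults to `react`, and `pipeline` is the fallback for models without tool calling. Below is a minimal sketch of how a settings loader might apply that rule. `resolve_agent_mode`, `VALID_MODES` and `TOOL_CAPABLE_OLLAMA` are hypothetical names, not code from this commit. The model list is copied from the README, which itself says only "some Ollama models", so the sketch warns instead of refusing to start.

```python
from typing import Mapping

# Modes from the README's mode table; "react" is the new default.
VALID_MODES = ("react", "pipeline")

# Ollama models the README names as tool-calling capable. Hypothetical
# partial allowlist: the README only says "some Ollama models".
TOOL_CAPABLE_OLLAMA = {"gemma4", "llama3.1", "qwen2.5", "mistral-nemo"}


def resolve_agent_mode(env: Mapping[str, str]) -> tuple[str, list[str]]:
    """Return (mode, warnings); AGENT_MODE falls back to "react" when unset or empty."""
    mode = env.get("AGENT_MODE", "").strip().lower() or "react"
    if mode not in VALID_MODES:
        raise ValueError(f"AGENT_MODE must be one of {VALID_MODES}, got {mode!r}")

    warnings: list[str] = []
    if mode == "react" and env.get("LLM_PROVIDER", "").strip().lower() == "ollama":
        model = env.get("OLLAMA_MODEL", "")
        family = model.split(":", 1)[0]  # "gemma4:e4b" -> "gemma4"
        if family not in TOOL_CAPABLE_OLLAMA:
            warnings.append(
                f"OLLAMA_MODEL={model!r} may not support tool calling; "
                "consider AGENT_MODE=pipeline"
            )
    return mode, warnings
```

With the first README profile (`react`, `ollama`, `gemma4:e4b`) this returns `("react", [])`. Pointing the same profile at `gemma3` yields one warning, which is the case the `pipeline` profile is meant for.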
README.md CHANGED

@@ -12,7 +12,7 @@ noindex: true
 
 **Live Demo:** [xq-dokumentassistent.hf.space](https://xq-dokumentassistent.hf.space) — hosted on Hugging Face Spaces
 
-A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a
+A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, hybrid retrieval with reranking, and LLM-generated answers with source citations. The LLM layer is provider-agnostic. Two modes: a pipeline for lightweight models, a LangGraph ReAct agent for queries that need multiple retrieval steps. Retrieval quality is evaluated with RAGAS.
 
 ## How it works
 
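The READMEs describe the ReAct agent as a loop in which the model calls tools as many times as it needs before answering. The sketch below shows that control flow in plain Python with a scripted stand-in for the LLM. It is not LangGraph's API and not the project's agent. The tool names `list_documents` and `fetch_document` come from the README's tool table; their in-memory bodies, `DOCS`, `Step`, `ToolCall`, `scripted_model` and `react_loop` are all hypothetical. The `max_steps` cap is an added assumption so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Step:
    """One model turn: either a tool call or a final answer."""
    tool_call: Optional[ToolCall] = None
    answer: Optional[str] = None


# Hypothetical in-memory knowledge base standing in for the real index.
DOCS = {
    "gdpr": "Personal data must be processed lawfully, fairly and transparently.",
    "retention": "Records are kept for five years.",
}


def list_documents() -> list[str]:
    return sorted(DOCS)


def fetch_document(document_id: str) -> str:
    return DOCS.get(document_id, f"unknown document: {document_id}")


TOOLS: dict[str, Callable[..., object]] = {
    "list_documents": list_documents,
    "fetch_document": fetch_document,
}


def react_loop(model: Callable[[list[str]], Step], question: str, max_steps: int = 5) -> str:
    """Let the model call tools until it answers or max_steps is reached."""
    transcript = [f"question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)
        if step.answer is not None:
            return step.answer
        call = step.tool_call
        if call is None or call.name not in TOOLS:
            transcript.append("error: invalid tool call")
            continue
        observation = TOOLS[call.name](**call.args)
        # Feed the observation back so the next turn can build on it.
        transcript.append(f"{call.name}({call.args}) -> {observation}")
    return "no answer within max_steps"


def scripted_model(transcript: list[str]) -> Step:
    """Stand-in for an LLM: list documents, read one, then answer."""
    turns = len(transcript) - 1  # observations so far
    if turns == 0:
        return Step(tool_call=ToolCall("list_documents", {}))
    if turns == 1:
        return Step(tool_call=ToolCall("fetch_document", {"document_id": "gdpr"}))
    return Step(answer="Per 'gdpr': " + transcript[-1].split(" -> ", 1)[1])
```

The number of tool calls is decided turn by turn by the model rather than fixed in advance, which is what separates this mode from the fixed pipeline graph.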
src/agent/router.py CHANGED

@@ -1,4 +1,14 @@
-"""Query router that selects retrieval strategy based on intent.
+"""Query router that selects retrieval strategy based on intent.
+--------------------------------------------------------------------
+This is to support lightweight local models (e.g. gemma3) that lack
+tool/function-calling capability. LangGraph moves all routing decisions
+(intent branching, confidence-based retry) into graph edges so the
+pipeline works identically regardless of the underlying model.
+
+This pipeline has a conditional retry loop (low confidence → broaden query → re-retrieve).
+LangGraph makes that cycle, the conditional skip, and per-node streaming
+explicit and testable without hand-rolled flags or callback plumbing.
+"""
 
 import logging
 import unicodedata
@@ -443,65 +453,6 @@ class QueryRouter:
             pipeline_details=pipeline,
         )
 
-        # --- Old if/else routing (replaced by LangGraph above) ---
-        #
-        # user_language, intent = self._detect_language_and_intent(query)
-        # retrieval_query = self._translate_query(query, user_language)
-        # translated = retrieval_query != query
-        #
-        # logger.info("Classified intent: %s", intent.value)
-        # logger.debug("Intent classification result: %s for query='%s'", intent.value, query)
-        #
-        # should_retrieve = intent != IntentType.UNKNOWN
-        # logger.debug("Retrieval executed: %s (intent=%s)", should_retrieve, intent.value)
-        #
-        # pipeline = PipelineDetails(
-        #     original_query=query,
-        #     retrieval_query=retrieval_query,
-        #     detected_language=user_language,
-        #     translated=translated,
-        # )
-        #
-        # if should_retrieve:
-        #     hybrid_result = self._hybrid_retriever.search_detailed(retrieval_query, top_k=top_k)
-        #     pipeline.dense_results = hybrid_result.dense_results
-        #     pipeline.sparse_results = hybrid_result.sparse_results
-        #     pipeline.fused_results = hybrid_result.fused_results
-        #     results = hybrid_result.fused_results
-        # else:
-        #     results = []
-        #
-        # logger.info("Retrieved %d results from hybrid search", len(results))
-        # logger.debug("Retrieval returned %d results", len(results))
-        #
-        # reranked = self._reranker.rerank(retrieval_query, results, top_k=top_k) if results else []
-        # pipeline.reranked_results = reranked
-        # logger.info("Reranked to %d results", len(reranked))
-        #
-        # if reranked and intent == IntentType.FACTUAL:
-        #     intent = IntentType.RAG
-        #     logger.info("Overriding intent to RAG (sources retrieved)")
-        #
-        # context = "\n\n".join(r.chunk.text for r in reranked)
-        # prompt = self._build_prompt(query, intent, context, user_language)
-        #
-        # answer = self._llm_chain.invoke(prompt)
-        # logger.info("Generated answer for intent=%s", intent.value)
-        #
-        # if reranked:
-        #     confidence = max(r.score for r in reranked)
-        #     logger.info("Confidence: %.4f (sigmoid-normalized by reranker)", confidence)
-        # else:
-        #     confidence = 0.0
-        #
-        # return GenerationResponse(
-        #     answer=str(answer),
-        #     sources=reranked,
-        #     intent=intent,
-        #     confidence=confidence,
-        #     pipeline_details=pipeline,
-        # )
-
     def route_stream(self, query: str, top_k: int) -> Generator[dict, None, None]:
         """Stream pipeline events as each LangGraph node completes.
 
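The new module docstring describes a conditional retry loop in the pipeline: low confidence, broaden the query, re-retrieve. A minimal plain-Python sketch of that cycle follows. It assumes retrieval and reranking are collapsed into one `retrieve` callable and that confidence is the top reranked score, as in the removed if/else code (`max(r.score for r in reranked)`). `PipelineState`, `run_pipeline`, `threshold` and `max_attempts` are hypothetical; in the project the cycle is expressed as LangGraph edges, and the language-detection and translation nodes are left out here.

```python
from dataclasses import dataclass, field
from typing import Callable

Hit = tuple[str, float]  # (chunk text, reranker score)


@dataclass
class PipelineState:
    query: str
    retrieval_query: str = ""
    results: list[Hit] = field(default_factory=list)
    confidence: float = 0.0
    attempts: int = 0
    answer: str = ""


def run_pipeline(
    state: PipelineState,
    retrieve: Callable[[str], list[Hit]],
    broaden: Callable[[str], str],
    threshold: float = 0.5,
    max_attempts: int = 2,
) -> PipelineState:
    """Retrieve -> score -> (retry with a broader query if unsure) -> generate."""
    state.retrieval_query = state.query
    while True:
        # Node: retrieve + rerank, collapsed into one callable plus a sort.
        state.attempts += 1
        state.results = sorted(retrieve(state.retrieval_query), key=lambda h: h[1], reverse=True)
        state.confidence = state.results[0][1] if state.results else 0.0
        # Conditional edge: stop when confident or out of attempts, else loop back.
        if state.confidence >= threshold or state.attempts >= max_attempts:
            break
        state.retrieval_query = broaden(state.retrieval_query)
    # Node: generate (stubbed; a real node would prompt the LLM with this context).
    if state.results:
        state.answer = "answer from: " + " | ".join(text for text, _ in state.results)
    else:
        state.answer = "no sources found"
    return state
```

Bounding the cycle with `max_attempts` keeps a query with no good matches from looping forever; after the last attempt the pipeline still generates from whatever it found.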
src/ui/app.py CHANGED

@@ -66,8 +66,8 @@ TEXTS: dict[str, dict[str, str]] = {
             "Et dokumentintelligens-system bygget på en RAG-arkitektur, dækkende PDF-indlæsning, semantisk chunking, "
             "hybrid søgning med reranking "
             "og LLM-genererede svar med kildehenvisninger. LLM-laget er provider-agnostisk. "
-            "To tilstande: en
-            "
+            "To tilstande: en LangGraph ReAct-agent (standard) til forespørgsler der kræver flere søgetrin, "
+            "og en pipeline til lette modeller uden værktøjskald. Søgekvaliteten evalueres med RAGAS."
         ),
         "search_label": "Stil et spørgsmål om ... ",
         "search_placeholder": "F.eks.: Hvad er reglerne for behandling af personoplysninger?",
@@ -144,8 +144,8 @@ TEXTS: dict[str, dict[str, str]] = {
             "A document intelligence system built on a RAG architecture, covering PDF ingestion, semantic chunking, "
             "hybrid retrieval with reranking, "
             "and LLM-generated answers with source citations. The LLM layer is provider-agnostic. "
-            "Two modes: a
-            "
+            "Two modes: a LangGraph ReAct agent (default) for queries that need multiple retrieval steps, "
+            "and a pipeline for lightweight models without tool-calling support. "
             "Retrieval quality is evaluated with RAGAS."
         ),
         "search_label": "Ask a question ...",
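Both `TEXTS` hunks rely on Python joining adjacent string literals inside parentheses with no separator, so each fragment has to carry its own trailing space (the new lines end in `, ` and `. ` for that reason). The sketch below uses a dict of the same `dict[str, dict[str, str]]` shape with a lookup that falls back to English. The `description` key and the `t` helper are hypothetical; the commit does not show that key's name or how `app.py` reads `TEXTS`.

```python
# Hypothetical TEXTS with the same shape as in src/ui/app.py;
# "description" is an assumed key name.
TEXTS: dict[str, dict[str, str]] = {
    "en": {
        "description": (
            # Adjacent literals are concatenated as-is, so each fragment
            # keeps its own trailing space.
            "Two modes: a LangGraph ReAct agent (default) for queries that need multiple retrieval steps, "
            "and a pipeline for lightweight models without tool-calling support. "
            "Retrieval quality is evaluated with RAGAS."
        ),
        "search_label": "Ask a question ...",
    },
    "da": {
        "search_label": "Stil et spørgsmål om ... ",
    },
}


def t(lang: str, key: str, fallback: str = "en") -> str:
    """Return TEXTS[lang][key], else TEXTS[fallback][key], else the key itself."""
    for code in (lang, fallback):
        value = TEXTS.get(code, {}).get(key)
        if value is not None:
            return value
    return key
```

Dropping the trailing space from any fragment would produce run-together text such as `steps,and`, which is easy to miss in review because each literal looks correct on its own line.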