XQ committed on
Commit b205d63 · 1 Parent(s): b3c968a
Change default LLM
- README.md +10 -9
- src/config.py +10 -2
- src/provider.py +11 -1
- src/ui/app.py +2 -8
README.md
CHANGED
|
@@ -24,9 +24,9 @@ The system follows a three-stage RAG pipeline with an optional Agent Flows mode:
|
|
| 24 |
|
| 25 |
**Routing — two modes (switchable via `AGENT_MODE`):**
|
| 26 |
|
| 27 |
-
- **Pipeline mode** (default, `AGENT_MODE=pipeline`): Fixed LangGraph DAG — language detection → optional translation → hybrid retrieval → cross-encoder reranking → intent-specific generation.
|
| 28 |
|
| 29 |
-
- **ReAct Agent mode** (`AGENT_MODE=react`): Replaces the fixed DAG with a multi-step reasoning loop. The LLM decides which tools to call and how many times, then produces a grounded answer citing source documents. Supports multi-hop questions, comparisons across documents, and procedural queries that benefit from iterative retrieval. Requires an LLM with tool-calling support (OpenAI, Anthropic, Google GenAI, or compatible Ollama models such as `llama3.1` / `
|
| 30 |
|
| 31 |
Available tools in ReAct mode:
|
| 32 |
|
|
@@ -44,7 +44,7 @@ The system follows a three-stage RAG pipeline with an optional Agent Flows mode:
|
|
| 44 |
| Orchestration | LangChain, LangGraph |
|
| 45 |
| Vector Store | Qdrant (local mode, no server required) |
|
| 46 |
| Embedding | HuggingFace `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
|
| 47 |
-
| LLM | `
|
| 48 |
| Sparse Search | rank_bm25 |
|
| 49 |
| Reranking | sentence-transformers `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
|
| 50 |
| PDF Parsing | PyMuPDF (fitz) |
|
|
@@ -72,7 +72,7 @@ The system supports two routing modes, controlled by `AGENT_MODE` in `.env`:
|
|
| 72 |
|
| 73 |
| Mode | Value | Description |
|
| 74 |
|------|-------|-------------|
|
| 75 |
-
| Pipeline (default) | `AGENT_MODE=pipeline` | Fixed LangGraph DAG. Works with
|
| 76 |
| ReAct Agent | `AGENT_MODE=react` | Multi-step reasoning loop. The LLM calls tools as many times as needed — `hybrid_search` for targeted passages, `list_documents` to navigate the knowledge base, `fetch_document` for full document reads — then cites sources in the final answer. |
|
| 77 |
|
| 78 |
**LLM compatibility for ReAct mode:**
|
|
@@ -85,8 +85,9 @@ The system supports two routing modes, controlled by `AGENT_MODE` in `.env`:
|
|
| 85 |
| Anthropic (`claude-*`) | Yes |
|
| 86 |
| Google GenAI (`gemini-*`) | Yes |
|
| 87 |
| Azure OpenAI | Yes |
|
| 88 |
| Ollama — `llama3.1`, `qwen2.5`, `mistral-nemo` | Yes (model-dependent) |
|
| 89 |
-
| Ollama — `
|
| 90 |
|
| 91 |
Example `.env` for ReAct mode with OpenAI:
|
| 92 |
|
|
@@ -102,7 +103,7 @@ Example `.env` for pipeline mode with local Ollama (default, no API key needed):
|
|
| 102 |
```dotenv
|
| 103 |
AGENT_MODE=pipeline
|
| 104 |
LLM_PROVIDER=ollama
|
| 105 |
-
OLLAMA_MODEL=
|
| 106 |
```
|
| 107 |
|
| 108 |
## Quick Start
|
|
@@ -121,7 +122,7 @@ pip install -r requirements.txt
|
|
| 121 |
cp .env.example .env
|
| 122 |
|
| 123 |
# Pull the default LLM
|
| 124 |
-
ollama pull
|
| 125 |
|
| 126 |
# Ingest documents (place PDFs in docs/ first)
|
| 127 |
python -m scripts.ingest
|
|
@@ -146,7 +147,7 @@ cp .env.example .env
|
|
| 146 |
docker compose --profile local up --build
|
| 147 |
```
|
| 148 |
|
| 149 |
-
Starts Qdrant + Ollama + API + UI. The `ollama-init` sidecar pulls `
|
| 150 |
|
| 151 |
| Service | URL |
|
| 152 |
|---|---|
|
|
@@ -155,7 +156,7 @@ Starts Qdrant + Ollama + API + UI. The `ollama-init` sidecar pulls `gemma3:4b` o
|
|
| 155 |
| Streamlit UI | http://localhost:8501 |
|
| 156 |
| Qdrant dashboard | http://localhost:6333/dashboard |
|
| 157 |
|
| 158 |
-
### Cloud mode (OpenAI / Azure / Anthropic / Google)
|
| 159 |
|
| 160 |
```bash
|
| 161 |
cp .env.example .env
|
|
|
|
| 24 |
|
| 25 |
**Routing — two modes (switchable via `AGENT_MODE`):**
|
| 26 |
|
| 27 |
+
- **Pipeline mode** (default, `AGENT_MODE=pipeline`): Fixed LangGraph DAG — language detection → optional translation → hybrid retrieval → cross-encoder reranking → intent-specific generation. Works with lightweight models such as `gemma4`.
|
| 28 |
|
| 29 |
+
- **ReAct Agent mode** (`AGENT_MODE=react`): Replaces the fixed DAG with a multi-step reasoning loop. The LLM decides which tools to call and how many times, then produces a grounded answer citing source documents. Supports multi-hop questions, comparisons across documents, and procedural queries that benefit from iterative retrieval. Requires an LLM with tool-calling support (OpenAI, Anthropic, Google GenAI, or compatible Ollama models such as `llama3.1` / `qwen3`).
|
| 30 |
|
| 31 |
Available tools in ReAct mode:
|
| 32 |
|
|
|
|
| 44 |
| Orchestration | LangChain, LangGraph |
|
| 45 |
| Vector Store | Qdrant (local mode, no server required) |
|
| 46 |
| Embedding | HuggingFace `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
|
| 47 |
+
| LLM | `gemma4` (default, runs locally via Ollama) |
|
| 48 |
| Sparse Search | rank_bm25 |
|
| 49 |
| Reranking | sentence-transformers `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
|
| 50 |
| PDF Parsing | PyMuPDF (fitz) |
|
|
|
|
| 72 |
|
| 73 |
| Mode | Value | Description |
|
| 74 |
|------|-------|-------------|
|
| 75 |
+
| Pipeline (default) | `AGENT_MODE=pipeline` | Fixed LangGraph DAG. Works with lightweight models such as `gemma4` via Ollama — no cloud API required. |
|
| 76 |
| ReAct Agent | `AGENT_MODE=react` | Multi-step reasoning loop. The LLM calls tools as many times as needed — `hybrid_search` for targeted passages, `list_documents` to navigate the knowledge base, `fetch_document` for full document reads — then cites sources in the final answer. |
|
| 77 |
|
| 78 |
**LLM compatibility for ReAct mode:**
|
|
|
|
| 85 |
| Anthropic (`claude-*`) | Yes |
|
| 86 |
| Google GenAI (`gemini-*`) | Yes |
|
| 87 |
| Azure OpenAI | Yes |
|
| 88 |
+
| Groq (`qwen/qwen3-32b`, `llama-3.3-70b-versatile`) | Yes |
|
| 89 |
| Ollama — `llama3.1`, `qwen2.5`, `mistral-nemo` | Yes (model-dependent) |
|
| 90 |
+
| Ollama — `gemma4` (default) | No → use `pipeline` mode |
|
| 91 |
|
| 92 |
Example `.env` for ReAct mode with OpenAI:
|
| 93 |
|
|
|
|
| 103 |
```dotenv
|
| 104 |
AGENT_MODE=pipeline
|
| 105 |
LLM_PROVIDER=ollama
|
| 106 |
+
OLLAMA_MODEL=gemma4
|
| 107 |
```
|
| 108 |
|
| 109 |
## Quick Start
|
|
|
|
| 122 |
cp .env.example .env
|
| 123 |
|
| 124 |
# Pull the default LLM
|
| 125 |
+
ollama pull gemma4:e4b
|
| 126 |
|
| 127 |
# Ingest documents (place PDFs in docs/ first)
|
| 128 |
python -m scripts.ingest
|
|
|
|
| 147 |
docker compose --profile local up --build
|
| 148 |
```
|
| 149 |
|
| 150 |
+
Starts Qdrant + Ollama + API + UI. The `ollama-init` sidecar pulls `gemma4` on first run.
|
| 151 |
|
| 152 |
| Service | URL |
|
| 153 |
|---|---|
|
|
|
|
| 156 |
| Streamlit UI | http://localhost:8501 |
|
| 157 |
| Qdrant dashboard | http://localhost:6333/dashboard |
|
| 158 |
|
| 159 |
+
### Cloud mode (OpenAI / Azure / Anthropic / Google / Groq)
|
| 160 |
|
| 161 |
```bash
|
| 162 |
cp .env.example .env
|
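The compatibility notes in the README diff above amount to a fallback rule: ReAct mode is only useful when the configured model supports tool calling. Below is a minimal sketch of that rule. The helper `resolve_agent_mode` and both lookup tables are hypothetical and not part of the repository. The provider and model names come from the README table; matching on the Ollama model name before any `:tag` is an assumption.

```python
# Hypothetical helper, not taken from the repository: encodes the README's
# "LLM compatibility for ReAct mode" table as a simple fallback rule.

# Providers the README lists as "Yes" for ReAct mode.
TOOL_CALLING_PROVIDERS = {"openai", "azure_openai", "anthropic", "google_genai", "groq"}

# Ollama models the README lists as "Yes (model-dependent)".
TOOL_CALLING_OLLAMA_MODELS = ("llama3.1", "qwen2.5", "mistral-nemo")


def resolve_agent_mode(requested_mode: str, provider: str, model: str) -> str:
    """Return "react" only when the provider/model is listed as tool-calling capable."""
    if requested_mode != "react":
        return "pipeline"
    if provider in TOOL_CALLING_PROVIDERS:
        return "react"
    # Assumption: compare the model name without its ":tag" suffix.
    if provider == "ollama" and model.split(":")[0] in TOOL_CALLING_OLLAMA_MODELS:
        return "react"
    # Anything else (for example the default gemma4) falls back to pipeline mode.
    return "pipeline"
```

With this sketch, `resolve_agent_mode("react", "ollama", "gemma4:e4b")` falls back to `"pipeline"`, matching the README row that tells `gemma4` users to use pipeline mode.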
src/config.py
CHANGED
|
@@ -49,6 +49,10 @@ class Settings:
|
|
| 49 |
azure_openai_deployment: str
|
| 50 |
azure_openai_embedding_deployment: str
|
| 51 |
|
| 52 |
# Anthropic
|
| 53 |
anthropic_api_key: str
|
| 54 |
anthropic_model: str
|
|
@@ -100,7 +104,7 @@ def load_settings() -> Settings:
|
|
| 100 |
collection_name=os.environ.get("COLLECTION_NAME", "ku_documents"),
|
| 101 |
embedding_model=os.environ.get("EMBEDDING_MODEL", "paraphrase-multilingual-MiniLM-L12-v2"),
|
| 102 |
embedding_dimension=int(os.environ.get("EMBEDDING_DIMENSION", "384")),
|
| 103 |
-
generation_model=os.environ.get("GENERATION_MODEL", "
|
| 104 |
reranker_model=os.environ.get("RERANKER_MODEL", "cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"),
|
| 105 |
chunk_size=int(os.environ.get("CHUNK_SIZE", "512")),
|
| 106 |
chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "64")),
|
|
@@ -111,7 +115,7 @@ def load_settings() -> Settings:
|
|
| 111 |
|
| 112 |
# Ollama
|
| 113 |
ollama_base_url=os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
|
| 114 |
-
ollama_model=os.environ.get("OLLAMA_MODEL", "
|
| 115 |
|
| 116 |
# OpenAI
|
| 117 |
openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
|
|
@@ -125,6 +129,10 @@ def load_settings() -> Settings:
|
|
| 125 |
azure_openai_deployment=os.environ.get("AZURE_OPENAI_DEPLOYMENT", ""),
|
| 126 |
azure_openai_embedding_deployment=os.environ.get("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", ""),
|
| 127 |
|
| 128 |
# Anthropic
|
| 129 |
anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY", ""),
|
| 130 |
anthropic_model=os.environ.get("ANTHROPIC_MODEL", "claude-sonnet-4-20250514"),
|
|
|
|
| 49 |
azure_openai_deployment: str
|
| 50 |
azure_openai_embedding_deployment: str
|
| 51 |
|
| 52 |
+
# Groq
|
| 53 |
+
groq_api_key: str
|
| 54 |
+
groq_model: str
|
| 55 |
+
|
| 56 |
# Anthropic
|
| 57 |
anthropic_api_key: str
|
| 58 |
anthropic_model: str
|
|
|
|
| 104 |
collection_name=os.environ.get("COLLECTION_NAME", "ku_documents"),
|
| 105 |
embedding_model=os.environ.get("EMBEDDING_MODEL", "paraphrase-multilingual-MiniLM-L12-v2"),
|
| 106 |
embedding_dimension=int(os.environ.get("EMBEDDING_DIMENSION", "384")),
|
| 107 |
+
generation_model=os.environ.get("GENERATION_MODEL", "gemma4:e4b"),
|
| 108 |
reranker_model=os.environ.get("RERANKER_MODEL", "cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"),
|
| 109 |
chunk_size=int(os.environ.get("CHUNK_SIZE", "512")),
|
| 110 |
chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "64")),
|
|
|
|
| 115 |
|
| 116 |
# Ollama
|
| 117 |
ollama_base_url=os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
|
| 118 |
+
ollama_model=os.environ.get("OLLAMA_MODEL", "gemma4:e4b"),
|
| 119 |
|
| 120 |
# OpenAI
|
| 121 |
openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
|
|
|
|
| 129 |
azure_openai_deployment=os.environ.get("AZURE_OPENAI_DEPLOYMENT", ""),
|
| 130 |
azure_openai_embedding_deployment=os.environ.get("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", ""),
|
| 131 |
|
| 132 |
+
# Groq
|
| 133 |
+
groq_api_key=os.environ.get("GROQ_API_KEY", ""),
|
| 134 |
+
groq_model=os.environ.get("GROQ_MODEL", "qwen/qwen3-32b"),
|
| 135 |
+
|
| 136 |
# Anthropic
|
| 137 |
anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY", ""),
|
| 138 |
anthropic_model=os.environ.get("ANTHROPIC_MODEL", "claude-sonnet-4-20250514"),
|
src/provider.py
CHANGED
|
@@ -13,7 +13,7 @@ from src.config import Settings
|
|
| 13 |
|
| 14 |
logger = logging.getLogger(__name__)
|
| 15 |
|
| 16 |
-
_SUPPORTED_LLM_PROVIDERS = ["ollama", "azure_openai", "openai", "anthropic", "google_genai"]
|
| 17 |
_SUPPORTED_EMBEDDING_PROVIDERS = ["local", "azure_openai", "openai", "google_genai"]
|
| 18 |
|
| 19 |
|
|
@@ -62,6 +62,16 @@ def create_llm(settings: Settings) -> BaseChatModel:
|
|
| 62 |
temperature=0.0,
|
| 63 |
)
|
| 64 |
|
| 65 |
case "anthropic":
|
| 66 |
from langchain_anthropic import ChatAnthropic
|
| 67 |
|
|
|
|
| 13 |
|
| 14 |
logger = logging.getLogger(__name__)
|
| 15 |
|
| 16 |
+
_SUPPORTED_LLM_PROVIDERS = ["ollama", "azure_openai", "openai", "groq", "anthropic", "google_genai"]
|
| 17 |
_SUPPORTED_EMBEDDING_PROVIDERS = ["local", "azure_openai", "openai", "google_genai"]
|
| 18 |
|
| 19 |
|
|
|
|
| 62 |
temperature=0.0,
|
| 63 |
)
|
| 64 |
|
| 65 |
+
case "groq":
|
| 66 |
+
from langchain_openai import ChatOpenAI
|
| 67 |
+
|
| 68 |
+
return ChatOpenAI(
|
| 69 |
+
model=settings.groq_model,
|
| 70 |
+
api_key=settings.groq_api_key,
|
| 71 |
+
base_url="https://api.groq.com/openai/v1",
|
| 72 |
+
temperature=0.0,
|
| 73 |
+
)
|
| 74 |
+
|
| 75 |
case "anthropic":
|
| 76 |
from langchain_anthropic import ChatAnthropic
|
| 77 |
|
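The new `groq` branch reuses `ChatOpenAI` and points it at Groq's OpenAI-compatible endpoint instead of adding a dedicated client. The sketch below isolates the arguments that branch passes, so they can be inspected without installing `langchain_openai`. The helper `groq_chat_kwargs` and the empty-key check are assumptions added for illustration; the diff itself performs no such check.

```python
# Base URL copied from the diff; Groq exposes an OpenAI-compatible API here.
GROQ_OPENAI_BASE_URL = "https://api.groq.com/openai/v1"


def groq_chat_kwargs(model: str, api_key: str) -> dict:
    """Build the keyword arguments the diff passes to ChatOpenAI for Groq.

    Hypothetical helper: the early ValueError is an added safeguard,
    not behaviour present in the commit.
    """
    if not api_key:
        raise ValueError("GROQ_API_KEY is empty; set it before selecting the groq provider")
    return {
        "model": model,
        "api_key": api_key,
        "base_url": GROQ_OPENAI_BASE_URL,
        "temperature": 0.0,  # same deterministic setting as the other branches
    }
```

Assuming `langchain_openai` is installed, the result could be passed on as `ChatOpenAI(**groq_chat_kwargs(settings.groq_model, settings.groq_api_key))`, which mirrors the call in the diff.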
src/ui/app.py
CHANGED
|
@@ -54,10 +54,7 @@ TEXTS: dict[str, dict[str, str]] = {
|
|
| 54 |
"- **LLM-integration** — provider-agnostisk, prompt-styret "
|
| 55 |
"svargenerering\n"
|
| 56 |
"- **Evaluering** — RAGAS-baseret kvalitetsmåling\n"
|
| 57 |
-
"- **Agent Flows** —
|
| 58 |
-
"LLM bestemmer selv hvor mange søgninger der behøves og "
|
| 59 |
-
"støtter flertrinræsonnering på tværs af dokumenter "
|
| 60 |
-
"(`AGENT_MODE=react`)"
|
| 61 |
),
|
| 62 |
"chunking_label": "Chunking-strategi",
|
| 63 |
"chunking_help": "Vælg hvordan dokumenterne opdeles i tekststykker.",
|
|
@@ -131,10 +128,7 @@ TEXTS: dict[str, dict[str, str]] = {
|
|
| 131 |
"- **LLM integration** — provider-agnostic, prompt-driven "
|
| 132 |
"answer generation\n"
|
| 133 |
"- **Evaluation** — RAGAS-based quality measurement\n"
|
| 134 |
-
"- **Agent Flows** —
|
| 135 |
-
"the LLM decides how many searches are needed and supports "
|
| 136 |
-
"multi-step reasoning across documents "
|
| 137 |
-
"(`AGENT_MODE=react`)"
|
| 138 |
),
|
| 139 |
"chunking_label": "Chunking strategy",
|
| 140 |
"chunking_help": "Choose how documents are split into text chunks.",
|
|
|
|
| 54 |
"- **LLM-integration** — provider-agnostisk, prompt-styret "
|
| 55 |
"svargenerering\n"
|
| 56 |
"- **Evaluering** — RAGAS-baseret kvalitetsmåling\n"
|
| 57 |
+
"- **Agent Flows** — ReAct-loop med værktøjskald."
|
| 58 |
),
|
| 59 |
"chunking_label": "Chunking-strategi",
|
| 60 |
"chunking_help": "Vælg hvordan dokumenterne opdeles i tekststykker.",
|
|
|
|
| 128 |
"- **LLM integration** — provider-agnostic, prompt-driven "
|
| 129 |
"answer generation\n"
|
| 130 |
"- **Evaluation** — RAGAS-based quality measurement\n"
|
| 131 |
+
"- **Agent Flows** — ReAct loop with tool calling"
|
| 132 |
),
|
| 133 |
"chunking_label": "Chunking strategy",
|
| 134 |
"chunking_help": "Choose how documents are split into text chunks.",
|