XQ committed on
Commit b205d63 · 1 Parent(s): b3c968a

Change default LLM

Files changed (4)
  1. README.md +10 -9
  2. src/config.py +10 -2
  3. src/provider.py +11 -1
  4. src/ui/app.py +2 -8
README.md CHANGED
@@ -24,9 +24,9 @@ The system follows a three-stage RAG pipeline with an optional Agent Flows mode:

  **Routing — two modes (switchable via `AGENT_MODE`):**

- - **Pipeline mode** (default, `AGENT_MODE=pipeline`): Fixed LangGraph DAG — language detection → optional translation → hybrid retrieval → cross-encoder reranking → intent-specific generation. Robust on any LLM including local Ollama models.
+ - **Pipeline mode** (default, `AGENT_MODE=pipeline`): Fixed LangGraph DAG — language detection → optional translation → hybrid retrieval → cross-encoder reranking → intent-specific generation. Works with lightweight models such as `gemma4`.

- - **ReAct Agent mode** (`AGENT_MODE=react`): Replaces the fixed DAG with a multi-step reasoning loop. The LLM decides which tools to call and how many times, then produces a grounded answer citing source documents. Supports multi-hop questions, comparisons across documents, and procedural queries that benefit from iterative retrieval. Requires an LLM with tool-calling support (OpenAI, Anthropic, Google GenAI, or compatible Ollama models such as `llama3.1` / `qwen2.5`).
+ - **ReAct Agent mode** (`AGENT_MODE=react`): Replaces the fixed DAG with a multi-step reasoning loop. The LLM decides which tools to call and how many times, then produces a grounded answer citing source documents. Supports multi-hop questions, comparisons across documents, and procedural queries that benefit from iterative retrieval. Requires an LLM with tool-calling support (OpenAI, Anthropic, Google GenAI, or compatible Ollama models such as `llama3.1` / `qwen3`).

  Available tools in ReAct mode:

@@ -44,7 +44,7 @@ The system follows a three-stage RAG pipeline with an optional Agent Flows mode:
  | Orchestration | LangChain, LangGraph |
  | Vector Store | Qdrant (local mode, no server required) |
  | Embedding | HuggingFace `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
- | LLM | `gemma3:4b` (default, runs locally via Ollama) |
+ | LLM | `gemma4` (default, runs locally via Ollama) |
  | Sparse Search | rank_bm25 |
  | Reranking | sentence-transformers `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
  | PDF Parsing | PyMuPDF (fitz) |
@@ -72,7 +72,7 @@ The system supports two routing modes, controlled by `AGENT_MODE` in `.env`:

  | Mode | Value | Description |
  |------|-------|-------------|
- | Pipeline (default) | `AGENT_MODE=pipeline` | Fixed LangGraph DAG. Works with any LLM including local Ollama models such as `gemma3:4b`. |
+ | Pipeline (default) | `AGENT_MODE=pipeline` | Fixed LangGraph DAG. Works with lightweight models such as `gemma4` via Ollama — no cloud API required. |
  | ReAct Agent | `AGENT_MODE=react` | Multi-step reasoning loop. The LLM calls tools as many times as needed — `hybrid_search` for targeted passages, `list_documents` to navigate the knowledge base, `fetch_document` for full document reads — then cites sources in the final answer. |

  **LLM compatibility for ReAct mode:**
@@ -85,8 +85,9 @@ The system supports two routing modes, controlled by `AGENT_MODE` in `.env`:
  | Anthropic (`claude-*`) | Yes |
  | Google GenAI (`gemini-*`) | Yes |
  | Azure OpenAI | Yes |
+ | Groq (`qwen/qwen3-32b`, `llama-3.3-70b-versatile`) | Yes |
  | Ollama — `llama3.1`, `qwen2.5`, `mistral-nemo` | Yes (model-dependent) |
- | Ollama — `gemma3:4b` (default) | No → use `pipeline` mode |
+ | Ollama — `gemma4` (default) | No → use `pipeline` mode |

  Example `.env` for ReAct mode with OpenAI:

@@ -102,7 +103,7 @@ Example `.env` for pipeline mode with local Ollama (default, no API key needed):
  ```dotenv
  AGENT_MODE=pipeline
  LLM_PROVIDER=ollama
- OLLAMA_MODEL=gemma3:4b
+ OLLAMA_MODEL=gemma4
  ```

  ## Quick Start
@@ -121,7 +122,7 @@ pip install -r requirements.txt
  cp .env.example .env

  # Pull the default LLM
- ollama pull gemma3:4b
+ ollama pull gemma4:e4b

  # Ingest documents (place PDFs in docs/ first)
  python -m scripts.ingest
@@ -146,7 +147,7 @@ cp .env.example .env
  docker compose --profile local up --build
  ```

- Starts Qdrant + Ollama + API + UI. The `ollama-init` sidecar pulls `gemma3:4b` on first run.
+ Starts Qdrant + Ollama + API + UI. The `ollama-init` sidecar pulls `gemma4` on first run.

  | Service | URL |
  |---|---|
@@ -155,7 +156,7 @@ Starts Qdrant + Ollama + API + UI. The `ollama-init` sidecar pulls `gemma3:4b` o
  | Streamlit UI | http://localhost:8501 |
  | Qdrant dashboard | http://localhost:6333/dashboard |

- ### Cloud mode (OpenAI / Azure / Anthropic / Google)
+ ### Cloud mode (OpenAI / Azure / Anthropic / Google / Groq)

  ```bash
  cp .env.example .env
src/config.py CHANGED
@@ -49,6 +49,10 @@ class Settings:
      azure_openai_deployment: str
      azure_openai_embedding_deployment: str

+     # Groq
+     groq_api_key: str
+     groq_model: str
+
      # Anthropic
      anthropic_api_key: str
      anthropic_model: str
@@ -100,7 +104,7 @@ def load_settings() -> Settings:
          collection_name=os.environ.get("COLLECTION_NAME", "ku_documents"),
          embedding_model=os.environ.get("EMBEDDING_MODEL", "paraphrase-multilingual-MiniLM-L12-v2"),
          embedding_dimension=int(os.environ.get("EMBEDDING_DIMENSION", "384")),
-         generation_model=os.environ.get("GENERATION_MODEL", "gemma3:4b"),
+         generation_model=os.environ.get("GENERATION_MODEL", "gemma4:e4b"),
          reranker_model=os.environ.get("RERANKER_MODEL", "cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"),
          chunk_size=int(os.environ.get("CHUNK_SIZE", "512")),
          chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "64")),
@@ -111,7 +115,7 @@ def load_settings() -> Settings:

          # Ollama
          ollama_base_url=os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
-         ollama_model=os.environ.get("OLLAMA_MODEL", "gemma3:4b"),
+         ollama_model=os.environ.get("OLLAMA_MODEL", "gemma4:e4b"),

          # OpenAI
          openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
@@ -125,6 +129,10 @@ def load_settings() -> Settings:
          azure_openai_deployment=os.environ.get("AZURE_OPENAI_DEPLOYMENT", ""),
          azure_openai_embedding_deployment=os.environ.get("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", ""),

+         # Groq
+         groq_api_key=os.environ.get("GROQ_API_KEY", ""),
+         groq_model=os.environ.get("GROQ_MODEL", "qwen/qwen3-32b"),
+
          # Anthropic
          anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY", ""),
          anthropic_model=os.environ.get("ANTHROPIC_MODEL", "claude-sonnet-4-20250514"),
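The `load_settings()` hunks above rely on `os.environ.get` with a fallback value. A minimal sketch of that pattern, reduced to the three settings this commit touches, might look like the following. `LLMSettings`, `load_llm_settings`, and `missing_groq_key` are hypothetical names used only for illustration and are not part of `src/config.py`; the default values are copied from the diff.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical subset of the project's Settings class."""

    ollama_model: str
    groq_api_key: str
    groq_model: str


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Read settings from a mapping, falling back to the defaults in the diff.

    Passing `env` explicitly (an assumption of this sketch) keeps the
    function testable without touching the real process environment.
    """
    source = os.environ if env is None else env
    return LLMSettings(
        ollama_model=source.get("OLLAMA_MODEL", "gemma4:e4b"),
        groq_api_key=source.get("GROQ_API_KEY", ""),
        groq_model=source.get("GROQ_MODEL", "qwen/qwen3-32b"),
    )


def missing_groq_key(settings: LLMSettings) -> bool:
    """Return True when the Groq key is empty, which is the default."""
    return not settings.groq_api_key.strip()
```

Because `GROQ_API_KEY` defaults to an empty string, a check like `missing_groq_key` is one way to surface a misconfiguration before the first request rather than at call time.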
src/provider.py CHANGED
@@ -13,7 +13,7 @@ from src.config import Settings

  logger = logging.getLogger(__name__)

- _SUPPORTED_LLM_PROVIDERS = ["ollama", "azure_openai", "openai", "anthropic", "google_genai"]
+ _SUPPORTED_LLM_PROVIDERS = ["ollama", "azure_openai", "openai", "groq", "anthropic", "google_genai"]
  _SUPPORTED_EMBEDDING_PROVIDERS = ["local", "azure_openai", "openai", "google_genai"]


@@ -62,6 +62,16 @@ def create_llm(settings: Settings) -> BaseChatModel:
                  temperature=0.0,
              )

+         case "groq":
+             from langchain_openai import ChatOpenAI
+
+             return ChatOpenAI(
+                 model=settings.groq_model,
+                 api_key=settings.groq_api_key,
+                 base_url="https://api.groq.com/openai/v1",
+                 temperature=0.0,
+             )
+
          case "anthropic":
              from langchain_anthropic import ChatAnthropic
 
src/ui/app.py CHANGED
@@ -54,10 +54,7 @@ TEXTS: dict[str, dict[str, str]] = {
              "- **LLM-integration** — provider-agnostisk, prompt-styret "
              "svargenerering\n"
              "- **Evaluering** — RAGAS-baseret kvalitetsmåling\n"
-             "- **Agent Flows** — valgfri ReAct-loop med værktøjskald: "
-             "LLM bestemmer selv hvor mange søgninger der behøves og "
-             "støtter flertrinræsonnering på tværs af dokumenter "
-             "(`AGENT_MODE=react`)"
+             "- **Agent Flows** — ReAct-loop med værktøjskald."
          ),
          "chunking_label": "Chunking-strategi",
          "chunking_help": "Vælg hvordan dokumenterne opdeles i tekststykker.",
@@ -131,10 +128,7 @@ TEXTS: dict[str, dict[str, str]] = {
              "- **LLM integration** — provider-agnostic, prompt-driven "
              "answer generation\n"
              "- **Evaluation** — RAGAS-based quality measurement\n"
-             "- **Agent Flows** — optional ReAct loop with tool calling: "
-             "the LLM decides how many searches are needed and supports "
-             "multi-step reasoning across documents "
-             "(`AGENT_MODE=react`)"
+             "- **Agent Flows** — ReAct loop with tool calling"
          ),
          "chunking_label": "Chunking strategy",
          "chunking_help": "Choose how documents are split into text chunks.",