XQ/Dokumentassistent

XQ committed on Apr 5

Commit

b205d63

1 Parent(s): b3c968a

Change default LLM

Files changed (4)

README.md +10 -9
src/config.py +10 -2
src/provider.py +11 -1
src/ui/app.py +2 -8

README.md CHANGED

@@ -24,9 +24,9 @@ The system follows a three-stage RAG pipeline with an optional Agent Flows mode:
 **Routing — two modes (switchable via `AGENT_MODE`):**
-- **Pipeline mode** (default, `AGENT_MODE=pipeline`): Fixed LangGraph DAG — language detection → optional translation → hybrid retrieval → cross-encoder reranking → intent-specific generation. Robust on any LLM including local Ollama models.
-- **ReAct Agent mode** (`AGENT_MODE=react`): Replaces the fixed DAG with a multi-step reasoning loop. The LLM decides which tools to call and how many times, then produces a grounded answer citing source documents. Supports multi-hop questions, comparisons across documents, and procedural queries that benefit from iterative retrieval. Requires an LLM with tool-calling support (OpenAI, Anthropic, Google GenAI, or compatible Ollama models such as `llama3.1` / `qwen2.5`).
   Available tools in ReAct mode:
@@ -44,7 +44,7 @@ The system follows a three-stage RAG pipeline with an optional Agent Flows mode:
 | Orchestration | LangChain, LangGraph |
 | Vector Store | Qdrant (local mode, no server required) |
 | Embedding | HuggingFace `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
-| LLM | `gemma3:4b` (default, runs locally via Ollama) |
 | Sparse Search | rank_bm25 |
 | Reranking | sentence-transformers `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
 | PDF Parsing | PyMuPDF (fitz) |
@@ -72,7 +72,7 @@ The system supports two routing modes, controlled by `AGENT_MODE` in `.env`:
 | Mode | Value | Description |
 |------|-------|-------------|
-| Pipeline (default) | `AGENT_MODE=pipeline` | Fixed LangGraph DAG. Works with any LLM including local Ollama models such as `gemma3:4b`. |
 | ReAct Agent | `AGENT_MODE=react` | Multi-step reasoning loop. The LLM calls tools as many times as needed — `hybrid_search` for targeted passages, `list_documents` to navigate the knowledge base, `fetch_document` for full document reads — then cites sources in the final answer. |
 **LLM compatibility for ReAct mode:**
@@ -85,8 +85,9 @@ The system supports two routing modes, controlled by `AGENT_MODE` in `.env`:
 | Anthropic (`claude-*`) | Yes |
 | Google GenAI (`gemini-*`) | Yes |
 | Azure OpenAI | Yes |
 | Ollama — `llama3.1`, `qwen2.5`, `mistral-nemo` | Yes (model-dependent) |
-| Ollama — `gemma3:4b` (default) | No → use `pipeline` mode |
 Example `.env` for ReAct mode with OpenAI:
@@ -102,7 +103,7 @@ Example `.env` for pipeline mode with local Ollama (default, no API key needed):
 ```dotenv
 AGENT_MODE=pipeline
 LLM_PROVIDER=ollama
-OLLAMA_MODEL=gemma3:4b
 ```
 ## Quick Start
@@ -121,7 +122,7 @@ pip install -r requirements.txt
 cp .env.example .env
 # Pull the default LLM
-ollama pull gemma3:4b
 # Ingest documents (place PDFs in docs/ first)
 python -m scripts.ingest
@@ -146,7 +147,7 @@ cp .env.example .env
 docker compose --profile local up --build
 ```
-Starts Qdrant + Ollama + API + UI. The `ollama-init` sidecar pulls `gemma3:4b` on first run.
 | Service | URL |
 |---|---|
@@ -155,7 +156,7 @@ Starts Qdrant + Ollama + API + UI. The `ollama-init` sidecar pulls `gemma3:4b` o
 | Streamlit UI | http://localhost:8501 |
 | Qdrant dashboard | http://localhost:6333/dashboard |
-### Cloud mode (OpenAI / Azure / Anthropic / Google)
 ```bash
 cp .env.example .env

 **Routing — two modes (switchable via `AGENT_MODE`):**
+- **Pipeline mode** (default, `AGENT_MODE=pipeline`): Fixed LangGraph DAG — language detection → optional translation → hybrid retrieval → cross-encoder reranking → intent-specific generation. Works with lightweight models such as `gemma4`.
+- **ReAct Agent mode** (`AGENT_MODE=react`): Replaces the fixed DAG with a multi-step reasoning loop. The LLM decides which tools to call and how many times, then produces a grounded answer citing source documents. Supports multi-hop questions, comparisons across documents, and procedural queries that benefit from iterative retrieval. Requires an LLM with tool-calling support (OpenAI, Anthropic, Google GenAI, or compatible Ollama models such as `llama3.1` / `qwen3`).
   Available tools in ReAct mode:
 | Orchestration | LangChain, LangGraph |
 | Vector Store | Qdrant (local mode, no server required) |
 | Embedding | HuggingFace `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
+| LLM | `gemma4` (default, runs locally via Ollama) |
 | Sparse Search | rank_bm25 |
 | Reranking | sentence-transformers `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` |
 | PDF Parsing | PyMuPDF (fitz) |
 | Mode | Value | Description |
 |------|-------|-------------|
+| Pipeline (default) | `AGENT_MODE=pipeline` | Fixed LangGraph DAG. Works with lightweight models such as `gemma4` via Ollama — no cloud API required. |
 | ReAct Agent | `AGENT_MODE=react` | Multi-step reasoning loop. The LLM calls tools as many times as needed — `hybrid_search` for targeted passages, `list_documents` to navigate the knowledge base, `fetch_document` for full document reads — then cites sources in the final answer. |
 **LLM compatibility for ReAct mode:**
 | Anthropic (`claude-*`) | Yes |
 | Google GenAI (`gemini-*`) | Yes |
 | Azure OpenAI | Yes |
+| Groq (`qwen/qwen3-32b`, `llama-3.3-70b-versatile`) | Yes |
 | Ollama — `llama3.1`, `qwen2.5`, `mistral-nemo` | Yes (model-dependent) |
+| Ollama — `gemma4` (default) | No → use `pipeline` mode |
 Example `.env` for ReAct mode with OpenAI:
 ```dotenv
 AGENT_MODE=pipeline
 LLM_PROVIDER=ollama
+OLLAMA_MODEL=gemma4
 ```
 ## Quick Start
 cp .env.example .env
 # Pull the default LLM
+ollama pull gemma4:e4b
 # Ingest documents (place PDFs in docs/ first)
 python -m scripts.ingest
 docker compose --profile local up --build
 ```
+Starts Qdrant + Ollama + API + UI. The `ollama-init` sidecar pulls `gemma4` on first run.
 | Service | URL |
 |---|---|
 | Streamlit UI | http://localhost:8501 |
 | Qdrant dashboard | http://localhost:6333/dashboard |
+### Cloud mode (OpenAI / Azure / Anthropic / Google / Groq)
 ```bash
 cp .env.example .env

src/config.py CHANGED

@@ -49,6 +49,10 @@ class Settings:
     azure_openai_deployment: str
     azure_openai_embedding_deployment: str
     # Anthropic
     anthropic_api_key: str
     anthropic_model: str
@@ -100,7 +104,7 @@ def load_settings() -> Settings:
         collection_name=os.environ.get("COLLECTION_NAME", "ku_documents"),
         embedding_model=os.environ.get("EMBEDDING_MODEL", "paraphrase-multilingual-MiniLM-L12-v2"),
         embedding_dimension=int(os.environ.get("EMBEDDING_DIMENSION", "384")),
-        generation_model=os.environ.get("GENERATION_MODEL", "gemma3:4b"),
         reranker_model=os.environ.get("RERANKER_MODEL", "cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"),
         chunk_size=int(os.environ.get("CHUNK_SIZE", "512")),
         chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "64")),
@@ -111,7 +115,7 @@ def load_settings() -> Settings:
         # Ollama
         ollama_base_url=os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
-        ollama_model=os.environ.get("OLLAMA_MODEL", "gemma3:4b"),
         # OpenAI
         openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
@@ -125,6 +129,10 @@ def load_settings() -> Settings:
         azure_openai_deployment=os.environ.get("AZURE_OPENAI_DEPLOYMENT", ""),
         azure_openai_embedding_deployment=os.environ.get("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", ""),
         # Anthropic
         anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY", ""),
         anthropic_model=os.environ.get("ANTHROPIC_MODEL", "claude-sonnet-4-20250514"),

     azure_openai_deployment: str
     azure_openai_embedding_deployment: str
+    # Groq
+    groq_api_key: str
+    groq_model: str
     # Anthropic
     anthropic_api_key: str
     anthropic_model: str
         collection_name=os.environ.get("COLLECTION_NAME", "ku_documents"),
         embedding_model=os.environ.get("EMBEDDING_MODEL", "paraphrase-multilingual-MiniLM-L12-v2"),
         embedding_dimension=int(os.environ.get("EMBEDDING_DIMENSION", "384")),
+        generation_model=os.environ.get("GENERATION_MODEL", "gemma4:e4b"),
         reranker_model=os.environ.get("RERANKER_MODEL", "cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"),
         chunk_size=int(os.environ.get("CHUNK_SIZE", "512")),
         chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "64")),
         # Ollama
         ollama_base_url=os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
+        ollama_model=os.environ.get("OLLAMA_MODEL", "gemma4:e4b"),
         # OpenAI
         openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
         azure_openai_deployment=os.environ.get("AZURE_OPENAI_DEPLOYMENT", ""),
         azure_openai_embedding_deployment=os.environ.get("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", ""),
+        # Groq
+        groq_api_key=os.environ.get("GROQ_API_KEY", ""),
+        groq_model=os.environ.get("GROQ_MODEL", "qwen/qwen3-32b"),
         # Anthropic
         anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY", ""),
         anthropic_model=os.environ.get("ANTHROPIC_MODEL", "claude-sonnet-4-20250514"),
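
The Groq settings added above follow the same pattern as the other providers: each value is read from the environment with a fallback default. A minimal standalone sketch of that pattern follows. `GroqSettings` and `load_groq_settings` are hypothetical names used only for illustration; in the commit, the two fields live directly on `Settings` and are populated in `load_settings()`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GroqSettings:
    """Hypothetical stand-in for the two Groq fields added to Settings."""

    groq_api_key: str
    groq_model: str


def load_groq_settings(env: Optional[Mapping[str, str]] = None) -> GroqSettings:
    """Read the Groq settings, falling back to the defaults shown in the diff."""
    # Accepting a mapping makes the function easy to exercise without
    # touching the real process environment (an assumption of this sketch).
    source = os.environ if env is None else env
    return GroqSettings(
        groq_api_key=source.get("GROQ_API_KEY", ""),
        groq_model=source.get("GROQ_MODEL", "qwen/qwen3-32b"),
    )
```

With no variables set, the model falls back to `qwen/qwen3-32b` and the API key is an empty string, so a missing key would only surface when the first request is made.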

src/provider.py CHANGED

@@ -13,7 +13,7 @@ from src.config import Settings
 logger = logging.getLogger(__name__)
-_SUPPORTED_LLM_PROVIDERS = ["ollama", "azure_openai", "openai", "anthropic", "google_genai"]
 _SUPPORTED_EMBEDDING_PROVIDERS = ["local", "azure_openai", "openai", "google_genai"]
@@ -62,6 +62,16 @@ def create_llm(settings: Settings) -> BaseChatModel:
                 temperature=0.0,
             )
         case "anthropic":
             from langchain_anthropic import ChatAnthropic

 logger = logging.getLogger(__name__)
+_SUPPORTED_LLM_PROVIDERS = ["ollama", "azure_openai", "openai", "groq", "anthropic", "google_genai"]
 _SUPPORTED_EMBEDDING_PROVIDERS = ["local", "azure_openai", "openai", "google_genai"]
                 temperature=0.0,
             )
+        case "groq":
+            from langchain_openai import ChatOpenAI
+            return ChatOpenAI(
+                model=settings.groq_model,
+                api_key=settings.groq_api_key,
+                base_url="https://api.groq.com/openai/v1",
+                temperature=0.0,
+            )
         case "anthropic":
             from langchain_anthropic import ChatAnthropic
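
The `groq` case reuses `ChatOpenAI` and only swaps the base URL, because Groq exposes an OpenAI-compatible endpoint. The sketch below isolates that idea without importing LangChain. `groq_chat_kwargs` and `check_provider` are hypothetical helpers, and the empty-key check is an assumption of this sketch, not something the commit does.

```python
SUPPORTED_LLM_PROVIDERS = [
    "ollama", "azure_openai", "openai", "groq", "anthropic", "google_genai",
]
GROQ_BASE_URL = "https://api.groq.com/openai/v1"


def check_provider(name: str) -> str:
    """Return the provider name if it is in the supported list, else raise."""
    if name not in SUPPORTED_LLM_PROVIDERS:
        raise ValueError(
            f"Unsupported LLM provider {name!r}; "
            f"expected one of {SUPPORTED_LLM_PROVIDERS}"
        )
    return name


def groq_chat_kwargs(model: str, api_key: str) -> dict:
    """Build the keyword arguments the diff passes to ChatOpenAI for Groq."""
    # Assumption: fail early on an empty key instead of at request time.
    if not api_key:
        raise ValueError("GROQ_API_KEY is empty")
    return {
        "model": model,
        "api_key": api_key,
        "base_url": GROQ_BASE_URL,
        "temperature": 0.0,
    }
```

Under these assumptions, the `groq` branch would amount to `ChatOpenAI(**groq_chat_kwargs(settings.groq_model, settings.groq_api_key))`, with every other provider branch left unchanged.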

src/ui/app.py CHANGED

@@ -54,10 +54,7 @@ TEXTS: dict[str, dict[str, str]] = {
             "- **LLM-integration** — provider-agnostisk, prompt-styret "
             "svargenerering\n"
             "- **Evaluering** — RAGAS-baseret kvalitetsmåling\n"
-            "- **Agent Flows** — valgfri ReAct-loop med værktøjskald: "
-            "LLM bestemmer selv hvor mange søgninger der behøves og "
-            "støtter flertrinræsonnering på tværs af dokumenter "
-            "(`AGENT_MODE=react`)"
         ),
         "chunking_label": "Chunking-strategi",
         "chunking_help": "Vælg hvordan dokumenterne opdeles i tekststykker.",
@@ -131,10 +128,7 @@ TEXTS: dict[str, dict[str, str]] = {
             "- **LLM integration** — provider-agnostic, prompt-driven "
             "answer generation\n"
             "- **Evaluation** — RAGAS-based quality measurement\n"
-            "- **Agent Flows** — optional ReAct loop with tool calling: "
-            "the LLM decides how many searches are needed and supports "
-            "multi-step reasoning across documents "
-            "(`AGENT_MODE=react`)"
         ),
         "chunking_label": "Chunking strategy",
         "chunking_help": "Choose how documents are split into text chunks.",

             "- **LLM-integration** — provider-agnostisk, prompt-styret "
             "svargenerering\n"
             "- **Evaluering** — RAGAS-baseret kvalitetsmåling\n"
+            "- **Agent Flows** — ReAct-loop med værktøjskald."
         ),
         "chunking_label": "Chunking-strategi",
         "chunking_help": "Vælg hvordan dokumenterne opdeles i tekststykker.",
             "- **LLM integration** — provider-agnostic, prompt-driven "
             "answer generation\n"
             "- **Evaluation** — RAGAS-based quality measurement\n"
+            "- **Agent Flows** — ReAct loop with tool calling"
         ),
         "chunking_label": "Chunking strategy",
         "chunking_help": "Choose how documents are split into text chunks.",