# Configuration

## Environment Variables

All configuration is done via environment variables. Create a `.env` file in the project root.

### Access Control

| Variable | Default | Description |
|----------|---------|-------------|
| `ACCESS_KEYS` | `''` | Comma-separated list of valid access keys (e.g., `'key1,key2,key3'`) |
| `ACCESS_KEY_TIMEOUT_HOURS` | `24` | Hours to cache validated keys in the browser. Set to `0` to require validation on every request |

**Example:**

```bash
ACCESS_KEYS="my-secret-key-1,my-secret-key-2"
ACCESS_KEY_TIMEOUT_HOURS="24"
```
### AI Model Defaults

Configure default models for different inference types:

| Variable | Default | Description |
|----------|---------|-------------|
| `WEBLLM_DEFAULT_F16_MODEL_ID` | `Qwen3-0.6B-q4f16_1-MLC` | Default WebLLM model with F16 shaders (requires WebGPU) |
| `WEBLLM_DEFAULT_F32_MODEL_ID` | `Qwen3-0.6B-q4f32_1-MLC` | Default WebLLM model with F32 shaders (CPU fallback) |
| `WLLAMA_DEFAULT_MODEL_ID` | `qwen-3-0.6b` | Default Wllama model (CPU-based, no WebGPU required) |

**Model Selection Notes:**

- F16 models are faster but require WebGPU with F16 shader support
- F32 models work on all WebGPU-capable devices
- Wllama models run on CPU via WebAssembly (slower but most compatible)
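The fallback order implied by these notes (F16 WebLLM, then F32 WebLLM, then CPU-based Wllama) can be sketched as a small selection function. This is an illustrative sketch, not the app's actual code; the function name and capability flags are assumptions, while the model IDs match the defaults above.

```typescript
type ModelChoice = { engine: "webllm" | "wllama"; modelId: string };

// Hypothetical sketch: pick the best default model for the device,
// preferring F16 WebLLM, then F32 WebLLM, then Wllama on CPU.
function pickDefaultModel(
  hasWebGpu: boolean,
  hasF16Shaders: boolean,
  defaults = {
    f16: "Qwen3-0.6B-q4f16_1-MLC",
    f32: "Qwen3-0.6B-q4f32_1-MLC",
    wllama: "qwen-3-0.6b",
  },
): ModelChoice {
  if (hasWebGpu && hasF16Shaders) {
    return { engine: "webllm", modelId: defaults.f16 };
  }
  if (hasWebGpu) {
    return { engine: "webllm", modelId: defaults.f32 };
  }
  return { engine: "wllama", modelId: defaults.wllama };
}
```

In a browser, the two capability flags would come from WebGPU feature detection (e.g., checking whether the adapter supports F16 shaders).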
### Internal API Configuration

For self-hosted OpenAI-compatible APIs:

| Variable | Default | Description |
|----------|---------|-------------|
| `INTERNAL_OPENAI_COMPATIBLE_API_BASE_URL` | `''` | Base URL of your API (e.g., `https://api.internal.company.com/v1`) |
| `INTERNAL_OPENAI_COMPATIBLE_API_KEY` | `''` | API key for authentication |
| `INTERNAL_OPENAI_COMPATIBLE_API_MODEL` | `''` | Model ID to use (auto-detected if empty) |
| `INTERNAL_OPENAI_COMPATIBLE_API_NAME` | `Internal API` | Display name shown in the UI |

**Example:**

```bash
INTERNAL_OPENAI_COMPATIBLE_API_BASE_URL="https://llm.internal.company.com/v1"
INTERNAL_OPENAI_COMPATIBLE_API_KEY="sk-internal-xxx"
INTERNAL_OPENAI_COMPATIBLE_API_MODEL="llama-3.1-8b"
INTERNAL_OPENAI_COMPATIBLE_API_NAME="Company LLM"
```
### Default Behavior

| Variable | Default | Description |
|----------|---------|-------------|
| `DEFAULT_INFERENCE_TYPE` | `browser` | Default AI inference type (`browser`, `openai`, `horde`, `internal`) |
## Application Settings

Settings are stored in browser localStorage and can be changed via the Settings UI.

### Core Settings

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enableAiResponse` | boolean | `false` | Enable AI-generated responses for searches |
| `enableWebGpu` | boolean | `true` | Use WebGPU acceleration when available |
| `enableImageSearch` | boolean | `true` | Include image results in searches |
| `searchResultsToConsider` | number | `3` | Number of top search results to include in the AI context |
| `searchResultsLimit` | number | `15` | Maximum search results to fetch |
| `systemPrompt` | string | (template) | Custom system prompt template for the AI |
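Because settings persist in localStorage, the client has to merge whatever was stored over these defaults while tolerating missing or corrupted data. A minimal sketch of that pattern (the function name, storage key, and exact field set are assumptions; the default values match the table above):

```typescript
// Defaults mirroring the Core Settings table.
const coreDefaults = {
  enableAiResponse: false,
  enableWebGpu: true,
  enableImageSearch: true,
  searchResultsToConsider: 3,
  searchResultsLimit: 15,
};

// Merge persisted settings over defaults; missing keys keep their
// default, and corrupted JSON falls back to defaults entirely.
function loadSettings(stored: string | null): typeof coreDefaults {
  try {
    return { ...coreDefaults, ...JSON.parse(stored ?? "{}") };
  } catch {
    return { ...coreDefaults };
  }
}
```

In the browser this would typically be called as `loadSettings(localStorage.getItem("settings"))`, with the key name depending on the app.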
### Inference Settings

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `inferenceType` | enum | `'browser'` | AI provider: `browser`, `openai`, `horde`, `internal` |
| `inferenceTemperature` | number | `0.7` | Sampling temperature (0.0-1.0) |
| `inferenceTopP` | number | `0.9` | Nucleus sampling parameter |
| `inferenceMaxTokens` | number | `4096` | Maximum tokens per generation |
| `inferenceTopK` | number | `40` | Top-K sampling parameter (browser only) |
| `minP` | number | `0.1` | Min-p sampling threshold |
| `repeatPenalty` | number | `1.1` | Penalty for token repetition |
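Since these values are user-editable, a client usually clamps them to sane ranges before handing them to an inference backend. A hedged sketch of such a guard (the function and interface names are assumptions; the ranges follow the table above):

```typescript
// Clamp a value into [lo, hi].
const clamp = (v: number, lo: number, hi: number) =>
  Math.min(hi, Math.max(lo, v));

interface SamplingParams {
  temperature: number;
  topP: number;
  topK: number;
  minP: number;
  repeatPenalty: number;
}

// Hypothetical guard keeping user-entered sampling values in range.
function sanitizeSampling(p: SamplingParams): SamplingParams {
  return {
    temperature: clamp(p.temperature, 0.0, 1.0),
    topP: clamp(p.topP, 0.0, 1.0),
    topK: Math.max(1, Math.round(p.topK)), // integer, at least 1
    minP: clamp(p.minP, 0.0, 1.0),
    repeatPenalty: Math.max(1.0, p.repeatPenalty), // 1.0 = no penalty
  };
}
```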
### Model Selection

**WebLLM Models:**

- Uses the MLC LLM model registry
- Models are loaded from HuggingFace
- Common options: `Qwen3-0.6B`, `SmolLM2-1.7B`, `Llama-3.2-1B`

**Wllama Models:**

- 40+ pre-configured models
- Range from 135M to 3.8B parameters
- All quantized to Q4_K_S or UD-Q4_K_XL
- Stored at `Felladrin/gguf-sharded-*` on HuggingFace

**OpenAI/Internal:**

- Works with any OpenAI-compatible API
- Model is auto-detected if not specified
- Supports streaming and reasoning models

**AI Horde:**

- Uses the aihorde.net distributed network
- Anonymous or authenticated access
- Races parallel generation requests, using the first result that completes
### History Settings

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `historyRetentionDays` | number | `30` | Days to keep search history |
| `historyMaxEntries` | number | `1000` | Maximum history entries before cleanup |
| `enableHistorySync` | boolean | `true` | Save history to IndexedDB |
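The two history limits compose naturally: expire entries older than the retention window, then trim to the maximum count, newest first. A sketch of that cleanup (the function and entry shape are assumptions; the default limits match the table above):

```typescript
interface HistoryEntry {
  query: string;
  timestamp: number; // epoch milliseconds
}

// Drop entries older than retentionDays, then keep the newest
// maxEntries, mirroring historyRetentionDays and historyMaxEntries.
function pruneHistory(
  entries: HistoryEntry[],
  retentionDays = 30,
  maxEntries = 1000,
  now = Date.now(),
): HistoryEntry[] {
  const cutoff = now - retentionDays * 24 * 60 * 60 * 1000;
  return entries
    .filter((e) => e.timestamp >= cutoff)
    .sort((a, b) => b.timestamp - a.timestamp) // newest first
    .slice(0, maxEntries);
}
```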
### Privacy Settings

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enableTelemetry` | boolean | `false` | Enable anonymous usage analytics |
| `shareModelDownloads` | boolean | `true` | Share model downloads via WebRTC (peer-to-peer) |
## Docker Configuration

### docker-compose.yml (Development)

```yaml
services:
  development-server:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "7861:7860" # App
      - "8888:8888" # SearXNG
    environment:
      - ACCESS_KEYS=${ACCESS_KEYS:-}
      - ACCESS_KEY_TIMEOUT_HOURS=${ACCESS_KEY_TIMEOUT_HOURS:-24}
      - WEBLLM_DEFAULT_F16_MODEL_ID=${WEBLLM_DEFAULT_F16_MODEL_ID:-Qwen3-0.6B-q4f16_1-MLC}
      # ... more env vars
    volumes:
      - .:/home/user/app # Live code mounting
      - /home/user/app/node_modules
```
### docker-compose.production.yml

Same structure, but without volume mounts and with pre-built assets.

### Dockerfile Environment

The Dockerfile sets up:

1. **Builder stage**: Compiles `llama-server` from llama.cpp
2. **Runtime stage**:
   - Node.js LTS
   - Python 3 + SearXNG
   - the `llama-server` binary

**Multi-service container**: all three services run concurrently via shell process composition.
## Vite Environment Injection

Environment variables are injected at build time via `vite.config.ts`:

```typescript
// Injected into import.meta.env
VITE_SEARCH_TOKEN
VITE_ACCESS_KEYS_ENABLED
VITE_WEBLLM_DEFAULT_F16_MODEL_ID
VITE_WEBLLM_DEFAULT_F32_MODEL_ID
VITE_WLLAMA_DEFAULT_MODEL_ID
VITE_INTERNAL_API_ENABLED
VITE_DEFAULT_INFERENCE_TYPE
```

These are accessed in client code as:

```typescript
const token = import.meta.env.VITE_SEARCH_TOKEN;
```
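Build-time injection like this is commonly done with Vite's `define` option, which replaces the named constants with literal values during bundling. A sketch of what such a config might look like (the exact mapping and derivation of each value are assumptions, not this project's actual `vite.config.ts`):

```typescript
// vite.config.ts (sketch): map server-side env vars to VITE_-prefixed
// constants that Vite substitutes at build time.
import { defineConfig } from "vite";

export default defineConfig({
  define: {
    "import.meta.env.VITE_ACCESS_KEYS_ENABLED": JSON.stringify(
      Boolean(process.env.ACCESS_KEYS),
    ),
    "import.meta.env.VITE_DEFAULT_INFERENCE_TYPE": JSON.stringify(
      process.env.DEFAULT_INFERENCE_TYPE ?? "browser",
    ),
  },
});
```

`JSON.stringify` is used so each value is embedded as a valid JavaScript literal rather than raw text.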
## Configuration Patterns

### Scenario: Private Team Instance

```bash
# .env
ACCESS_KEYS="team-alpha-2024,team-beta-2024"
ACCESS_KEY_TIMEOUT_HOURS="8"
DEFAULT_INFERENCE_TYPE="internal"
INTERNAL_OPENAI_COMPATIBLE_API_BASE_URL="https://llm.company.com/v1"
INTERNAL_OPENAI_COMPATIBLE_API_KEY="sk-xxx"
INTERNAL_OPENAI_COMPATIBLE_API_MODEL="llama-3.1-70b"
```

### Scenario: Public Demo (No AI)

```bash
# .env - empty, no access keys
# AI disabled by default in settings
```

### Scenario: Browser-Only AI

```bash
# .env - minimal or empty
# Users choose WebLLM or Wllama in settings
# Models download to the user's browser (no server AI)
```
## Debugging Configuration

Enable verbose logging:

```typescript
// In browser console
localStorage.setItem('debug', 'minisearch:*');
```

Check the effective configuration:

```typescript
// In browser console
console.log('Settings:', JSON.parse(localStorage.getItem('settings') || '{}'));
console.log('Env:', import.meta.env);
```
## Related Topics

- **AI Integration**: `docs/ai-integration.md` - Detailed inference type configuration
- **Security**: `docs/security.md` - Access control and privacy details
- **Deployment**: `docs/overview.md` - Container architecture and production setup