# Configuration
## Environment Variables
All configuration is done via environment variables. Create a `.env` file in the project root.
### Access Control
| Variable | Default | Description |
|----------|---------|-------------|
| `ACCESS_KEYS` | `''` | Comma-separated list of valid access keys (e.g., `'key1,key2,key3'`) |
| `ACCESS_KEY_TIMEOUT_HOURS` | `24` | Hours to cache validated keys in browser. Set to `0` to require validation on every request |
**Example:**
```bash
ACCESS_KEYS="my-secret-key-1,my-secret-key-2"
ACCESS_KEY_TIMEOUT_HOURS="24"
```
### AI Model Defaults
Configure default models for different inference types:
| Variable | Default | Description |
|----------|---------|-------------|
| `WEBLLM_DEFAULT_F16_MODEL_ID` | `Qwen3-0.6B-q4f16_1-MLC` | Default WebLLM model with F16 shaders (requires WebGPU) |
| `WEBLLM_DEFAULT_F32_MODEL_ID` | `Qwen3-0.6B-q4f32_1-MLC` | Default WebLLM model with F32 shaders (CPU fallback) |
| `WLLAMA_DEFAULT_MODEL_ID` | `qwen-3-0.6b` | Default Wllama model (CPU-based, no WebGPU required) |
**Model Selection Notes:**
- F16 models are faster but require WebGPU with F16 shader support (see the detection sketch below)
- F32 models work on all WebGPU-capable devices
- Wllama models run on CPU via WebAssembly (slower but most compatible)
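The F16/F32 choice typically comes down to whether the browser's WebGPU adapter exposes the `shader-f16` feature. A minimal detection sketch in TypeScript, assuming the Vite-injected defaults listed later in this document; the function name and fallback behavior are illustrative, not taken from the codebase:
```typescript
// Illustrative sketch: pick a WebLLM model ID based on WebGPU F16 shader support.
// Requires WebGPU type definitions (e.g. @webgpu/types) to compile.
async function pickWebLlmModelId(): Promise<string> {
  // navigator.gpu is undefined when the browser has no WebGPU support at all.
  const adapter = await navigator.gpu?.requestAdapter();
  if (!adapter) {
    throw new Error("WebGPU unavailable; use a Wllama (CPU) model instead.");
  }
  // "shader-f16" is the WebGPU feature flag for 16-bit float shaders.
  return adapter.features.has("shader-f16")
    ? import.meta.env.VITE_WEBLLM_DEFAULT_F16_MODEL_ID
    : import.meta.env.VITE_WEBLLM_DEFAULT_F32_MODEL_ID;
}
```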
### Internal API Configuration
For self-hosted OpenAI-compatible APIs:
| Variable | Default | Description |
|----------|---------|-------------|
| `INTERNAL_OPENAI_COMPATIBLE_API_BASE_URL` | `''` | Base URL of your API (e.g., `https://api.internal.company.com/v1`) |
| `INTERNAL_OPENAI_COMPATIBLE_API_KEY` | `''` | API key for authentication |
| `INTERNAL_OPENAI_COMPATIBLE_API_MODEL` | `''` | Model ID to use (auto-detected if empty) |
| `INTERNAL_OPENAI_COMPATIBLE_API_NAME` | `Internal API` | Display name shown in UI |
**Example:**
```bash
INTERNAL_OPENAI_COMPATIBLE_API_BASE_URL="https://llm.internal.company.com/v1"
INTERNAL_OPENAI_COMPATIBLE_API_KEY="sk-internal-xxx"
INTERNAL_OPENAI_COMPATIBLE_API_MODEL="llama-3.1-8b"
INTERNAL_OPENAI_COMPATIBLE_API_NAME="Company LLM"
```
### Default Behavior
| Variable | Default | Description |
|----------|---------|-------------|
| `DEFAULT_INFERENCE_TYPE` | `browser` | Default AI inference type (`browser`, `openai`, `horde`, `internal`) |
## Application Settings
Settings are stored in browser localStorage and can be changed via the Settings UI.
### Core Settings
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enableAiResponse` | boolean | `false` | Enable AI-generated responses for searches |
| `enableWebGpu` | boolean | `true` | Use WebGPU acceleration when available |
| `enableImageSearch` | boolean | `true` | Include image results in searches |
| `searchResultsToConsider` | number | `3` | Number of top search results to include in AI context |
| `searchResultsLimit` | number | `15` | Maximum search results to fetch |
| `systemPrompt` | string | (template) | Custom system prompt template for AI |
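Because settings live in localStorage, they can also be inspected or patched outside the Settings UI. A minimal sketch, assuming the settings object is serialized as JSON under the `settings` key (the same key used in the debugging examples further below); the exact key and field names may differ:
```typescript
// Illustrative: read the stored settings object, adjust two fields, write it back.
const stored = localStorage.getItem("settings");
const settings = stored ? JSON.parse(stored) : {};
settings.enableAiResponse = true; // turn on AI-generated answers
settings.searchResultsToConsider = 5; // widen the AI context
localStorage.setItem("settings", JSON.stringify(settings));
```
Depending on how the app reads settings at startup, a page reload may be needed before changes made this way take effect.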
### Inference Settings
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `inferenceType` | enum | `'browser'` | AI provider: `browser`, `openai`, `horde`, `internal` |
| `inferenceTemperature` | number | `0.7` | Sampling temperature (0.0-1.0) |
| `inferenceTopP` | number | `0.9` | Nucleus sampling parameter |
| `inferenceMaxTokens` | number | `4096` | Maximum tokens per generation |
| `inferenceTopK` | number | `40` | Top-K sampling parameter (browser only) |
| `minP` | number | `0.1` | Min-p sampling threshold |
| `repeatPenalty` | number | `1.1` | Penalty for token repetition |
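For the `openai` and `internal` inference types, these settings map onto the standard OpenAI-compatible chat-completions parameters. A hedged sketch of the request body such a call might carry, reusing the internal API example from earlier; this is illustrative, not the project's actual client code:
```typescript
// Illustrative OpenAI-compatible request built from the inference settings above.
const response = await fetch(
  "https://llm.internal.company.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer sk-internal-xxx",
    },
    body: JSON.stringify({
      model: "llama-3.1-8b",
      messages: [{ role: "user", content: "Summarize the top search results." }],
      temperature: 0.7, // inferenceTemperature
      top_p: 0.9, // inferenceTopP
      max_tokens: 4096, // inferenceMaxTokens
      stream: true,
    }),
  },
);
```
Parameters marked browser-only above (such as `inferenceTopK`) apply to the in-browser engines rather than this path.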
### Model Selection
**WebLLM Models:**
- Uses MLC LLM model registry
- Models loaded from HuggingFace
- Common options: `Qwen3-0.6B`, `SmolLM2-1.7B`, `Llama-3.2-1B`
**Wllama Models:**
- 40+ pre-configured models
- Range from 135M to 3.8B parameters
- All quantized to Q4_K_S or UD-Q4_K_XL
- Stored at: `Felladrin/gguf-sharded-*` on HuggingFace
**OpenAI/Internal:**
- Any OpenAI-compatible API
- Auto-model detection if not specified
- Supports streaming and reasoning models
**AI Horde:**
- Uses aihorde.net distributed network
- Anonymous or authenticated access
- Parallel generation: multiple requests are raced and the first result wins (see the sketch below)
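The race-style parallel generation mentioned above can be approximated by submitting several Horde jobs at once and keeping the first one that succeeds. A minimal sketch; `requestHordeGeneration` is a hypothetical helper standing in for the actual aihorde.net job submission and polling:
```typescript
// Hypothetical helper: submits one text-generation job to aihorde.net and
// resolves with the generated text (implementation omitted here).
declare function requestHordeGeneration(prompt: string): Promise<string>;

// Illustrative: fire several requests in parallel and keep the first success.
async function generateViaHorde(prompt: string, attempts = 3): Promise<string> {
  const requests = Array.from({ length: attempts }, () =>
    requestHordeGeneration(prompt),
  );
  // Promise.any resolves with the first fulfilled request and ignores slower
  // or failed ones; it only rejects if every attempt fails.
  return Promise.any(requests);
}
```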
### History Settings
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `historyRetentionDays` | number | `30` | Days to keep search history |
| `historyMaxEntries` | number | `1000` | Maximum history entries before cleanup |
| `enableHistorySync` | boolean | `true` | Save history to IndexedDB |
### Privacy Settings
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enableTelemetry` | boolean | `false` | Enable anonymous usage analytics |
| `shareModelDownloads` | boolean | `true` | Share model downloads via WebRTC (peer-to-peer) |
## Docker Configuration
### docker-compose.yml (Development)
```yaml
services:
development-server:
build:
context: .
dockerfile: Dockerfile
ports:
- "7861:7860" # App
- "8888:8888" # SearXNG
environment:
- ACCESS_KEYS=${ACCESS_KEYS:-}
- ACCESS_KEY_TIMEOUT_HOURS=${ACCESS_KEY_TIMEOUT_HOURS:-24}
- WEBLLM_DEFAULT_F16_MODEL_ID=${WEBLLM_DEFAULT_F16_MODEL_ID:-Qwen3-0.6B-q4f16_1-MLC}
# ... more env vars
volumes:
- .:/home/user/app # Live code mounting
- /home/user/app/node_modules
```
### docker-compose.production.yml
Same structure but without volume mounts and with pre-built assets.
### Dockerfile Environment
The Dockerfile sets up:
1. **Builder stage**: Compiles `llama-server` from llama.cpp
2. **Runtime stage**:
- Node.js LTS
- Python 3 + SearXNG
- llama-server binary
The **multi-service container** runs all three services (the Node.js app, SearXNG, and llama-server) concurrently via shell process composition.
## Vite Environment Injection
Environment variables are injected at build time via `vite.config.ts`:
```typescript
// Injected into import.meta.env
VITE_SEARCH_TOKEN
VITE_ACCESS_KEYS_ENABLED
VITE_WEBLLM_DEFAULT_F16_MODEL_ID
VITE_WEBLLM_DEFAULT_F32_MODEL_ID
VITE_WLLAMA_DEFAULT_MODEL_ID
VITE_INTERNAL_API_ENABLED
VITE_DEFAULT_INFERENCE_TYPE
```
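A hedged sketch of how that injection is commonly wired up in `vite.config.ts` using Vite's `define` option; the project's actual config may map a different set of variables or use another mechanism:
```typescript
// vite.config.ts (illustrative): expose selected server-side env vars to the client.
import { defineConfig } from "vite";

export default defineConfig({
  define: {
    "import.meta.env.VITE_ACCESS_KEYS_ENABLED": JSON.stringify(
      Boolean(process.env.ACCESS_KEYS),
    ),
    "import.meta.env.VITE_WEBLLM_DEFAULT_F16_MODEL_ID": JSON.stringify(
      process.env.WEBLLM_DEFAULT_F16_MODEL_ID ?? "Qwen3-0.6B-q4f16_1-MLC",
    ),
    "import.meta.env.VITE_DEFAULT_INFERENCE_TYPE": JSON.stringify(
      process.env.DEFAULT_INFERENCE_TYPE ?? "browser",
    ),
    // ...remaining VITE_* entries follow the same pattern
  },
});
```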
These are accessed in client code as:
```typescript
const token = import.meta.env.VITE_SEARCH_TOKEN;
```
## Configuration Patterns
### Scenario: Private Team Instance
```bash
# .env
ACCESS_KEYS="team-alpha-2024,team-beta-2024"
ACCESS_KEY_TIMEOUT_HOURS="8"
DEFAULT_INFERENCE_TYPE="internal"
INTERNAL_OPENAI_COMPATIBLE_API_BASE_URL="https://llm.company.com/v1"
INTERNAL_OPENAI_COMPATIBLE_API_KEY="sk-xxx"
INTERNAL_OPENAI_COMPATIBLE_API_MODEL="llama-3.1-70b"
```
### Scenario: Public Demo (No AI)
```bash
# .env - empty, no access keys
# AI disabled by default in settings
```
### Scenario: Browser-Only AI
```bash
# .env - minimal or empty
# Users choose WebLLM or Wllama in settings
# Models download to user's browser (no server AI)
```
## Debugging Configuration
Enable verbose logging:
```bash
# In browser console
localStorage.setItem('debug', 'minisearch:*');
```
Check effective configuration:
```typescript
// In browser console
console.log('Settings:', JSON.parse(localStorage.getItem('settings') || '{}'));
console.log('Env:', import.meta.env);
```
## Related Topics
- **AI Integration**: `docs/ai-integration.md` - Detailed inference type configuration
- **Security**: `docs/security.md` - Access control and privacy details
- **Deployment**: `docs/overview.md` - Container architecture and production setup