# Configuration

## Environment Variables

All configuration is done via environment variables. Create a `.env` file in the project root.

### Access Control

| Variable | Default | Description |
|----------|---------|-------------|
| `ACCESS_KEYS` | `''` | Comma-separated list of valid access keys (e.g., `'key1,key2,key3'`) |
| `ACCESS_KEY_TIMEOUT_HOURS` | `24` | Hours to cache validated keys in the browser. Set to `0` to require validation on every request |

**Example:**

```bash
ACCESS_KEYS="my-secret-key-1,my-secret-key-2"
ACCESS_KEY_TIMEOUT_HOURS="24"
```

### AI Model Defaults

Configure default models for the different inference types:

| Variable | Default | Description |
|----------|---------|-------------|
| `WEBLLM_DEFAULT_F16_MODEL_ID` | `Qwen3-0.6B-q4f16_1-MLC` | Default WebLLM model with F16 shaders (requires WebGPU) |
| `WEBLLM_DEFAULT_F32_MODEL_ID` | `Qwen3-0.6B-q4f32_1-MLC` | Default WebLLM model with F32 shaders (CPU fallback) |
| `WLLAMA_DEFAULT_MODEL_ID` | `qwen-3-0.6b` | Default Wllama model (CPU-based, no WebGPU required) |

**Model Selection Notes:**

- F16 models are faster but require WebGPU with F16 shader support
- F32 models work on all WebGPU-capable devices
- Wllama models run on CPU via WebAssembly (slower but most compatible)

### Internal API Configuration

For self-hosted OpenAI-compatible APIs:

| Variable | Default | Description |
|----------|---------|-------------|
| `INTERNAL_OPENAI_COMPATIBLE_API_BASE_URL` | `''` | Base URL of your API (e.g., `https://api.internal.company.com/v1`) |
| `INTERNAL_OPENAI_COMPATIBLE_API_KEY` | `''` | API key for authentication |
| `INTERNAL_OPENAI_COMPATIBLE_API_MODEL` | `''` | Model ID to use (auto-detected if empty) |
| `INTERNAL_OPENAI_COMPATIBLE_API_NAME` | `Internal API` | Display name shown in the UI |

**Example:**

```bash
INTERNAL_OPENAI_COMPATIBLE_API_BASE_URL="https://llm.internal.company.com/v1"
INTERNAL_OPENAI_COMPATIBLE_API_KEY="sk-internal-xxx"
INTERNAL_OPENAI_COMPATIBLE_API_MODEL="llama-3.1-8b"
INTERNAL_OPENAI_COMPATIBLE_API_NAME="Company LLM"
```

### Default Behavior

| Variable | Default | Description |
|----------|---------|-------------|
| `DEFAULT_INFERENCE_TYPE` | `browser` | Default AI inference type (`browser`, `openai`, `horde`, `internal`) |

## Application Settings

Settings are stored in browser localStorage and can be changed via the Settings UI.
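Because the store is plain localStorage, settings can also be inspected or overridden from the browser console. The sketch below assumes the `settings` key shown later in the Debugging Configuration section and a flat JSON object keyed by setting name; the actual persisted shape may differ, and the Settings UI remains the supported way to change values.

```typescript
// Hypothetical console snippet: read the persisted settings, flip one flag,
// and write the object back. The key name and flat shape are assumptions
// based on the Debugging Configuration example further down.
const raw = localStorage.getItem("settings");
const settings: Record<string, unknown> = raw ? JSON.parse(raw) : {};

settings.enableAiResponse = true;   // enable AI answers (default: false)
settings.inferenceType = "browser"; // run inference in the browser

localStorage.setItem("settings", JSON.stringify(settings));
```

A page reload is typically needed before the application re-reads the stored values.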
### Core Settings

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enableAiResponse` | boolean | `false` | Enable AI-generated responses for searches |
| `enableWebGpu` | boolean | `true` | Use WebGPU acceleration when available |
| `enableImageSearch` | boolean | `true` | Include image results in searches |
| `searchResultsToConsider` | number | `3` | Number of top search results to include in AI context |
| `searchResultsLimit` | number | `15` | Maximum search results to fetch |
| `systemPrompt` | string | (template) | Custom system prompt template for AI |

### Inference Settings

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `inferenceType` | enum | `'browser'` | AI provider: `browser`, `openai`, `horde`, `internal` |
| `inferenceTemperature` | number | `0.7` | Sampling temperature (0.0-1.0) |
| `inferenceTopP` | number | `0.9` | Nucleus sampling parameter |
| `inferenceMaxTokens` | number | `4096` | Maximum tokens per generation |
| `inferenceTopK` | number | `40` | Top-K sampling parameter (browser only) |
| `minP` | number | `0.1` | Min-p sampling threshold |
| `repeatPenalty` | number | `1.1` | Penalty for token repetition |

### Model Selection

**WebLLM Models:**

- Uses the MLC LLM model registry
- Models are loaded from HuggingFace
- Common options: `Qwen3-0.6B`, `SmolLM2-1.7B`, `Llama-3.2-1B`

**Wllama Models:**

- 40+ pre-configured models
- Range from 135M to 3.8B parameters
- All quantized to Q4_K_S or UD-Q4_K_XL
- Stored at `Felladrin/gguf-sharded-*` on HuggingFace

**OpenAI/Internal:**

- Any OpenAI-compatible API
- Auto-detects the model if none is specified
- Supports streaming and reasoning models

**AI Horde:**

- Uses the aihorde.net distributed network
- Anonymous or authenticated access
- Parallel generation requests raced against each other

### History Settings

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `historyRetentionDays` | number | `30` | Days to keep search history |
| `historyMaxEntries` | number | `1000` | Maximum history entries before cleanup |
| `enableHistorySync` | boolean | `true` | Save history to IndexedDB |

### Privacy Settings

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enableTelemetry` | boolean | `false` | Enable anonymous usage analytics |
| `shareModelDownloads` | boolean | `true` | Share model downloads via WebRTC (peer-to-peer) |

## Docker Configuration

### docker-compose.yml (Development)

```yaml
services:
  development-server:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "7861:7860" # App
      - "8888:8888" # SearXNG
    environment:
      - ACCESS_KEYS=${ACCESS_KEYS:-}
      - ACCESS_KEY_TIMEOUT_HOURS=${ACCESS_KEY_TIMEOUT_HOURS:-24}
      - WEBLLM_DEFAULT_F16_MODEL_ID=${WEBLLM_DEFAULT_F16_MODEL_ID:-Qwen3-0.6B-q4f16_1-MLC}
      # ... more env vars
    volumes:
      - .:/home/user/app # Live code mounting
      - /home/user/app/node_modules
```

### docker-compose.production.yml

Same structure, but without volume mounts and with pre-built assets.

### Dockerfile Environment

The Dockerfile sets up:

1. **Builder stage**: Compiles `llama-server` from llama.cpp
2. **Runtime stage**:
   - Node.js LTS
   - Python 3 + SearXNG
   - llama-server binary

The **multi-service container** runs all three concurrently via shell process composition.
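To confirm that all three co-located services came up, you can probe their in-container ports. A minimal sketch: the ports 7860 (app) and 8888 (SearXNG) come from the compose file above, while llama-server's port `8080` and `/health` route are llama.cpp defaults and are assumptions here, not taken from this project's configuration.

```typescript
// Hypothetical health probe for the three services running in the container.
// Ports 7860 (app) and 8888 (SearXNG) match docker-compose.yml above;
// llama-server's 8080 and /health are llama.cpp defaults and may differ
// in the actual image.
const services = [
  { name: "app", url: "http://localhost:7860/" },
  { name: "searxng", url: "http://localhost:8888/" },
  { name: "llama-server", url: "http://localhost:8080/health" },
];

for (const { name, url } of services) {
  try {
    const response = await fetch(url);
    console.log(`${name}: ${response.ok ? "up" : `HTTP ${response.status}`}`);
  } catch {
    console.log(`${name}: unreachable at ${url}`);
  }
}
```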
## Vite Environment Injection

Environment variables are injected at build time via `vite.config.ts`:

```typescript
// Injected into import.meta.env
VITE_SEARCH_TOKEN
VITE_ACCESS_KEYS_ENABLED
VITE_WEBLLM_DEFAULT_F16_MODEL_ID
VITE_WEBLLM_DEFAULT_F32_MODEL_ID
VITE_WLLAMA_DEFAULT_MODEL_ID
VITE_INTERNAL_API_ENABLED
VITE_DEFAULT_INFERENCE_TYPE
```

These are accessed in client code as:

```typescript
const token = import.meta.env.VITE_SEARCH_TOKEN;
```

## Configuration Patterns

### Scenario: Private Team Instance

```bash
# .env
ACCESS_KEYS="team-alpha-2024,team-beta-2024"
ACCESS_KEY_TIMEOUT_HOURS="8"
DEFAULT_INFERENCE_TYPE="internal"
INTERNAL_OPENAI_COMPATIBLE_API_BASE_URL="https://llm.company.com/v1"
INTERNAL_OPENAI_COMPATIBLE_API_KEY="sk-xxx"
INTERNAL_OPENAI_COMPATIBLE_API_MODEL="llama-3.1-70b"
```

### Scenario: Public Demo (No AI)

```bash
# .env - empty, no access keys
# AI disabled by default in settings
```

### Scenario: Browser-Only AI

```bash
# .env - minimal or empty
# Users choose WebLLM or Wllama in settings
# Models download to user's browser (no server AI)
```

## Debugging Configuration

Enable verbose logging:

```typescript
// In browser console
localStorage.setItem('debug', 'minisearch:*');
```

Check effective configuration:

```typescript
// In browser console
console.log('Settings:', JSON.parse(localStorage.getItem('settings') || '{}'));
console.log('Env:', import.meta.env);
```

## Related Topics

- **AI Integration**: `docs/ai-integration.md` - Detailed inference type configuration
- **Security**: `docs/security.md` - Access control and privacy details
- **Deployment**: `docs/overview.md` - Container architecture and production setup