# Model Configuration Guide
This guide focuses on the technical configuration, settings management, parameter handling, and troubleshooting for LLM providers in the Recipe Chatbot project.
> 🔍 **Looking for model recommendations?** See [Model Selection Guide](./model-selection-guide.md) for detailed model comparisons and use case recommendations.
## 🔧 Configuration System Overview
### Settings Architecture
The project uses a centralized configuration system in `config/settings.py` with environment variable overrides:
```text
Environment Variables (.env) → settings.py → LLM Service → Provider APIs
```
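A rough sketch of that flow (assuming the project loads `.env` via `python-dotenv`; this snippet is illustrative, not the actual `settings.py`):

```python
# Illustrative sketch of the loading flow, not the project's actual settings.py
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # step 1: pull .env values into os.environ

# step 2: settings.py resolves values with defaults
provider = os.getenv("LLM_PROVIDER", "openai")
temperature = float(os.getenv(f"{provider.upper()}_TEMPERATURE", "0.7"))

# step 3: the LLM service hands the resolved values to the provider API
print(f"provider={provider}, temperature={temperature}")
```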
### Temperature Management
Each provider has different temperature constraints, which the service handles automatically (a sketch of this handling follows the table):

| Provider | Range | Auto-Handling | Special Cases |
|----------|-------|---------------|---------------|
| **OpenAI** | 0.0 - 2.0 | ✅ GPT-5-nano → 1.0 | Nano models fixed |
| **Google** | 0.0 - 1.0 | ✅ Clamp to range | Strict validation |
| **Ollama** | 0.0 - 2.0 | ⚠️ Model-dependent | Local processing |
| **HuggingFace** | Fixed ~0.7 | ❌ API ignores setting | Read-only |
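A minimal sketch of how this handling could be implemented (the function and table names here are illustrative, not the project's actual code):

```python
# Illustrative normalizer mirroring the table above.
PROVIDER_TEMP_RANGES = {
    "openai": (0.0, 2.0),
    "google": (0.0, 1.0),
    "ollama": (0.0, 2.0),
}

def normalize_temperature(provider: str, model: str, temperature: float) -> float:
    """Clamp temperature into the provider's range and apply special cases."""
    if provider == "openai" and "gpt-5-nano" in model.lower():
        return 1.0  # nano models accept only 1.0
    if provider == "huggingface":
        return 0.7  # Inference API typically ignores the setting anyway
    low, high = PROVIDER_TEMP_RANGES.get(provider, (0.0, 1.0))
    return max(low, min(high, temperature))
```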
## 🛠️ Provider Configuration Details
### OpenAI Configuration
#### Environment Variables
```bash
# Core settings
OPENAI_API_KEY=sk-proj-xxxxx
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000
# Advanced parameters (optional)
OPENAI_TOP_P=1.0
OPENAI_FREQUENCY_PENALTY=0.0
OPENAI_PRESENCE_PENALTY=0.0
```
#### Automatic Temperature Override
```python
# Implemented in services/llm_service.py
if "gpt-5-nano" in model_name.lower():
temperature = 1.0 # Only supported value
logger.info(f"Auto-adjusting temperature to 1.0 for {model_name}")
```
#### Parameter Validation
- **Temperature**: `0.0 - 2.0` (except nano models: fixed `1.0`)
- **Max Tokens**: `1 - 4096` (model-dependent)
- **Top P**: `0.0 - 1.0`
### Google (Gemini) Configuration
#### Environment Variables
```bash
# Core settings
GOOGLE_API_KEY=AIzaSyxxxxx
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000
# Advanced parameters (optional)
GOOGLE_TOP_P=0.95
GOOGLE_TOP_K=40
```
#### Temperature Clamping
```python
# Auto-clamping to Google's range
google_temp = max(0.0, min(1.0, configured_temperature))
if google_temp != configured_temperature:
    logger.info(f"Clamping temperature from {configured_temperature} to {google_temp}")
```
#### Parameter Constraints
- **Temperature**: `0.0 - 1.0` (strictly enforced)
- **Max Tokens**: `1 - 8192`
- **Top K**: `1 - 40`
### Ollama Configuration
#### Environment Variables
```bash
# Core settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
OLLAMA_MAX_TOKENS=1000
# Connection settings
OLLAMA_TIMEOUT=30
OLLAMA_KEEP_ALIVE=5m
```
#### Service Management
```bash
# Start Ollama service
ollama serve &
# Verify service status
curl http://localhost:11434/api/version
# Model management
ollama pull llama3.1:8b
ollama list
ollama rm unused_model
```
#### Parameter Flexibility
- **Temperature**: `0.0 - 2.0` (widest range)
- **Context Length**: Model-dependent (2K - 128K)
- **Custom Parameters**: Model-specific options available per request (see the sketch below)
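These options can be passed per request through Ollama's REST API. A small sketch using `requests` (the option values are examples; adjust to your model):

```python
# Sketch: send per-request sampling options to a local Ollama server.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "Suggest a quick pasta recipe.",
        "stream": False,
        "options": {
            "temperature": 0.7,  # 0.0 - 2.0, widest range of the providers
            "num_ctx": 4096,     # context length, model-dependent
        },
    },
    timeout=30,
)
print(resp.json()["response"])
```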
### HuggingFace Configuration
#### Environment Variables
```bash
# Core settings
HUGGINGFACE_API_KEY=hf_xxxxx
HUGGINGFACE_MODEL=microsoft/DialoGPT-medium
HUGGINGFACE_TEMPERATURE=0.7 # Often ignored
HUGGINGFACE_MAX_TOKENS=500
# API settings
HUGGINGFACE_WAIT_FOR_MODEL=true
HUGGINGFACE_USE_CACHE=true
```
#### API Limitations
```python
# Note: the Inference API often ignores sampling settings, so the service
# simply warns and falls back to the API default (illustrative helper name):
def hf_effective_temperature(model_name: str) -> float:
    logger.warning(f"HuggingFace model {model_name} may ignore temperature setting")
    return 0.7  # API typically uses this default
```
## ⚙️ Advanced Configuration
### Dynamic Provider Switching
```python
# config/settings.py implementation
import os

def get_llm_config():
    provider = os.getenv("LLM_PROVIDER", "openai").lower()
    fallback = os.getenv("LLM_FALLBACK_PROVIDER", "google").lower()
    return {
        "provider": provider,
        "fallback_provider": fallback,
        **get_provider_config(provider),
    }

def get_provider_config(provider):
    """Get provider-specific configuration."""
    configs = {
        "openai": {
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
            "temperature": float(os.getenv("OPENAI_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("OPENAI_MAX_TOKENS", "1000")),
        },
        "google": {
            "api_key": os.getenv("GOOGLE_API_KEY"),
            "model": os.getenv("GOOGLE_MODEL", "gemini-2.5-flash"),
            "temperature": float(os.getenv("GOOGLE_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("GOOGLE_MAX_TOKENS", "1000")),
        },
        # ... other providers
    }
    return configs.get(provider, {})
```
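With this in place, switching providers at runtime is a one-variable change. For example (assuming the snippet runs from the project root):

```python
# Example: select Google for this process without editing .env
import os

os.environ["LLM_PROVIDER"] = "google"

from config.settings import get_llm_config

config = get_llm_config()
print(config["provider"], config["model"])  # -> google gemini-2.5-flash
```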
### Fallback Configuration
```python
# Automatic fallback on provider failure
def get_llm_response(message):
    try:
        return primary_provider.chat_completion(message)
    except Exception as e:
        logger.warning(f"Primary provider failed: {e}")
        return fallback_provider.chat_completion(message)
```
### Environment-Specific Configs
#### Development (.env.development)
```bash
# Fast, free/cheap for testing
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.8 # More creative for testing
LLM_FALLBACK_PROVIDER=ollama
```
#### Production (.env.production)
```bash
# Reliable, consistent for production
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7 # Consistent responses
LLM_FALLBACK_PROVIDER=google
```
#### Local Development (.env.local)
```bash
# Self-hosted for offline development
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
# No fallback - fully local
```
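One way to wire these files up at startup (assuming `python-dotenv`; the `APP_ENV` variable name is illustrative, not part of the project):

```python
# Sketch: load the .env file matching the current environment.
import os

from dotenv import load_dotenv

env = os.getenv("APP_ENV", "development")  # development | production | local
load_dotenv(f".env.{env}")
print(f"Loaded .env.{env}, provider={os.getenv('LLM_PROVIDER')}")
```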
## 🚨 Configuration Troubleshooting
### Issue: GPT-5-nano Temperature Error
**Error**: `Temperature must be 1.0 for gpt-5-nano`
**Status**: ✅ Auto-fixed in `services/llm_service.py`
**Verification**:
```bash
python -c "
import os
os.environ['OPENAI_MODEL'] = 'gpt-5-nano'
os.environ['OPENAI_TEMPERATURE'] = '0.5'
from services.llm_service import LLMService
LLMService() # Should log temperature override
"
```
### Issue: Google Temperature Out of Range
**Error**: `Temperature must be between 0.0 and 1.0`
**Solution**: Automatic clamping implemented
**Test**:
```bash
python -c "
import os
os.environ['LLM_PROVIDER'] = 'google'
os.environ['GOOGLE_TEMPERATURE'] = '1.5'
from services.llm_service import LLMService
LLMService() # Should clamp to 1.0
"
```
### Issue: Ollama Connection Failed
**Error**: `ConnectionError: Could not connect to Ollama`
**Diagnosis**:
```bash
# Check if Ollama is running
curl -f http://localhost:11434/api/version || echo "Ollama not running"
# Check if model exists
ollama list | grep "llama3.1:8b" || echo "Model not found"
# Check system resources
free -h # RAM usage
df -h # Disk space
```
**Fix**:
```bash
# Start Ollama service
ollama serve &
# Pull required model
ollama pull llama3.1:8b
# Test connection
curl -d '{"model":"llama3.1:8b","prompt":"test","stream":false}' \
http://localhost:11434/api/generate
```
### Issue: HuggingFace Temperature Ignored
**Symptom**: Temperature settings have no effect on responses
**Explanation**: This is expected behavior: the HuggingFace Inference API typically ignores temperature
**Workaround**: Use different models or providers for temperature control
### Issue: Missing API Keys
**Error**: `AuthenticationError: Invalid API key`
**Diagnosis**:
```bash
# Check environment variables
echo "OpenAI: ${OPENAI_API_KEY:0:10}..."
echo "Google: ${GOOGLE_API_KEY:0:10}..."
echo "HuggingFace: ${HUGGINGFACE_API_KEY:0:10}..."
# Test API key validity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/models | jq '.data[0].id' || echo "Invalid OpenAI key"
```
## 📊 Configuration Validation
### Automated Configuration Check
```bash
# Run comprehensive configuration validation
python -c "
from config.settings import get_llm_config
from services.llm_service import LLMService

print('🔧 Configuration Validation')
print('=' * 40)

# Load configuration
try:
    config = get_llm_config()
    print('✅ Configuration loaded successfully')
    print(f'Provider: {config.get(\"provider\")}')
    print(f'Model: {config.get(\"model\")}')
    print(f'Temperature: {config.get(\"temperature\")}')
except Exception as e:
    print(f'❌ Configuration error: {e}')
    exit(1)

# Test service initialization
try:
    service = LLMService()
    print('✅ LLM Service initialized')
except Exception as e:
    print(f'❌ Service initialization failed: {e}')
    exit(1)

# Test simple completion
try:
    response = service.simple_chat_completion('Test message')
    print('✅ Chat completion successful')
    print(f'Response length: {len(response)} characters')
except Exception as e:
    print(f'❌ Chat completion failed: {e}')
    exit(1)

print('🎉 All configuration checks passed!')
"
```
### Provider-Specific Health Checks
```bash
# OpenAI health check
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/models | jq '.data | length'
# Google health check
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY" | jq '.models | length'
# Ollama health check
curl http://localhost:11434/api/tags | jq '.models | length'
# HuggingFace health check
curl -H "Authorization: Bearer $HUGGINGFACE_API_KEY" \
https://huggingface.co/api/whoami | jq '.name'
```
### Configuration Diff Tool
```bash
# Compare current config with defaults
python -c "
from config.settings import get_llm_config

defaults = {
    'openai': {'temperature': 0.7, 'max_tokens': 1000},
    'google': {'temperature': 0.7, 'max_tokens': 1000},
    'ollama': {'temperature': 0.7, 'max_tokens': 1000},
}

current = get_llm_config()
provider = current.get('provider')
default = defaults.get(provider, {})

print(f'Configuration for {provider}:')
for key, default_val in default.items():
    current_val = current.get(key)
    status = '✅' if current_val == default_val else '⚠️'
    print(f'{status} {key}: {current_val} (default: {default_val})')
"
```
## 📋 Configuration Templates
### Minimal Setup (Single Provider)
```bash
# .env.minimal
LLM_PROVIDER=google
GOOGLE_API_KEY=your_api_key
GOOGLE_MODEL=gemini-2.5-flash
```
### Robust Setup (Primary + Fallback)
```bash
# .env.robust
LLM_PROVIDER=openai
OPENAI_API_KEY=your_primary_key
OPENAI_MODEL=gpt-4o-mini
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_fallback_key
GOOGLE_MODEL=gemini-2.5-flash
```
### Local-First Setup
```bash
# .env.local-first
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_cloud_backup_key
```
### Budget-Conscious Setup
```bash
# .env.budget
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=1.0 # Fixed for nano
OPENAI_MAX_TOKENS=500 # Reduce costs
```
## 🔒 Security Best Practices
### API Key Management
```bash
# Use environment variables
export OPENAI_API_KEY="sk-..."
# Never commit keys to git (but keep the tracked example template)
echo ".env*" >> .gitignore
echo "!.env.example" >> .gitignore
# Use different keys for different environments
cp .env.example .env.development
cp .env.example .env.production
```
### Rate Limiting Configuration
```python
# Add to config/settings.py
RATE_LIMITS = {
"openai": {"rpm": 500, "tpm": 40000},
"google": {"rpm": 60, "tpm": 32000},
"ollama": {"rpm": None, "tpm": None}, # Local = unlimited
}
```
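`RATE_LIMITS` is only data; enforcing it requires a limiter in front of each request. A minimal sketch (the `RateLimiter` class is illustrative, not existing project code):

```python
# Sketch: naive requests-per-minute limiter driven by RATE_LIMITS.
import time

RATE_LIMITS = {"google": {"rpm": 60, "tpm": 32000}}  # subset of the config above

class RateLimiter:
    def __init__(self, rpm):
        self.min_interval = 60.0 / rpm if rpm else 0.0  # None/0 = unlimited
        self.last_call = 0.0

    def wait(self):
        """Sleep just long enough to stay under the per-minute budget."""
        sleep_for = self.min_interval - (time.monotonic() - self.last_call)
        if sleep_for > 0:
            time.sleep(sleep_for)
        self.last_call = time.monotonic()

limiter = RateLimiter(RATE_LIMITS["google"]["rpm"])  # 60 rpm -> 1 request/sec
limiter.wait()  # call before each provider request
```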
### Error Handling Strategy
```python
# Graceful degradation configuration
FALLBACK_CHAIN = [
"primary_provider",
"fallback_provider",
"local_provider",
"cached_response"
]
```
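A sketch of walking that chain in order (handler names are placeholders for whatever callables each stage maps to):

```python
# Sketch: try each stage of the fallback chain until one succeeds.
def respond_with_fallback(message, handlers):
    """handlers: callables tried in FALLBACK_CHAIN order."""
    last_error = None
    for handler in handlers:
        try:
            return handler(message)
        except Exception as e:  # degrade gracefully to the next stage
            last_error = e
    raise RuntimeError(f"All fallback stages failed: {last_error}")

# Usage idea: respond_with_fallback(msg, [primary.chat, fallback.chat, local.chat, cache.get])
```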
## 🧪 Testing Configuration Changes
### Unit Tests for Configuration
```bash
# Test temperature overrides
python -m pytest tests/test_llm_temperature.py -v
# Test provider fallbacks
python -m pytest tests/test_llm_fallback.py -v
# Test API key validation
python -m pytest tests/test_api_keys.py -v
```
### Integration Tests
```bash
# Test each provider individually
python -c "
import os
from services.llm_service import LLMService

providers = ['openai', 'google', 'ollama']
for provider in providers:
    os.environ['LLM_PROVIDER'] = provider
    try:
        service = LLMService()
        response = service.simple_chat_completion('Test')
        print(f'✅ {provider}: {len(response)} chars')
    except Exception as e:
        print(f'❌ {provider}: {e}')
"
```
### Performance Benchmarks
```bash
# Measure response times
python -c "
import time
from services.llm_service import LLMService
service = LLMService()
start = time.time()
response = service.simple_chat_completion('Quick recipe suggestion')
elapsed = time.time() - start
print(f'Response time: {elapsed:.2f}s')
print(f'Response length: {len(response)} characters')
print(f'Words per second: {len(response.split()) / elapsed:.1f}')
"
```
## 🔄 Configuration Migration
### Upgrading from Old Configuration
```bash
# Migrate old environment variables
# Old format → New format
mv .env .env.backup
# Update variable names
sed 's/LLM_MODEL=/OPENAI_MODEL=/' .env.backup > .env
sed -i 's/LLM_TEMPERATURE=/OPENAI_TEMPERATURE=/' .env
sed -i 's/LLM_MAX_TOKENS=/OPENAI_MAX_TOKENS=/' .env
echo "LLM_PROVIDER=openai" >> .env
```
### Version Compatibility Check
```python
# Check if the configuration is compatible with the current format
import os

def check_config_version():
    required_vars = ["LLM_PROVIDER"]
    legacy_vars = ["LLM_MODEL", "LLM_TEMPERATURE"]
    has_new = all(os.getenv(var) for var in required_vars)
    has_legacy = any(os.getenv(var) for var in legacy_vars)
    if has_legacy and not has_new:
        raise ValueError("Legacy configuration detected. Please migrate to new format.")
    return has_new
```
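For instance, the check could run once at startup so legacy configs fail fast (where to call it is a project decision; this placement is illustrative):

```python
# Sketch: fail fast on a legacy-only configuration before building services.
if __name__ == "__main__":
    check_config_version()  # raises ValueError if only legacy vars are set
    print("Configuration format OK")
```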
---
💡 **Next Steps**: After configuring your providers, see the [Model Selection Guide](./model-selection-guide.md) for choosing the best models for your use case.