Jesse Johnson
New commit for backend deployment: 2025-09-25_13-24-03
c59d808
# Model Configuration Guide
This guide focuses on the technical configuration, settings management, parameter handling, and troubleshooting for LLM providers in the Recipe Chatbot project.
> πŸ“š **Looking for model recommendations?** See [Model Selection Guide](./model-selection-guide.md) for detailed model comparisons and use case recommendations.
## πŸ”§ Configuration System Overview
### Settings Architecture
The project uses a centralized configuration system in `config/settings.py` with environment variable overrides:
```text
# Configuration loading flow
Environment Variables (.env) β†’ settings.py β†’ LLM Service β†’ Provider APIs
```
### Temperature Management
Each provider has different temperature constraints that are automatically handled:
| Provider | Range | Auto-Handling | Special Cases |
|----------|-------|---------------|---------------|
| **OpenAI** | 0.0 - 2.0 | βœ… GPT-5-nano β†’ 1.0 | Nano models fixed |
| **Google** | 0.0 - 1.0 | βœ… Clamp to range | Strict validation |
| **Ollama** | 0.0 - 2.0 | ⚠️ Model dependent | Local processing |
| **HuggingFace** | Fixed ~0.7 | ❌ API ignores setting | Read-only |
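The rules in this table can be collapsed into a single normalization helper. A minimal sketch; the function name and provider strings are illustrative, not the project's actual API:

```python
# Sketch: normalize a requested temperature per provider, mirroring the
# table above. Names here are illustrative assumptions.
def normalize_temperature(provider: str, model: str, requested: float) -> float:
    if provider == "openai":
        if "gpt-5-nano" in model.lower():
            return 1.0  # nano models accept only 1.0
        return max(0.0, min(2.0, requested))
    if provider == "google":
        return max(0.0, min(1.0, requested))  # strict 0.0 - 1.0 range
    if provider == "ollama":
        return max(0.0, min(2.0, requested))  # widest range, model dependent
    if provider == "huggingface":
        return 0.7  # Inference API typically ignores the setting
    raise ValueError(f"Unknown provider: {provider}")
```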
## πŸ› οΈ Provider Configuration Details
### OpenAI Configuration
#### Environment Variables
```bash
# Core settings
OPENAI_API_KEY=sk-proj-xxxxx
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000
# Advanced parameters (optional)
OPENAI_TOP_P=1.0
OPENAI_FREQUENCY_PENALTY=0.0
OPENAI_PRESENCE_PENALTY=0.0
```
#### Automatic Temperature Override
```python
# Implemented in services/llm_service.py
if "gpt-5-nano" in model_name.lower():
    temperature = 1.0  # Only supported value
    logger.info(f"Auto-adjusting temperature to 1.0 for {model_name}")
```
#### Parameter Validation
- **Temperature**: `0.0 - 2.0` (except nano models: fixed `1.0`)
- **Max Tokens**: `1 - 4096` (model-dependent)
- **Top P**: `0.0 - 1.0`
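These checks can be expressed as one validation function. A hedged sketch, assuming out-of-range values should raise `ValueError` (the function name is illustrative, not the project's actual API):

```python
# Sketch: validate OpenAI request parameters against the ranges listed above.
def validate_openai_params(model: str, temperature: float,
                           max_tokens: int, top_p: float = 1.0) -> None:
    if "nano" in model.lower():
        if temperature != 1.0:
            raise ValueError(f"{model} only supports temperature=1.0")
    elif not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in 0.0 - 2.0")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in 1 - 4096 (model-dependent)")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in 0.0 - 1.0")
```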
### Google (Gemini) Configuration
#### Environment Variables
```bash
# Core settings
GOOGLE_API_KEY=AIzaSyxxxxx
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000
# Advanced parameters (optional)
GOOGLE_TOP_P=0.95
GOOGLE_TOP_K=40
```
#### Temperature Clamping
```python
# Auto-clamping to Google's range
google_temp = max(0.0, min(1.0, configured_temperature))
if google_temp != configured_temperature:
    logger.info(f"Clamping temperature from {configured_temperature} to {google_temp}")
```
#### Parameter Constraints
- **Temperature**: `0.0 - 1.0` (strictly enforced)
- **Max Tokens**: `1 - 8192`
- **Top K**: `1 - 40`
### Ollama Configuration
#### Environment Variables
```bash
# Core settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
OLLAMA_MAX_TOKENS=1000
# Connection settings
OLLAMA_TIMEOUT=30
OLLAMA_KEEP_ALIVE=5m
```
#### Service Management
```bash
# Start Ollama service
ollama serve &
# Verify service status
curl http://localhost:11434/api/version
# Model management
ollama pull llama3.1:8b
ollama list
ollama rm unused_model
```
#### Parameter Flexibility
- **Temperature**: `0.0 - 2.0` (widest range)
- **Context Length**: Model-dependent (2K - 128K)
- **Custom Parameters**: Model-specific options available
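Before the first request, it can help to verify that the Ollama service is reachable and the configured model is pulled. A stdlib-only sketch against the documented `/api/version` and `/api/tags` endpoints (the function name and defaults are assumptions):

```python
# Sketch: check Ollama availability and model presence before first use.
import json
import urllib.error
import urllib.request

def ollama_ready(base_url: str = "http://localhost:11434",
                 model: str = "llama3.1:8b") -> bool:
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=5):
            pass  # service responded
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            tags = json.load(resp)
        # /api/tags lists locally available models
        return any(m.get("name") == model for m in tags.get("models", []))
    except (urllib.error.URLError, OSError):
        return False  # service not running or unreachable
```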
### HuggingFace Configuration
#### Environment Variables
```bash
# Core settings
HUGGINGFACE_API_KEY=hf_xxxxx
HUGGINGFACE_MODEL=microsoft/DialoGPT-medium
HUGGINGFACE_TEMPERATURE=0.7 # Often ignored
HUGGINGFACE_MAX_TOKENS=500
# API settings
HUGGINGFACE_WAIT_FOR_MODEL=true
HUGGINGFACE_USE_CACHE=true
```
#### API Limitations
```python
# Note: the Inference API often ignores the temperature setting, so the
# service logs a warning and reports the API default instead.
# (The function name below is illustrative, not the project's actual API.)
def effective_hf_temperature(model_name: str) -> float:
    logger.warning(f"HuggingFace model {model_name} may ignore temperature setting")
    return 0.7  # API typically uses this default
```
## βš™οΈ Advanced Configuration
### Dynamic Provider Switching
```python
# config/settings.py implementation
import os

def get_llm_config():
    provider = os.getenv("LLM_PROVIDER", "openai").lower()
    fallback = os.getenv("LLM_FALLBACK_PROVIDER", "google").lower()
    return {
        "provider": provider,
        "fallback_provider": fallback,
        **get_provider_config(provider),
    }

def get_provider_config(provider):
    """Get provider-specific configuration."""
    configs = {
        "openai": {
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
            "temperature": float(os.getenv("OPENAI_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("OPENAI_MAX_TOKENS", "1000")),
        },
        "google": {
            "api_key": os.getenv("GOOGLE_API_KEY"),
            "model": os.getenv("GOOGLE_MODEL", "gemini-2.5-flash"),
            "temperature": float(os.getenv("GOOGLE_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("GOOGLE_MAX_TOKENS", "1000")),
        },
        # ... other providers
    }
    return configs.get(provider, {})
```
### Fallback Configuration
```python
# Automatic fallback on provider failure
def get_llm_response(message):
    try:
        return primary_provider.chat_completion(message)
    except Exception as e:
        logger.warning(f"Primary provider failed: {e}")
        return fallback_provider.chat_completion(message)
```
### Environment-Specific Configs
#### Development (.env.development)
```bash
# Fast, free/cheap for testing
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.8 # More creative for testing
LLM_FALLBACK_PROVIDER=ollama
```
#### Production (.env.production)
```bash
# Reliable, consistent for production
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7 # Consistent responses
LLM_FALLBACK_PROVIDER=google
```
#### Local Development (.env.local)
```bash
# Self-hosted for offline development
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
# No fallback - fully local
```
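One way to wire these files in is to load the matching `.env` file before `settings.py` reads `os.environ`. A minimal stdlib sketch; the `APP_ENV` variable and the simplified `KEY=VALUE` parsing are assumptions (a library such as python-dotenv would handle quoting and edge cases more robustly):

```python
# Sketch: load the environment-specific .env file into os.environ.
# APP_ENV selecting the file is an assumed convention, not the project's API.
import os

def load_env_file(path: str) -> None:
    """Minimal .env parser: KEY=VALUE lines, '#' comment lines, no quoting."""
    if not os.path.exists(path):
        return  # missing file is not an error
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Real env vars win over file values
            os.environ.setdefault(key.strip(), value.strip())

env = os.getenv("APP_ENV", "development")
load_env_file(f".env.{env}")
```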
## 🚨 Configuration Troubleshooting
### Issue: GPT-5-nano Temperature Error
**Error**: `Temperature must be 1.0 for gpt-5-nano`
**Status**: βœ… Auto-fixed in `services/llm_service.py`
**Verification**:
```bash
python -c "
import os
os.environ['OPENAI_MODEL'] = 'gpt-5-nano'
os.environ['OPENAI_TEMPERATURE'] = '0.5'
from services.llm_service import LLMService
LLMService() # Should log temperature override
"
```
### Issue: Google Temperature Out of Range
**Error**: `Temperature must be between 0.0 and 1.0`
**Solution**: Automatic clamping implemented
**Test**:
```bash
python -c "
import os
os.environ['LLM_PROVIDER'] = 'google'
os.environ['GOOGLE_TEMPERATURE'] = '1.5'
from services.llm_service import LLMService
LLMService() # Should clamp to 1.0
"
```
### Issue: Ollama Connection Failed
**Error**: `ConnectionError: Could not connect to Ollama`
**Diagnosis**:
```bash
# Check if Ollama is running
curl -f http://localhost:11434/api/version || echo "Ollama not running"
# Check if model exists
ollama list | grep "llama3.1:8b" || echo "Model not found"
# Check system resources
free -h # RAM usage
df -h # Disk space
```
**Fix**:
```bash
# Start Ollama service
ollama serve &
# Pull required model
ollama pull llama3.1:8b
# Test connection
curl -d '{"model":"llama3.1:8b","prompt":"test","stream":false}' \
http://localhost:11434/api/generate
```
### Issue: HuggingFace Temperature Ignored
**Issue**: Temperature settings have no visible effect on responses
**Explanation**: This is expected behavior: the HuggingFace Inference API typically ignores the temperature parameter
**Workaround**: Use different models or providers for temperature control
### Issue: Missing API Keys
**Error**: `AuthenticationError: Invalid API key`
**Diagnosis**:
```bash
# Check environment variables
echo "OpenAI: ${OPENAI_API_KEY:0:10}..."
echo "Google: ${GOOGLE_API_KEY:0:10}..."
echo "HuggingFace: ${HUGGINGFACE_API_KEY:0:10}..."
# Test API key validity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/models | jq '.data[0].id' || echo "Invalid OpenAI key"
```
## πŸ” Configuration Validation
### Automated Configuration Check
```bash
# Run comprehensive configuration validation
python -c "
from config.settings import get_llm_config
from services.llm_service import LLMService

print('πŸ”§ Configuration Validation')
print('=' * 40)

# Load configuration
try:
    config = get_llm_config()
    print('βœ… Configuration loaded successfully')
    print(f'Provider: {config.get(\"provider\")}')
    print(f'Model: {config.get(\"model\")}')
    print(f'Temperature: {config.get(\"temperature\")}')
except Exception as e:
    print(f'❌ Configuration error: {e}')
    exit(1)

# Test service initialization
try:
    service = LLMService()
    print('βœ… LLM Service initialized')
except Exception as e:
    print(f'❌ Service initialization failed: {e}')
    exit(1)

# Test simple completion
try:
    response = service.simple_chat_completion('Test message')
    print('βœ… Chat completion successful')
    print(f'Response length: {len(response)} characters')
except Exception as e:
    print(f'❌ Chat completion failed: {e}')
    exit(1)

print('πŸŽ‰ All configuration checks passed!')
"
```
### Provider-Specific Health Checks
```bash
# OpenAI health check
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/models | jq '.data | length'
# Google health check
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY" | jq '.models | length'
# Ollama health check
curl http://localhost:11434/api/tags | jq '.models | length'
# HuggingFace health check
curl -H "Authorization: Bearer $HUGGINGFACE_API_KEY" \
https://huggingface.co/api/whoami | jq '.name'
```
### Configuration Diff Tool
```bash
# Compare current config with defaults
python -c "
from config.settings import get_llm_config

defaults = {
    'openai': {'temperature': 0.7, 'max_tokens': 1000},
    'google': {'temperature': 0.7, 'max_tokens': 1000},
    'ollama': {'temperature': 0.7, 'max_tokens': 1000},
}
current = get_llm_config()
provider = current.get('provider')
default = defaults.get(provider, {})

print(f'Configuration for {provider}:')
for key, default_val in default.items():
    current_val = current.get(key)
    status = 'βœ…' if current_val == default_val else '⚠️'
    print(f'{status} {key}: {current_val} (default: {default_val})')
"
```
## πŸ“‹ Configuration Templates
### Minimal Setup (Single Provider)
```bash
# .env.minimal
LLM_PROVIDER=google
GOOGLE_API_KEY=your_api_key
GOOGLE_MODEL=gemini-2.5-flash
```
### Robust Setup (Primary + Fallback)
```bash
# .env.robust
LLM_PROVIDER=openai
OPENAI_API_KEY=your_primary_key
OPENAI_MODEL=gpt-4o-mini
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_fallback_key
GOOGLE_MODEL=gemini-2.5-flash
```
### Local-First Setup
```bash
# .env.local-first
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_cloud_backup_key
```
### Budget-Conscious Setup
```bash
# .env.budget
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=1.0 # Fixed for nano
OPENAI_MAX_TOKENS=500 # Reduce costs
```
## πŸ” Security Best Practices
### API Key Management
```bash
# Use environment variables
export OPENAI_API_KEY="sk-..."
# Never commit keys to git (keep the tracked .env.example template)
echo ".env*" >> .gitignore
echo "!.env.example" >> .gitignore
# Use different keys for different environments
cp .env.example .env.development
cp .env.example .env.production
```
### Rate Limiting Configuration
```python
# Add to config/settings.py
RATE_LIMITS = {
"openai": {"rpm": 500, "tpm": 40000},
"google": {"rpm": 60, "tpm": 32000},
"ollama": {"rpm": None, "tpm": None}, # Local = unlimited
}
```
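A requests-per-minute gate built on this table might look like the following sketch (rpm only; token-per-minute accounting is omitted, and the class name is illustrative):

```python
# Sketch: per-provider requests-per-minute limiter using the RATE_LIMITS
# table above. A sliding window of call timestamps gates each request.
import time
from collections import deque

RATE_LIMITS = {
    "openai": {"rpm": 500, "tpm": 40000},
    "google": {"rpm": 60, "tpm": 32000},
    "ollama": {"rpm": None, "tpm": None},  # Local = unlimited
}

class RpmLimiter:
    def __init__(self, provider: str):
        self.rpm = RATE_LIMITS.get(provider, {}).get("rpm")
        self.calls = deque()  # timestamps of requests in the last minute

    def acquire(self) -> None:
        if self.rpm is None:
            return  # unlimited (local provider)
        now = time.monotonic()
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()  # drop entries older than one minute
        if len(self.calls) >= self.rpm:
            # Wait until the oldest call ages out of the window
            time.sleep(60 - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```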
### Error Handling Strategy
```python
# Graceful degradation configuration
FALLBACK_CHAIN = [
"primary_provider",
"fallback_provider",
"local_provider",
"cached_response"
]
```
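Walking this chain until one tier succeeds can be sketched as follows; the provider callables and the dictionary wiring are stand-ins for whatever the project actually registers:

```python
# Sketch: try each tier of FALLBACK_CHAIN in order until one succeeds.
import logging

logger = logging.getLogger(__name__)

FALLBACK_CHAIN = [
    "primary_provider",
    "fallback_provider",
    "local_provider",
    "cached_response",
]

def chat_with_fallback(providers: dict, message: str) -> str:
    last_error = None
    for name in FALLBACK_CHAIN:
        provider = providers.get(name)
        if provider is None:
            continue  # this tier is not configured
        try:
            return provider(message)
        except Exception as e:
            last_error = e
            logger.warning("%s failed: %s", name, e)
    raise RuntimeError(f"All providers failed: {last_error}")
```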
## πŸ§ͺ Testing Configuration Changes
### Unit Tests for Configuration
```bash
# Test temperature overrides
python -m pytest tests/test_llm_temperature.py -v
# Test provider fallbacks
python -m pytest tests/test_llm_fallback.py -v
# Test API key validation
python -m pytest tests/test_api_keys.py -v
```
### Integration Tests
```bash
# Test each provider individually
python -c "
import os
from services.llm_service import LLMService

providers = ['openai', 'google', 'ollama']
for provider in providers:
    os.environ['LLM_PROVIDER'] = provider
    try:
        service = LLMService()
        response = service.simple_chat_completion('Test')
        print(f'βœ… {provider}: {len(response)} chars')
    except Exception as e:
        print(f'❌ {provider}: {e}')
"
```
### Performance Benchmarks
```bash
# Measure response times
python -c "
import time
from services.llm_service import LLMService
service = LLMService()
start = time.time()
response = service.simple_chat_completion('Quick recipe suggestion')
elapsed = time.time() - start
print(f'Response time: {elapsed:.2f}s')
print(f'Response length: {len(response)} characters')
print(f'Words per second: {len(response.split()) / elapsed:.1f}')
"
```
## πŸ”„ Configuration Migration
### Upgrading from Old Configuration
```bash
# Migrate old environment variables
# Old format β†’ New format
mv .env .env.backup
# Update variable names
sed 's/LLM_MODEL=/OPENAI_MODEL=/' .env.backup > .env
sed -i 's/LLM_TEMPERATURE=/OPENAI_TEMPERATURE=/' .env
sed -i 's/LLM_MAX_TOKENS=/OPENAI_MAX_TOKENS=/' .env
echo "LLM_PROVIDER=openai" >> .env
```
### Version Compatibility Check
```python
# Check if the configuration uses the new format
import os

def check_config_version():
    required_vars = ["LLM_PROVIDER"]
    legacy_vars = ["LLM_MODEL", "LLM_TEMPERATURE"]
    has_new = all(os.getenv(var) for var in required_vars)
    has_legacy = any(os.getenv(var) for var in legacy_vars)
    if has_legacy and not has_new:
        raise ValueError("Legacy configuration detected. Please migrate to new format.")
    return has_new
```
---
πŸ’‘ **Next Steps**: After configuring your providers, see the [Model Selection Guide](./model-selection-guide.md) for choosing the best models for your use case.