Model Configuration Guide
This guide focuses on the technical configuration, settings management, parameter handling, and troubleshooting for LLM providers in the Recipe Chatbot project.
Looking for model recommendations? See the Model Selection Guide for detailed model comparisons and use case recommendations.
Configuration System Overview
Settings Architecture
The project uses a centralized configuration system in config/settings.py with environment variable overrides:
```text
# Configuration loading flow
Environment Variables (.env) → settings.py → LLM Service → Provider APIs
```
Temperature Management
Each provider has different temperature constraints that are automatically handled:
| Provider | Range | Auto-Handling | Special Cases |
|---|---|---|---|
| OpenAI | 0.0 - 2.0 | ✅ GPT-5-nano → 1.0 | Nano models fixed |
| Google | 0.0 - 1.0 | ✅ Clamp to range | Strict validation |
| Ollama | 0.0 - 2.0 | ⚠️ Model dependent | Local processing |
| HuggingFace | Fixed ~0.7 | ❌ API ignores setting | Read-only |
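The per-provider rules above can be collapsed into a single helper. This is a sketch based on the constraints documented in this guide; `normalize_temperature` is an illustrative name, not project code:

```python
def normalize_temperature(provider: str, model: str, temperature: float) -> float:
    """Apply the per-provider temperature rules from the table (illustrative)."""
    if provider == "openai":
        if "gpt-5-nano" in model.lower():
            return 1.0                          # nano models only accept 1.0
        return max(0.0, min(2.0, temperature))  # clamp to OpenAI's 0.0 - 2.0
    if provider == "google":
        return max(0.0, min(1.0, temperature))  # strictly enforced 0.0 - 1.0
    if provider == "huggingface":
        return 0.7                              # Inference API ignores the setting
    return temperature                          # ollama: pass through, model dependent

print(normalize_temperature("openai", "gpt-5-nano", 0.5))       # forced to 1.0
print(normalize_temperature("google", "gemini-2.5-flash", 1.5)) # clamped to 1.0
```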
Provider Configuration Details
OpenAI Configuration
Environment Variables
```bash
# Core settings
OPENAI_API_KEY=sk-proj-xxxxx
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Advanced parameters (optional)
OPENAI_TOP_P=1.0
OPENAI_FREQUENCY_PENALTY=0.0
OPENAI_PRESENCE_PENALTY=0.0
```
Automatic Temperature Override
```python
# Implemented in services/llm_service.py
if "gpt-5-nano" in model_name.lower():
    temperature = 1.0  # Only supported value
    logger.info(f"Auto-adjusting temperature to 1.0 for {model_name}")
```
Parameter Validation
- Temperature: 0.0 - 2.0 (except nano models: fixed 1.0)
- Max Tokens: 1 - 4096 (model-dependent)
- Top P: 0.0 - 1.0
Google (Gemini) Configuration
Environment Variables
```bash
# Core settings
GOOGLE_API_KEY=AIzaSyxxxxx
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000

# Advanced parameters (optional)
GOOGLE_TOP_P=0.95
GOOGLE_TOP_K=40
```
Temperature Clamping
```python
# Auto-clamping to Google's range
google_temp = max(0.0, min(1.0, configured_temperature))
if google_temp != configured_temperature:
    logger.info(f"Clamping temperature from {configured_temperature} to {google_temp}")
```
Parameter Constraints
- Temperature: 0.0 - 1.0 (strictly enforced)
- Max Tokens: 1 - 8192
- Top K: 1 - 40
Ollama Configuration
Environment Variables
```bash
# Core settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
OLLAMA_MAX_TOKENS=1000

# Connection settings
OLLAMA_TIMEOUT=30
OLLAMA_KEEP_ALIVE=5m
```
Service Management
```bash
# Start Ollama service
ollama serve &

# Verify service status
curl http://localhost:11434/api/version

# Model management
ollama pull llama3.1:8b
ollama list
ollama rm unused_model
```
Parameter Flexibility
- Temperature: 0.0 - 2.0 (widest range)
- Context Length: Model-dependent (2K - 128K)
- Custom Parameters: Model-specific options available
HuggingFace Configuration
Environment Variables
```bash
# Core settings
HUGGINGFACE_API_KEY=hf_xxxxx
HUGGINGFACE_MODEL=microsoft/DialoGPT-medium
HUGGINGFACE_TEMPERATURE=0.7  # Often ignored
HUGGINGFACE_MAX_TOKENS=500

# API settings
HUGGINGFACE_WAIT_FOR_MODEL=true
HUGGINGFACE_USE_CACHE=true
```
API Limitations
```python
# Note: Temperature is often ignored by the Inference API
logger.warning(f"HuggingFace model {model_name} may ignore temperature setting")
return 0.7  # API typically uses this default
```
Advanced Configuration
Dynamic Provider Switching
```python
# config/settings.py implementation
import os

def get_llm_config():
    provider = os.getenv("LLM_PROVIDER", "openai").lower()
    fallback = os.getenv("LLM_FALLBACK_PROVIDER", "google").lower()
    return {
        "provider": provider,
        "fallback_provider": fallback,
        **get_provider_config(provider),
    }

def get_provider_config(provider):
    """Get provider-specific configuration."""
    configs = {
        "openai": {
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
            "temperature": float(os.getenv("OPENAI_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("OPENAI_MAX_TOKENS", "1000")),
        },
        "google": {
            "api_key": os.getenv("GOOGLE_API_KEY"),
            "model": os.getenv("GOOGLE_MODEL", "gemini-2.5-flash"),
            "temperature": float(os.getenv("GOOGLE_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("GOOGLE_MAX_TOKENS", "1000")),
        },
        # ... other providers
    }
    return configs.get(provider, {})
```
Fallback Configuration
```python
# Automatic fallback on provider failure
def get_llm_response(message):
    try:
        return primary_provider.chat_completion(message)
    except Exception as e:
        logger.warning(f"Primary provider failed: {e}")
        return fallback_provider.chat_completion(message)
```
Environment-Specific Configs
Development (.env.development)
```bash
# Fast, free/cheap for testing
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.8  # More creative for testing
LLM_FALLBACK_PROVIDER=ollama
```
Production (.env.production)
```bash
# Reliable, consistent for production
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7  # Consistent responses
LLM_FALLBACK_PROVIDER=google
```
Local Development (.env.local)
```bash
# Self-hosted for offline development
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
# No fallback - fully local
```
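One simple way to load such a per-environment file without extra dependencies is a minimal KEY=VALUE parser. This is a sketch under stated assumptions: the project may well use python-dotenv instead, and `load_env_file`/`apply_env` are illustrative names:

```python
import os

def load_env_file(path: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping comments and blank lines."""
    settings = {}
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()  # drop inline comments
            if "=" in line:
                key, _, value = line.partition("=")
                settings[key.strip()] = value.strip()
    return settings

def apply_env(env_name: str) -> dict:
    """Load .env.<env_name> and export values without overriding the shell."""
    settings = load_env_file(f".env.{env_name}")
    for key, value in settings.items():
        os.environ.setdefault(key, value)  # shell-set variables still win
    return settings
```

Because `setdefault` is used, anything already exported in the shell keeps precedence over the file, matching the override order described at the top of this guide.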
Configuration Troubleshooting
Issue: GPT-5-nano Temperature Error
Error: Temperature must be 1.0 for gpt-5-nano
Status: ✅ Auto-fixed in services/llm_service.py
Verification:
```bash
python -c "
import os
os.environ['OPENAI_MODEL'] = 'gpt-5-nano'
os.environ['OPENAI_TEMPERATURE'] = '0.5'
from services.llm_service import LLMService
LLMService()  # Should log temperature override
"
```
Issue: Google Temperature Out of Range
Error: Temperature must be between 0.0 and 1.0
Solution: Automatic clamping implemented
Test:
```bash
python -c "
import os
os.environ['LLM_PROVIDER'] = 'google'
os.environ['GOOGLE_TEMPERATURE'] = '1.5'
from services.llm_service import LLMService
LLMService()  # Should clamp to 1.0
"
```
Issue: Ollama Connection Failed
Error: ConnectionError: Could not connect to Ollama
Diagnosis:
```bash
# Check if Ollama is running
curl -f http://localhost:11434/api/version || echo "Ollama not running"

# Check if the model exists
ollama list | grep "llama3.1:8b" || echo "Model not found"

# Check system resources
free -h  # RAM usage
df -h    # Disk space
```
Fix:
```bash
# Start Ollama service
ollama serve &

# Pull the required model
ollama pull llama3.1:8b

# Test connection
curl -d '{"model":"llama3.1:8b","prompt":"test","stream":false}' \
  http://localhost:11434/api/generate
```
Issue: HuggingFace Temperature Ignored
Issue: Temperature settings have no effect on responses.
Explanation: This is expected behavior; the HuggingFace Inference API typically ignores the temperature setting.
Workaround: Use a different model or provider when you need temperature control.
Issue: Missing API Keys
Error: AuthenticationError: Invalid API key
Diagnosis:
```bash
# Check environment variables
echo "OpenAI: ${OPENAI_API_KEY:0:10}..."
echo "Google: ${GOOGLE_API_KEY:0:10}..."
echo "HuggingFace: ${HUGGINGFACE_API_KEY:0:10}..."

# Test API key validity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models | jq '.data[0].id' || echo "Invalid OpenAI key"
```
Configuration Validation
Automated Configuration Check
```bash
# Run comprehensive configuration validation
python -c "
from config.settings import get_llm_config
from services.llm_service import LLMService

print('Configuration Validation')
print('=' * 40)

# Load configuration
try:
    config = get_llm_config()
    print('✅ Configuration loaded successfully')
    print(f'Provider: {config.get(\"provider\")}')
    print(f'Model: {config.get(\"model\")}')
    print(f'Temperature: {config.get(\"temperature\")}')
except Exception as e:
    print(f'❌ Configuration error: {e}')
    exit(1)

# Test service initialization
try:
    service = LLMService()
    print('✅ LLM Service initialized')
except Exception as e:
    print(f'❌ Service initialization failed: {e}')
    exit(1)

# Test simple completion
try:
    response = service.simple_chat_completion('Test message')
    print('✅ Chat completion successful')
    print(f'Response length: {len(response)} characters')
except Exception as e:
    print(f'❌ Chat completion failed: {e}')
    exit(1)

print('All configuration checks passed!')
"
```
Provider-Specific Health Checks
```bash
# OpenAI health check
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models | jq '.data | length'

# Google health check
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY" | jq '.models | length'

# Ollama health check
curl http://localhost:11434/api/tags | jq '.models | length'

# HuggingFace health check
curl -H "Authorization: Bearer $HUGGINGFACE_API_KEY" \
  https://huggingface.co/api/whoami | jq '.name'
```
Configuration Diff Tool
```bash
# Compare current config with defaults
python -c "
import os
from config.settings import get_llm_config

defaults = {
    'openai': {'temperature': 0.7, 'max_tokens': 1000},
    'google': {'temperature': 0.7, 'max_tokens': 1000},
    'ollama': {'temperature': 0.7, 'max_tokens': 1000},
}
current = get_llm_config()
provider = current.get('provider')
default = defaults.get(provider, {})
print(f'Configuration for {provider}:')
for key, default_val in default.items():
    current_val = current.get(key)
    status = '✅' if current_val == default_val else '⚠️'
    print(f'{status} {key}: {current_val} (default: {default_val})')
"
```
Configuration Templates
Minimal Setup (Single Provider)
```bash
# .env.minimal
LLM_PROVIDER=google
GOOGLE_API_KEY=your_api_key
GOOGLE_MODEL=gemini-2.5-flash
```
Robust Setup (Primary + Fallback)
```bash
# .env.robust
LLM_PROVIDER=openai
OPENAI_API_KEY=your_primary_key
OPENAI_MODEL=gpt-4o-mini
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_fallback_key
GOOGLE_MODEL=gemini-2.5-flash
```
Local-First Setup
```bash
# .env.local-first
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_cloud_backup_key
```
Budget-Conscious Setup
```bash
# .env.budget
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=1.0  # Fixed for nano
OPENAI_MAX_TOKENS=500   # Reduce costs
```
Security Best Practices
API Key Management
```bash
# Use environment variables
export OPENAI_API_KEY="sk-..."

# Never commit keys to git
echo "*.env*" >> .gitignore
echo ".env" >> .gitignore

# Use different keys for different environments
cp .env.example .env.development
cp .env.example .env.production
```
Rate Limiting Configuration
```python
# Add to config/settings.py
RATE_LIMITS = {
    "openai": {"rpm": 500, "tpm": 40000},
    "google": {"rpm": 60, "tpm": 32000},
    "ollama": {"rpm": None, "tpm": None},  # Local = unlimited
}
```
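One way such rpm limits could be enforced client-side is a sliding-window counter. This is a sketch, not the project's actual implementation; `RequestRateLimiter` is an illustrative name:

```python
import time
from collections import deque

class RequestRateLimiter:
    """Refuse requests once `rpm` calls have been made in the last 60 seconds."""

    def __init__(self, rpm):
        self.rpm = rpm          # None means unlimited (e.g. local Ollama)
        self.calls = deque()    # timestamps of recent calls

    def allow(self, now=None):
        if self.rpm is None:
            return True
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the 60 s window
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) < self.rpm:
            self.calls.append(now)
            return True
        return False

limiter = RequestRateLimiter(rpm=2)
print(limiter.allow(0.0), limiter.allow(1.0), limiter.allow(2.0))  # True True False
```

Passing explicit timestamps makes the window logic easy to test; in real use `allow()` is called without arguments and falls back to `time.monotonic()`.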
Error Handling Strategy
```python
# Graceful degradation configuration
FALLBACK_CHAIN = [
    "primary_provider",
    "fallback_provider",
    "local_provider",
    "cached_response",
]
```
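The chain can be walked with a simple loop that tries each entry in order and only raises once everything has failed. A sketch under assumptions: `providers` maps chain names to callables, and `chat_with_fallback` is an illustrative name, not project code:

```python
import logging

logger = logging.getLogger(__name__)

FALLBACK_CHAIN = ["primary_provider", "fallback_provider", "cached_response"]

def chat_with_fallback(message, providers):
    """Try each configured provider in FALLBACK_CHAIN order."""
    last_error = None
    for name in FALLBACK_CHAIN:
        handler = providers.get(name)
        if handler is None:
            continue  # this tier is not configured, skip it
        try:
            return handler(message)
        except Exception as e:
            logger.warning("Provider %s failed: %s", name, e)
            last_error = e
    raise RuntimeError(f"All providers failed: {last_error}")
```

Keeping the chain as plain strings means the degradation order stays in configuration, while the wiring of names to real clients happens at service startup.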
Testing Configuration Changes
Unit Tests for Configuration
```bash
# Test temperature overrides
python -m pytest tests/test_llm_temperature.py -v

# Test provider fallbacks
python -m pytest tests/test_llm_fallback.py -v

# Test API key validation
python -m pytest tests/test_api_keys.py -v
```
Integration Tests
```bash
# Test each provider individually
python -c "
import os
from services.llm_service import LLMService

providers = ['openai', 'google', 'ollama']
for provider in providers:
    os.environ['LLM_PROVIDER'] = provider
    try:
        service = LLMService()
        response = service.simple_chat_completion('Test')
        print(f'✅ {provider}: {len(response)} chars')
    except Exception as e:
        print(f'❌ {provider}: {e}')
"
```
Performance Benchmarks
```bash
# Measure response times
python -c "
import time
from services.llm_service import LLMService

service = LLMService()
start = time.time()
response = service.simple_chat_completion('Quick recipe suggestion')
elapsed = time.time() - start
print(f'Response time: {elapsed:.2f}s')
print(f'Response length: {len(response)} characters')
print(f'Words per second: {len(response.split()) / elapsed:.1f}')
"
```
Configuration Migration
Upgrading from Old Configuration
```bash
# Migrate old environment variables (old format → new format)
mv .env .env.backup

# Update variable names
sed 's/LLM_MODEL=/OPENAI_MODEL=/' .env.backup > .env
sed -i 's/LLM_TEMPERATURE=/OPENAI_TEMPERATURE=/' .env
sed -i 's/LLM_MAX_TOKENS=/OPENAI_MAX_TOKENS=/' .env
echo "LLM_PROVIDER=openai" >> .env
```
Version Compatibility Check
```python
import os

# Check if the configuration is compatible with the current format
def check_config_version():
    required_vars = ["LLM_PROVIDER"]
    legacy_vars = ["LLM_MODEL", "LLM_TEMPERATURE"]
    has_new = all(os.getenv(var) for var in required_vars)
    has_legacy = any(os.getenv(var) for var in legacy_vars)
    if has_legacy and not has_new:
        raise ValueError("Legacy configuration detected. Please migrate to new format.")
    return has_new
```
Next Steps: After configuring your providers, see the Model Selection Guide for choosing the best models for your use case.