# Model Configuration Guide

This guide covers the technical configuration, settings management, parameter handling, and troubleshooting for LLM providers in the Recipe Chatbot project.

> 📚 **Looking for model recommendations?** See [Model Selection Guide](./model-selection-guide.md) for detailed model comparisons and use case recommendations.

## 🔧 Configuration System Overview

### Settings Architecture

The project uses a centralized configuration system in `config/settings.py` with environment variable overrides:

```text
# Configuration loading flow
Environment Variables (.env) → settings.py → LLM Service → Provider APIs
```

### Temperature Management

Each provider has different temperature constraints that are handled automatically:

| Provider | Range | Auto-Handling | Special Cases |
|----------|-------|---------------|---------------|
| **OpenAI** | 0.0 - 2.0 | ✅ GPT-5-nano → 1.0 | Nano models fixed |
| **Google** | 0.0 - 1.0 | ✅ Clamp to range | Strict validation |
| **Ollama** | 0.0 - 2.0 | ⚠️ Model dependent | Local processing |
| **HuggingFace** | Fixed ~0.7 | ❌ API ignores setting | Read-only |

## 🛠️ Provider Configuration Details

### OpenAI Configuration

#### Environment Variables

```bash
# Core settings
OPENAI_API_KEY=sk-proj-xxxxx
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Advanced parameters (optional)
OPENAI_TOP_P=1.0
OPENAI_FREQUENCY_PENALTY=0.0
OPENAI_PRESENCE_PENALTY=0.0
```

#### Automatic Temperature Override

```python
# Implemented in services/llm_service.py
if "gpt-5-nano" in model_name.lower():
    temperature = 1.0  # Only supported value
    logger.info(f"Auto-adjusting temperature to 1.0 for {model_name}")
```

#### Parameter Validation

- **Temperature**: `0.0 - 2.0` (except nano models: fixed `1.0`)
- **Max Tokens**: `1 - 4096` (model-dependent)
- **Top P**: `0.0 - 1.0`

### Google (Gemini) Configuration

#### Environment Variables

```bash
# Core settings
GOOGLE_API_KEY=AIzaSyxxxxx
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000

# Advanced parameters (optional)
GOOGLE_TOP_P=0.95
GOOGLE_TOP_K=40
```

#### Temperature Clamping

```python
# Auto-clamping to Google's range
google_temp = max(0.0, min(1.0, configured_temperature))
if google_temp != configured_temperature:
    logger.info(f"Clamping temperature from {configured_temperature} to {google_temp}")
```

#### Parameter Constraints

- **Temperature**: `0.0 - 1.0` (strictly enforced)
- **Max Tokens**: `1 - 8192`
- **Top K**: `1 - 40`

### Ollama Configuration

#### Environment Variables

```bash
# Core settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
OLLAMA_MAX_TOKENS=1000

# Connection settings
OLLAMA_TIMEOUT=30
OLLAMA_KEEP_ALIVE=5m
```

#### Service Management

```bash
# Start Ollama service
ollama serve &

# Verify service status
curl http://localhost:11434/api/version

# Model management
ollama pull llama3.1:8b
ollama list
ollama rm unused_model
```

#### Parameter Flexibility

- **Temperature**: `0.0 - 2.0` (widest range)
- **Context Length**: Model-dependent (2K - 128K)
- **Custom Parameters**: Model-specific options available

### HuggingFace Configuration

#### Environment Variables

```bash
# Core settings
HUGGINGFACE_API_KEY=hf_xxxxx
HUGGINGFACE_MODEL=microsoft/DialoGPT-medium
HUGGINGFACE_TEMPERATURE=0.7  # Often ignored
HUGGINGFACE_MAX_TOKENS=500

# API settings
HUGGINGFACE_WAIT_FOR_MODEL=true
HUGGINGFACE_USE_CACHE=true
```

#### API Limitations

```python
# Note: Temperature is often ignored by the Inference API
logger.warning(f"HuggingFace model {model_name} may ignore temperature setting")
return 0.7  # API typically uses this default
```

## ⚙️ Advanced Configuration

### Dynamic Provider Switching

```python
# config/settings.py implementation
def get_llm_config():
    provider = os.getenv("LLM_PROVIDER", "openai").lower()
    fallback = os.getenv("LLM_FALLBACK_PROVIDER", "google").lower()
    return {
        "provider": provider,
"fallback_provider": fallback, **get_provider_config(provider) } def get_provider_config(provider): """Get provider-specific configuration.""" configs = { "openai": { "api_key": os.getenv("OPENAI_API_KEY"), "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"), "temperature": float(os.getenv("OPENAI_TEMPERATURE", "0.7")), "max_tokens": int(os.getenv("OPENAI_MAX_TOKENS", "1000")), }, "google": { "api_key": os.getenv("GOOGLE_API_KEY"), "model": os.getenv("GOOGLE_MODEL", "gemini-2.5-flash"), "temperature": float(os.getenv("GOOGLE_TEMPERATURE", "0.7")), "max_tokens": int(os.getenv("GOOGLE_MAX_TOKENS", "1000")), }, # ... other providers } return configs.get(provider, {}) ``` ### Fallback Configuration ```python # Automatic fallback on provider failure def get_llm_response(message): try: return primary_provider.chat_completion(message) except Exception as e: logger.warning(f"Primary provider failed: {e}") return fallback_provider.chat_completion(message) ``` ### Environment-Specific Configs #### Development (.env.development) ```bash # Fast, free/cheap for testing LLM_PROVIDER=google GOOGLE_MODEL=gemini-2.5-flash GOOGLE_TEMPERATURE=0.8 # More creative for testing LLM_FALLBACK_PROVIDER=ollama ``` #### Production (.env.production) ```bash # Reliable, consistent for production LLM_PROVIDER=openai OPENAI_MODEL=gpt-4o-mini OPENAI_TEMPERATURE=0.7 # Consistent responses LLM_FALLBACK_PROVIDER=google ``` #### Local Development (.env.local) ```bash # Self-hosted for offline development LLM_PROVIDER=ollama OLLAMA_MODEL=llama3.1:8b OLLAMA_TEMPERATURE=0.7 # No fallback - fully local ``` ## ๐Ÿšจ Configuration Troubleshooting ### Issue: GPT-5-nano Temperature Error **Error**: `Temperature must be 1.0 for gpt-5-nano` **Status**: โœ… Auto-fixed in `services/llm_service.py` **Verification**: ```bash python -c " import os os.environ['OPENAI_MODEL'] = 'gpt-5-nano' os.environ['OPENAI_TEMPERATURE'] = '0.5' from services.llm_service import LLMService LLMService() # Should log temperature override 
" ``` ### Issue: Google Temperature Out of Range **Error**: `Temperature must be between 0.0 and 1.0` **Solution**: Automatic clamping implemented **Test**: ```bash python -c " import os os.environ['LLM_PROVIDER'] = 'google' os.environ['GOOGLE_TEMPERATURE'] = '1.5' from services.llm_service import LLMService LLMService() # Should clamp to 1.0 " ``` ### Issue: Ollama Connection Failed **Error**: `ConnectionError: Could not connect to Ollama` **Diagnosis**: ```bash # Check if Ollama is running curl -f http://localhost:11434/api/version || echo "Ollama not running" # Check if model exists ollama list | grep "llama3.1:8b" || echo "Model not found" # Check system resources free -h # RAM usage df -h # Disk space ``` **Fix**: ```bash # Start Ollama service ollama serve & # Pull required model ollama pull llama3.1:8b # Test connection curl -d '{"model":"llama3.1:8b","prompt":"test","stream":false}' \ http://localhost:11434/api/generate ``` ### Issue: HuggingFace Temperature Ignored **Issue**: Settings have no effect on response **Explanation**: This is expected behavior - HuggingFace Inference API typically ignores temperature **Workaround**: Use different models or providers for temperature control ### Issue: Missing API Keys **Error**: `AuthenticationError: Invalid API key` **Diagnosis**: ```bash # Check environment variables echo "OpenAI: ${OPENAI_API_KEY:0:10}..." echo "Google: ${GOOGLE_API_KEY:0:10}..." echo "HuggingFace: ${HUGGINGFACE_API_KEY:0:10}..." 
```bash
# Test API key validity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models | jq '.data[0].id' || echo "Invalid OpenAI key"
```

## 🔍 Configuration Validation

### Automated Configuration Check

```bash
# Run comprehensive configuration validation
python -c "
from config.settings import get_llm_config
from services.llm_service import LLMService

print('🔧 Configuration Validation')
print('=' * 40)

# Load configuration
try:
    config = get_llm_config()
    print('✅ Configuration loaded successfully')
    print(f'Provider: {config.get(\"provider\")}')
    print(f'Model: {config.get(\"model\")}')
    print(f'Temperature: {config.get(\"temperature\")}')
except Exception as e:
    print(f'❌ Configuration error: {e}')
    exit(1)

# Test service initialization
try:
    service = LLMService()
    print('✅ LLM Service initialized')
except Exception as e:
    print(f'❌ Service initialization failed: {e}')
    exit(1)

# Test simple completion
try:
    response = service.simple_chat_completion('Test message')
    print('✅ Chat completion successful')
    print(f'Response length: {len(response)} characters')
except Exception as e:
    print(f'❌ Chat completion failed: {e}')
    exit(1)

print('🎉 All configuration checks passed!')
"
```

### Provider-Specific Health Checks

```bash
# OpenAI health check
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models | jq '.data | length'

# Google health check
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY" | jq '.models | length'

# Ollama health check
curl http://localhost:11434/api/tags | jq '.models | length'

# HuggingFace health check
curl -H "Authorization: Bearer $HUGGINGFACE_API_KEY" \
  https://huggingface.co/api/whoami | jq '.name'
```

### Configuration Diff Tool

```bash
# Compare current config with defaults
python -c "
from config.settings import get_llm_config

defaults = {
    'openai': {'temperature': 0.7, 'max_tokens': 1000},
    'google': {'temperature': 0.7,
               'max_tokens': 1000},
    'ollama': {'temperature': 0.7, 'max_tokens': 1000},
}

current = get_llm_config()
provider = current.get('provider')
default = defaults.get(provider, {})

print(f'Configuration for {provider}:')
for key, default_val in default.items():
    current_val = current.get(key)
    status = '✅' if current_val == default_val else '⚠️'
    print(f'{status} {key}: {current_val} (default: {default_val})')
"
```

## 📋 Configuration Templates

### Minimal Setup (Single Provider)

```bash
# .env.minimal
LLM_PROVIDER=google
GOOGLE_API_KEY=your_api_key
GOOGLE_MODEL=gemini-2.5-flash
```

### Robust Setup (Primary + Fallback)

```bash
# .env.robust
LLM_PROVIDER=openai
OPENAI_API_KEY=your_primary_key
OPENAI_MODEL=gpt-4o-mini
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_fallback_key
GOOGLE_MODEL=gemini-2.5-flash
```

### Local-First Setup

```bash
# .env.local-first
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_cloud_backup_key
```

### Budget-Conscious Setup

```bash
# .env.budget
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=1.0  # Fixed for nano
OPENAI_MAX_TOKENS=500   # Reduce costs
```

## 🔐 Security Best Practices

### API Key Management

```bash
# Use environment variables
export OPENAI_API_KEY="sk-..."
```
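A startup guard can make missing keys fail fast instead of surfacing later as an `AuthenticationError`. The following sketch is an illustration, not project code (the `PROVIDER_KEY_VARS` mapping and `require_api_key` helper are hypothetical names): it checks that the selected provider's key is present before any request is made.

```python
import os

# Hypothetical mapping from provider name to its required env var;
# Ollama runs locally and needs no key.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "huggingface": "HUGGINGFACE_API_KEY",
}

def require_api_key(provider):
    """Raise at startup if the selected provider has no key configured."""
    var = PROVIDER_KEY_VARS.get(provider.lower())
    if var is None:
        return None  # Keyless provider (e.g. ollama)
    key = os.getenv(var)
    if not key:
        raise RuntimeError(f"{var} is not set but LLM_PROVIDER={provider}")
    return key
```

Calling this once during service initialization turns a confusing mid-request failure into a clear configuration error.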
```bash
# Never commit keys to git
echo "*.env*" >> .gitignore
echo ".env" >> .gitignore

# Use different keys for different environments
cp .env.example .env.development
cp .env.example .env.production
```

### Rate Limiting Configuration

```python
# Add to config/settings.py
RATE_LIMITS = {
    "openai": {"rpm": 500, "tpm": 40000},
    "google": {"rpm": 60, "tpm": 32000},
    "ollama": {"rpm": None, "tpm": None},  # Local = unlimited
}
```

### Error Handling Strategy

```python
# Graceful degradation configuration
FALLBACK_CHAIN = [
    "primary_provider",
    "fallback_provider",
    "local_provider",
    "cached_response",
]
```

## 🧪 Testing Configuration Changes

### Unit Tests for Configuration

```bash
# Test temperature overrides
python -m pytest tests/test_llm_temperature.py -v

# Test provider fallbacks
python -m pytest tests/test_llm_fallback.py -v

# Test API key validation
python -m pytest tests/test_api_keys.py -v
```

### Integration Tests

```bash
# Test each provider individually
python -c "
import os
providers = ['openai', 'google', 'ollama']
for provider in providers:
    os.environ['LLM_PROVIDER'] = provider
    try:
        from services.llm_service import LLMService
        service = LLMService()
        response = service.simple_chat_completion('Test')
        print(f'✅ {provider}: {len(response)} chars')
    except Exception as e:
        print(f'❌ {provider}: {e}')
"
```

### Performance Benchmarks

```bash
# Measure response times
python -c "
import time
from services.llm_service import LLMService

service = LLMService()
start = time.time()
response = service.simple_chat_completion('Quick recipe suggestion')
elapsed = time.time() - start

print(f'Response time: {elapsed:.2f}s')
print(f'Response length: {len(response)} characters')
print(f'Words per second: {len(response.split()) / elapsed:.1f}')
"
```

## 🔄 Configuration Migration

### Upgrading from Old Configuration

```bash
# Migrate old environment variables
# Old format → New format
mv .env .env.backup

# Update variable names
sed 's/LLM_MODEL=/OPENAI_MODEL=/' .env.backup > .env
sed -i 's/LLM_TEMPERATURE=/OPENAI_TEMPERATURE=/' .env
sed -i 's/LLM_MAX_TOKENS=/OPENAI_MAX_TOKENS=/' .env
echo "LLM_PROVIDER=openai" >> .env
```

### Version Compatibility Check

```python
import os

# Check if configuration is compatible
def check_config_version():
    required_vars = ["LLM_PROVIDER"]
    legacy_vars = ["LLM_MODEL", "LLM_TEMPERATURE"]

    has_new = all(os.getenv(var) for var in required_vars)
    has_legacy = any(os.getenv(var) for var in legacy_vars)

    if has_legacy and not has_new:
        raise ValueError("Legacy configuration detected. Please migrate to new format.")

    return has_new
```

---

💡 **Next Steps**: After configuring your providers, see the [Model Selection Guide](./model-selection-guide.md) for choosing the best models for your use case.
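For quick reference, the per-provider temperature rules described throughout this guide can be collapsed into a single helper. This is a sketch of the behavior, not the actual implementation (which lives in `services/llm_service.py`); the function name is illustrative.

```python
def normalize_temperature(provider, model, temperature):
    """Apply the provider temperature rules from the table above (sketch).

    - OpenAI: 0.0-2.0, except GPT-5-nano which only accepts 1.0
    - Google: clamped to 0.0-1.0
    - Ollama: 0.0-2.0
    - HuggingFace: the Inference API typically ignores it; ~0.7 is used
    """
    provider = provider.lower()
    if provider == "openai":
        if "gpt-5-nano" in model.lower():
            return 1.0  # Only supported value for nano models
        return max(0.0, min(2.0, temperature))
    if provider == "google":
        return max(0.0, min(1.0, temperature))  # Strictly enforced range
    if provider == "ollama":
        return max(0.0, min(2.0, temperature))  # Widest range
    if provider == "huggingface":
        return 0.7  # Setting is effectively read-only
    raise ValueError(f"Unknown provider: {provider}")
```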