# Model Configuration Guide

This guide focuses on the technical configuration, settings management, parameter handling, and troubleshooting for LLM providers in the Recipe Chatbot project.

> 📚 **Looking for model recommendations?** See [Model Selection Guide](./model-selection-guide.md) for detailed model comparisons and use case recommendations.

## 🔧 Configuration System Overview

### Settings Architecture

The project uses a centralized configuration system in `config/settings.py`, with environment variable overrides:

```text
# Configuration loading flow
Environment Variables (.env) → config/settings.py → LLM Service → Provider APIs
```
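
The first two hops of that flow can be sketched with the standard library. This sketch assumes a simple `KEY=value` file format. The project most likely uses a library such as python-dotenv for this step, so `load_env_file` and `apply_env_file` are hypothetical helpers. They only illustrate the precedence rule: variables already set in the real environment win over values from `.env`.

```python
import os
from pathlib import Path


def load_env_file(path: str) -> dict:
    """Parse simple KEY=value lines from a .env file (hypothetical helper, not python-dotenv)."""
    values = {}
    file = Path(path)
    if not file.is_file():
        return values
    for raw_line in file.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, full-line comments, and lines without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline " # comment", then any surrounding quotes
        value = value.split(" #", 1)[0].strip().strip("\"'")
        values[key.strip()] = value
    return values


def apply_env_file(path: str) -> None:
    """Copy .env values into os.environ; variables that are already set take precedence."""
    for key, value in load_env_file(path).items():
        os.environ.setdefault(key, value)
```

Using `os.environ.setdefault` is what gives real environment variables priority. A deployment platform can therefore override any value in a checked-in template without editing the file.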

### Temperature Management

Each provider has different temperature constraints. The LLM service adjusts the configured value automatically where it can:

| Provider | Range | Auto-Handling | Special Cases |
|----------|-------|---------------|---------------|
| **OpenAI** | 0.0 - 2.0 | ✅ GPT-5-nano → 1.0 | Nano models fixed at 1.0 |
| **Google** | 0.0 - 1.0 | ✅ Clamped to range | Strict validation |
| **Ollama** | 0.0 - 2.0 | ⚠️ Model-dependent | Local processing |
| **HuggingFace** | Fixed ~0.7 | ❌ API ignores setting | Read-only |
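
The rules in the table can be collected into a single function. This is a sketch, not the code in `services/llm_service.py`, and the name `normalize_temperature` is hypothetical. It encodes only the behavior listed above:

- OpenAI and Ollama are clamped to 0.0 to 2.0, with gpt-5-nano pinned to 1.0.
- Google is clamped to 0.0 to 1.0.
- HuggingFace reports the ~0.7 default that the API typically applies.

```python
import logging

logger = logging.getLogger(__name__)

# Allowed temperature ranges per provider, taken from the table above
TEMPERATURE_RANGES = {
    "openai": (0.0, 2.0),
    "google": (0.0, 1.0),
    "ollama": (0.0, 2.0),
}
HUGGINGFACE_DEFAULT_TEMPERATURE = 0.7  # Value the Inference API typically uses


def normalize_temperature(provider: str, model: str, temperature: float) -> float:
    """Return the temperature that will actually be used (hypothetical helper)."""
    provider = provider.lower()
    if provider == "openai" and "gpt-5-nano" in model.lower():
        return 1.0  # Nano models accept only 1.0
    if provider == "huggingface":
        logger.warning("HuggingFace model %s may ignore temperature setting", model)
        return HUGGINGFACE_DEFAULT_TEMPERATURE
    if provider not in TEMPERATURE_RANGES:
        raise ValueError(f"Unknown provider: {provider}")
    low, high = TEMPERATURE_RANGES[provider]
    clamped = max(low, min(high, temperature))
    if clamped != temperature:
        logger.info("Clamping temperature from %s to %s", temperature, clamped)
    return clamped
```

Clamping rather than raising keeps a slightly wrong `.env` value from breaking startup. The log line still makes the adjustment visible.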

## 🛠️ Provider Configuration Details

### OpenAI Configuration

#### Environment Variables

```bash
# Core settings
OPENAI_API_KEY=sk-proj-xxxxx
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000
# Advanced parameters (optional)
OPENAI_TOP_P=1.0
OPENAI_FREQUENCY_PENALTY=0.0
OPENAI_PRESENCE_PENALTY=0.0
```

#### Automatic Temperature Override

```python
# Implemented in services/llm_service.py (excerpt)
if "gpt-5-nano" in model_name.lower():
    temperature = 1.0  # Only supported value
    logger.info(f"Auto-adjusting temperature to 1.0 for {model_name}")
```

#### Parameter Validation

- **Temperature**: `0.0 - 2.0` (except nano models, which are fixed at `1.0`)
- **Max Tokens**: `1 - 4096` (model-dependent)
- **Top P**: `0.0 - 1.0`
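
The bounds above can be checked when the `OPENAI_*` variables are read. This sketch assumes out-of-range values should fail fast rather than be clamped. `read_openai_params` and `OPENAI_PARAM_SPECS` are hypothetical names. The 4096 cap comes from the list above, although the real limit depends on the model. The `-2.0` to `2.0` penalty bounds follow the OpenAI API reference.

```python
import os

# name -> (env var, parser, default, min, max)
OPENAI_PARAM_SPECS = {
    "temperature": ("OPENAI_TEMPERATURE", float, "0.7", 0.0, 2.0),
    "max_tokens": ("OPENAI_MAX_TOKENS", int, "1000", 1, 4096),
    "top_p": ("OPENAI_TOP_P", float, "1.0", 0.0, 1.0),
    "frequency_penalty": ("OPENAI_FREQUENCY_PENALTY", float, "0.0", -2.0, 2.0),
    "presence_penalty": ("OPENAI_PRESENCE_PENALTY", float, "0.0", -2.0, 2.0),
}


def read_openai_params(env=None) -> dict:
    """Read OPENAI_* variables and fail fast on out-of-range values (hypothetical helper)."""
    env = os.environ if env is None else env
    params = {"model": env.get("OPENAI_MODEL", "gpt-4o-mini")}
    for name, (var, parse, default, low, high) in OPENAI_PARAM_SPECS.items():
        value = parse(env.get(var, default))
        if not low <= value <= high:
            raise ValueError(f"{var}={value} is outside {low} - {high}")
        params[name] = value
    # gpt-5-nano accepts only temperature 1.0, so pin it after validation
    if "gpt-5-nano" in params["model"].lower():
        params["temperature"] = 1.0
    return params
```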

### Google (Gemini) Configuration

#### Environment Variables

```bash
# Core settings
GOOGLE_API_KEY=AIzaSyxxxxx
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000
# Advanced parameters (optional)
GOOGLE_TOP_P=0.95
GOOGLE_TOP_K=40
```

#### Temperature Clamping

```python
# Auto-clamping to Google's range (excerpt)
google_temp = max(0.0, min(1.0, configured_temperature))
if google_temp != configured_temperature:
    logger.info(f"Clamping temperature from {configured_temperature} to {google_temp}")
```
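
#### Parameter Constraints

- **Temperature**: `0.0 - 1.0` (strictly enforced by the project's clamping; newer Gemini models document a range up to `2.0`, so check the model reference before widening the clamp)
- **Max Tokens**: `1 - 8192` (model-dependent)
- **Top K**: `1 - 40`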
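
The `GOOGLE_*` variables map onto Gemini's generation settings: `temperature`, `top_p`, `top_k`, and `max_output_tokens`. This sketch builds that mapping as a plain dict and applies the clamp and Top K bound listed above. The helper name is hypothetical. Passing the dict to the SDK is left out because the project's client code is not shown here.

```python
import os


def build_gemini_generation_config(env=None) -> dict:
    """Map GOOGLE_* variables to Gemini generation settings (hypothetical helper)."""
    env = os.environ if env is None else env
    temperature = float(env.get("GOOGLE_TEMPERATURE", "0.7"))
    top_k = int(env.get("GOOGLE_TOP_K", "40"))
    max_tokens = int(env.get("GOOGLE_MAX_TOKENS", "1000"))
    if not 1 <= top_k <= 40:
        raise ValueError(f"GOOGLE_TOP_K must be 1-40, got {top_k}")
    if max_tokens < 1:
        raise ValueError(f"GOOGLE_MAX_TOKENS must be positive, got {max_tokens}")
    return {
        # Clamp to the 0.0-1.0 range this project enforces for Google
        "temperature": max(0.0, min(1.0, temperature)),
        "top_p": float(env.get("GOOGLE_TOP_P", "0.95")),
        "top_k": top_k,
        # Gemini calls the output cap max_output_tokens rather than max_tokens
        "max_output_tokens": max_tokens,
    }
```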

### Ollama Configuration

#### Environment Variables

```bash
# Core settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
OLLAMA_MAX_TOKENS=1000
# Connection settings
OLLAMA_TIMEOUT=30
OLLAMA_KEEP_ALIVE=5m
```

#### Service Management

```bash
# Start Ollama service
ollama serve &
# Verify service status
curl http://localhost:11434/api/version
# Model management
ollama pull llama3.1:8b
ollama list
ollama rm unused_model
```

#### Parameter Flexibility

- **Temperature**: `0.0 - 2.0` (widest range)
- **Context Length**: Model-dependent (2K - 128K)
- **Custom Parameters**: Model-specific options available
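
The `OLLAMA_*` variables can be translated into a call to Ollama's `/api/generate` endpoint, the same endpoint the `curl` connection test uses. This sketch uses only the standard library. Two points are assumptions: mapping `OLLAMA_MAX_TOKENS` to Ollama's `num_predict` option, and the names of both functions (which are hypothetical).

```python
import json
import os
import urllib.request


def build_ollama_request(prompt: str, env=None) -> urllib.request.Request:
    """Build a non-streaming /api/generate request from OLLAMA_* settings."""
    env = os.environ if env is None else env
    base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/")
    payload = {
        "model": env.get("OLLAMA_MODEL", "llama3.1:8b"),
        "prompt": prompt,
        "stream": False,
        "keep_alive": env.get("OLLAMA_KEEP_ALIVE", "5m"),
        # Sampling settings go under "options"; num_predict caps generated tokens
        "options": {
            "temperature": float(env.get("OLLAMA_TEMPERATURE", "0.7")),
            "num_predict": int(env.get("OLLAMA_MAX_TOKENS", "1000")),
        },
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_generate(prompt: str, env=None) -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    env = os.environ if env is None else env
    timeout = float(env.get("OLLAMA_TIMEOUT", "30"))
    with urllib.request.urlopen(build_ollama_request(prompt, env), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Separating request construction from sending keeps the parameter mapping testable without a local server.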

### HuggingFace Configuration

#### Environment Variables

```bash
# Core settings
HUGGINGFACE_API_KEY=hf_xxxxx
HUGGINGFACE_MODEL=microsoft/DialoGPT-medium
HUGGINGFACE_TEMPERATURE=0.7 # Often ignored
HUGGINGFACE_MAX_TOKENS=500
# API settings
HUGGINGFACE_WAIT_FOR_MODEL=true
HUGGINGFACE_USE_CACHE=true
```

#### API Limitations

```python
# The Inference API often ignores temperature, so warn and report its default.
# (Function name is illustrative; the original excerpt showed only the body.)
def get_huggingface_temperature(model_name):
    logger.warning(f"HuggingFace model {model_name} may ignore temperature setting")
    return 0.7  # API typically uses this default
```
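
The `HUGGINGFACE_*` variables can be turned into a request body. This sketch assumes the classic Inference API layout of `inputs`, `parameters`, and `options`, where `options` carries `wait_for_model` and `use_cache`. The hosted API has changed over time, so check the current Hugging Face documentation before relying on this shape. The helper name is hypothetical.

```python
import os


def build_hf_payload(prompt: str, env=None) -> dict:
    """Build a classic Inference API request body from HUGGINGFACE_* settings (hypothetical helper)."""
    env = os.environ if env is None else env

    def flag(name: str, default: str) -> bool:
        # Treat "true", "1", and "yes" (any case) as enabled
        return env.get(name, default).strip().lower() in {"true", "1", "yes"}

    return {
        "inputs": prompt,
        "parameters": {
            # Sent for completeness; many hosted models ignore it
            "temperature": float(env.get("HUGGINGFACE_TEMPERATURE", "0.7")),
            "max_new_tokens": int(env.get("HUGGINGFACE_MAX_TOKENS", "500")),
        },
        "options": {
            "wait_for_model": flag("HUGGINGFACE_WAIT_FOR_MODEL", "true"),
            "use_cache": flag("HUGGINGFACE_USE_CACHE", "true"),
        },
    }
```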

## ⚙️ Advanced Configuration

### Dynamic Provider Switching

```python
# config/settings.py implementation
import os


def get_llm_config():
    provider = os.getenv("LLM_PROVIDER", "openai").lower()
    fallback = os.getenv("LLM_FALLBACK_PROVIDER", "google").lower()
    return {
        "provider": provider,
        "fallback_provider": fallback,
        **get_provider_config(provider),
    }


def get_provider_config(provider):
    """Get provider-specific configuration."""
    configs = {
        "openai": {
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
            "temperature": float(os.getenv("OPENAI_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("OPENAI_MAX_TOKENS", "1000")),
        },
        "google": {
            "api_key": os.getenv("GOOGLE_API_KEY"),
            "model": os.getenv("GOOGLE_MODEL", "gemini-2.5-flash"),
            "temperature": float(os.getenv("GOOGLE_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("GOOGLE_MAX_TOKENS", "1000")),
        },
        # ... other providers
    }
    return configs.get(provider, {})
```
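
`get_provider_config()` returns an empty dict for an unrecognized provider. A typo such as `LLM_PROVIDER=opnai` therefore only surfaces later, when a request fails. A stricter variant can fail at load time instead. In this sketch, `resolve_providers` and `SUPPORTED_PROVIDERS` are hypothetical names. A fallback identical to the primary (the minimal Google-only setup hits this with the default `"google"` fallback) is treated as no fallback.

```python
import os
from typing import Optional, Tuple

SUPPORTED_PROVIDERS = {"openai", "google", "ollama", "huggingface"}


def resolve_providers(env=None) -> Tuple[str, Optional[str]]:
    """Return (primary, fallback), failing fast on unknown names (hypothetical helper)."""
    env = os.environ if env is None else env
    primary = env.get("LLM_PROVIDER", "openai").strip().lower()
    fallback = env.get("LLM_FALLBACK_PROVIDER", "google").strip().lower()
    for name in (primary, fallback):
        if name not in SUPPORTED_PROVIDERS:
            raise ValueError(
                f"Unsupported provider {name!r}; expected one of {sorted(SUPPORTED_PROVIDERS)}"
            )
    # A fallback identical to the primary adds nothing, so report it as "no fallback"
    return primary, (None if fallback == primary else fallback)
```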

### Fallback Configuration

```python
# Automatic fallback on provider failure
# (primary_provider, fallback_provider, and logger are assumed to be set up by the service)
def get_llm_response(message):
    try:
        return primary_provider.chat_completion(message)
    except Exception as e:
        logger.warning(f"Primary provider failed: {e}")
        return fallback_provider.chat_completion(message)
```

### Environment-Specific Configs

#### Development (.env.development)

```bash
# Fast, free/cheap for testing
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.8 # More creative for testing
LLM_FALLBACK_PROVIDER=ollama
```

#### Production (.env.production)

```bash
# Reliable, consistent for production
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7 # Consistent responses
LLM_FALLBACK_PROVIDER=google
```

#### Local Development (.env.local)

```bash
# Self-hosted for offline development
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
# No fallback intended. Note that get_llm_config() defaults an unset
# LLM_FALLBACK_PROVIDER to "google", so change that default if requests
# must never leave the machine.
```
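
The guide does not say how one of these files is chosen at startup. One common pattern is a selector variable that picks `.env.<name>` and falls back to a plain `.env`. The sketch below assumes that pattern; both `APP_ENV` and `select_env_file` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Optional


def select_env_file(base_dir: str = ".", env=None) -> Optional[Path]:
    """Pick .env.<APP_ENV> if it exists, else .env (APP_ENV is a hypothetical variable)."""
    env = os.environ if env is None else env
    name = env.get("APP_ENV", "development").strip().lower()
    for candidate in (Path(base_dir) / f".env.{name}", Path(base_dir) / ".env"):
        if candidate.is_file():
            return candidate
    return None
```

The returned path could then be passed to whichever loader the project uses, for example python-dotenv's `load_dotenv(dotenv_path=...)`.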

## 🚨 Configuration Troubleshooting

### Issue: GPT-5-nano Temperature Error

**Error**: `Temperature must be 1.0 for gpt-5-nano`

**Status**: ✅ Auto-fixed in `services/llm_service.py`

**Verification**:

```bash
python -c "
import os
os.environ['LLM_PROVIDER'] = 'openai'
os.environ['OPENAI_MODEL'] = 'gpt-5-nano'
os.environ['OPENAI_TEMPERATURE'] = '0.5'
from services.llm_service import LLMService
LLMService()  # Should log the temperature override
"
```

### Issue: Google Temperature Out of Range

**Error**: `Temperature must be between 0.0 and 1.0`

**Solution**: Automatic clamping implemented

**Test**:

```bash
python -c "
import os
os.environ['LLM_PROVIDER'] = 'google'
os.environ['GOOGLE_TEMPERATURE'] = '1.5'
from services.llm_service import LLMService
LLMService()  # Should clamp to 1.0
"
```

### Issue: Ollama Connection Failed

**Error**: `ConnectionError: Could not connect to Ollama`

**Diagnosis**:

```bash
# Check if Ollama is running
curl -f http://localhost:11434/api/version || echo "Ollama not running"
# Check if model exists
ollama list | grep "llama3.1:8b" || echo "Model not found"
# Check system resources (free is Linux-only)
free -h  # RAM usage
df -h    # Disk space
```

**Fix**:

```bash
# Start Ollama service
ollama serve &
# Pull required model
ollama pull llama3.1:8b
# Test connection
curl -d '{"model":"llama3.1:8b","prompt":"test","stream":false}' \
  http://localhost:11434/api/generate
```

### Issue: HuggingFace Temperature Ignored

**Symptom**: Changing `HUGGINGFACE_TEMPERATURE` has no effect on responses

**Explanation**: This is expected behavior: the HuggingFace Inference API typically ignores the temperature parameter

**Workaround**: Use a different model or provider when you need temperature control

### Issue: Missing API Keys

**Error**: `AuthenticationError: Invalid API key`

**Diagnosis**:

```bash
# Check environment variables (prints only the first 10 characters of each key)
echo "OpenAI: ${OPENAI_API_KEY:0:10}..."
echo "Google: ${GOOGLE_API_KEY:0:10}..."
echo "HuggingFace: ${HUGGINGFACE_API_KEY:0:10}..."
# Test API key validity (jq -e exits non-zero when no model ID comes back)
curl -s -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models | jq -e '.data[0].id' || echo "Invalid OpenAI key"
```
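
The same check can run from Python before the service starts. That also catches a missing key for the *fallback* provider, which otherwise only shows up when the primary fails. In this sketch, `REQUIRED_KEYS`, `mask`, and `missing_api_keys` are hypothetical names. The sketch lists the unset key variables for the configured primary and fallback providers and masks any key before it is printed.

```python
import os

# Which variable each provider needs; Ollama runs locally without a key
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "huggingface": "HUGGINGFACE_API_KEY",
    "ollama": None,
}


def mask(secret: str, visible: int = 6) -> str:
    """Show only the first few characters of a secret."""
    return secret[:visible] + "..." if secret else "<unset>"


def missing_api_keys(env=None) -> list:
    """List required key variables that are unset for the primary and fallback providers."""
    env = os.environ if env is None else env
    providers = {
        env.get("LLM_PROVIDER", "openai").lower(),
        env.get("LLM_FALLBACK_PROVIDER", "google").lower(),
    }
    missing = []
    for provider in sorted(providers):
        var = REQUIRED_KEYS.get(provider)
        if var and not env.get(var):
            missing.append(var)
    return missing
```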

## 🔍 Configuration Validation

### Automated Configuration Check

```bash
# Run comprehensive configuration validation (the final check makes a real API call)
python - <<'EOF'
import sys

from config.settings import get_llm_config
from services.llm_service import LLMService

print('🔧 Configuration Validation')
print('=' * 40)

# Load configuration
try:
    config = get_llm_config()
    print('✅ Configuration loaded successfully')
    print(f'Provider: {config.get("provider")}')
    print(f'Model: {config.get("model")}')
    print(f'Temperature: {config.get("temperature")}')
except Exception as e:
    print(f'❌ Configuration error: {e}')
    sys.exit(1)

# Test service initialization
try:
    service = LLMService()
    print('✅ LLM Service initialized')
except Exception as e:
    print(f'❌ Service initialization failed: {e}')
    sys.exit(1)

# Test simple completion
try:
    response = service.simple_chat_completion('Test message')
    print('✅ Chat completion successful')
    print(f'Response length: {len(response)} characters')
except Exception as e:
    print(f'❌ Chat completion failed: {e}')
    sys.exit(1)

print('All configuration checks passed.')
EOF
```

### Provider-Specific Health Checks

```bash
# OpenAI health check
curl -s -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models | jq '.data | length'
# Google health check
curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY" | jq '.models | length'
# Ollama health check
curl -s http://localhost:11434/api/tags | jq '.models | length'
# HuggingFace health check
curl -s -H "Authorization: Bearer $HUGGINGFACE_API_KEY" \
  https://huggingface.co/api/whoami-v2 | jq '.name'
```

### Configuration Diff Tool

```bash
# Compare current config with defaults
python -c "
from config.settings import get_llm_config

defaults = {
    'openai': {'temperature': 0.7, 'max_tokens': 1000},
    'google': {'temperature': 0.7, 'max_tokens': 1000},
    'ollama': {'temperature': 0.7, 'max_tokens': 1000},
}
current = get_llm_config()
provider = current.get('provider')
default = defaults.get(provider, {})
print(f'Configuration for {provider}:')
for key, default_val in default.items():
    current_val = current.get(key)
    status = '✅' if current_val == default_val else '⚠️'
    print(f'{status} {key}: {current_val} (default: {default_val})')
"
```

## 📋 Configuration Templates

### Minimal Setup (Single Provider)

```bash
# .env.minimal
LLM_PROVIDER=google
GOOGLE_API_KEY=your_api_key
GOOGLE_MODEL=gemini-2.5-flash
```

### Robust Setup (Primary + Fallback)

```bash
# .env.robust
LLM_PROVIDER=openai
OPENAI_API_KEY=your_primary_key
OPENAI_MODEL=gpt-4o-mini
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_fallback_key
GOOGLE_MODEL=gemini-2.5-flash
```

### Local-First Setup

```bash
# .env.local-first
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_cloud_backup_key
```

### Budget-Conscious Setup

```bash
# .env.budget
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=1.0 # Fixed for nano
OPENAI_MAX_TOKENS=500 # Reduce costs
```

## 🔒 Security Best Practices

### API Key Management

```bash
# Use environment variables
export OPENAI_API_KEY="sk-..."
# Never commit keys to git, but keep the .env.example template tracked
# (single quotes stop interactive bash from expanding the "!")
echo '.env' >> .gitignore
echo '.env.*' >> .gitignore
echo '!.env.example' >> .gitignore
# Use different keys for different environments
cp .env.example .env.development
cp .env.example .env.production
```

### Rate Limiting Configuration

```python
# Add to config/settings.py
# Example budgets only: actual limits depend on your account tier and model
RATE_LIMITS = {
    "openai": {"rpm": 500, "tpm": 40000},
    "google": {"rpm": 60, "tpm": 32000},
    "ollama": {"rpm": None, "tpm": None},  # Local: no provider-imposed limit
}
```
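
`RATE_LIMITS` only declares budgets, so something still has to enforce them. This is a minimal sliding-window sketch for the `rpm` value. It assumes a single process; a multi-worker deployment would need shared state, for example in Redis. The class name is hypothetical, and token-per-minute (`tpm`) tracking is left out for brevity.

```python
import time
from collections import deque
from typing import Callable, Optional


class RequestRateLimiter:
    """Sliding one-minute window enforcing a requests-per-minute budget (hypothetical helper)."""

    def __init__(self, rpm: Optional[int], clock: Callable[[], float] = time.monotonic):
        self.rpm = rpm  # None means no limit, as configured for local Ollama
        self.clock = clock
        self.timestamps: deque = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the current window."""
        if self.rpm is None:
            return True
        now = self.clock()
        # Forget requests that are 60 seconds old or older
        while self.timestamps and now - self.timestamps[0] >= 60.0:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.rpm:
            return False
        self.timestamps.append(now)
        return True
```

One limiter per provider could be built with `{name: RequestRateLimiter(limits["rpm"]) for name, limits in RATE_LIMITS.items()}`. Injecting the clock lets tests move time forward without sleeping.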

### Error Handling Strategy

```python
# Graceful degradation configuration
FALLBACK_CHAIN = [
    "primary_provider",
    "fallback_provider",
    "local_provider",
    "cached_response",
]
```
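
One way to walk `FALLBACK_CHAIN` is to try each stage in order and return the first success, with the cached answer as the last resort. This sketch assumes each stage is a callable that takes the message and raises on failure. The `run_with_fallbacks` helper and the stage callables are hypothetical.

```python
import logging
from typing import Callable, Sequence, Tuple

logger = logging.getLogger(__name__)

Stage = Tuple[str, Callable[[str], str]]


def run_with_fallbacks(message: str, stages: Sequence[Stage]) -> str:
    """Return the first successful stage's response, logging each failure."""
    errors = []
    for name, call in stages:
        try:
            return call(message)
        except Exception as exc:  # Any provider error moves on to the next stage
            logger.warning("Stage %s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All stages failed: " + "; ".join(errors))
```

Collecting every error in the final exception means a total outage reports why each stage failed, not just the last one.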

## 🧪 Testing Configuration Changes

### Unit Tests for Configuration

```bash
# Test temperature overrides
python -m pytest tests/test_llm_temperature.py -v
# Test provider fallbacks
python -m pytest tests/test_llm_fallback.py -v
# Test API key validation
python -m pytest tests/test_api_keys.py -v
```

### Integration Tests

```bash
# Test each provider individually
python -c "
import os
from services.llm_service import LLMService

# Assumes LLMService reads LLM_PROVIDER when constructed, as get_llm_config() does
for provider in ['openai', 'google', 'ollama']:
    os.environ['LLM_PROVIDER'] = provider
    try:
        service = LLMService()
        response = service.simple_chat_completion('Test')
        print(f'✅ {provider}: {len(response)} chars')
    except Exception as e:
        print(f'❌ {provider}: {e}')
"
```

### Performance Benchmarks

```bash
# Measure response times
python -c "
import time
from services.llm_service import LLMService

service = LLMService()
start = time.perf_counter()  # Monotonic clock suited to measuring durations
response = service.simple_chat_completion('Quick recipe suggestion')
elapsed = time.perf_counter() - start
print(f'Response time: {elapsed:.2f}s')
print(f'Response length: {len(response)} characters')
print(f'Words per second: {len(response.split()) / elapsed:.1f}')
"
```

## 🔄 Configuration Migration

### Upgrading from Old Configuration

```bash
# Migrate old environment variables
# Old format → New format
mv .env .env.backup
# Rename variables in one sed pass (avoids sed -i, whose syntax differs on macOS/BSD)
sed -e 's/^LLM_MODEL=/OPENAI_MODEL=/' \
    -e 's/^LLM_TEMPERATURE=/OPENAI_TEMPERATURE=/' \
    -e 's/^LLM_MAX_TOKENS=/OPENAI_MAX_TOKENS=/' \
    .env.backup > .env
echo "LLM_PROVIDER=openai" >> .env
```
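
The same renaming can also be done in Python, for example on Windows where `sed` is not available. This sketch uses a hypothetical `migrate_env_lines` helper. It maps the legacy names shown above to their `OPENAI_*` equivalents and appends `LLM_PROVIDER=openai` only if the file does not already set it.

```python
LEGACY_TO_NEW = {
    "LLM_MODEL": "OPENAI_MODEL",
    "LLM_TEMPERATURE": "OPENAI_TEMPERATURE",
    "LLM_MAX_TOKENS": "OPENAI_MAX_TOKENS",
}


def migrate_env_lines(lines: list) -> list:
    """Rename legacy variables and ensure LLM_PROVIDER is set (hypothetical helper)."""
    migrated = []
    for line in lines:
        key, sep, value = line.partition("=")
        name = key.strip()
        # Only rewrite real KEY=value lines whose key is a legacy name
        if sep and name in LEGACY_TO_NEW:
            line = f"{LEGACY_TO_NEW[name]}={value}"
        migrated.append(line)
    if not any(entry.partition("=")[0].strip() == "LLM_PROVIDER" for entry in migrated):
        migrated.append("LLM_PROVIDER=openai")
    return migrated
```

To apply it to a file, read `.env.backup` with `Path(".env.backup").read_text().splitlines()`. Then write `"\n".join(migrate_env_lines(lines)) + "\n"` to `.env`.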

### Version Compatibility Check

```python
import os


# Check if configuration is compatible
def check_config_version():
    required_vars = ["LLM_PROVIDER"]
    legacy_vars = ["LLM_MODEL", "LLM_TEMPERATURE"]
    has_new = all(os.getenv(var) for var in required_vars)
    has_legacy = any(os.getenv(var) for var in legacy_vars)
    if has_legacy and not has_new:
        raise ValueError("Legacy configuration detected. Please migrate to new format.")
    return has_new
```

---

💡 **Next Steps**: After configuring your providers, see the [Model Selection Guide](./model-selection-guide.md) for choosing the best models for your use case.