Jesse Johnson
New commit for backend deployment: 2025-09-25_13-24-03
c59d808
# Model Configuration Guide
This guide focuses on the technical configuration, settings management, parameter handling, and troubleshooting for LLM providers in the Recipe Chatbot project.
> πŸ“š **Looking for model recommendations?** See [Model Selection Guide](./model-selection-guide.md) for detailed model comparisons and use case recommendations.
## πŸ”§ Configuration System Overview
### Settings Architecture
The project uses a centralized configuration system in `config/settings.py` with environment variable overrides:
```text
# Configuration loading flow
Environment Variables (.env) β†’ settings.py β†’ LLM Service β†’ Provider APIs
```
### Temperature Management
Each provider has different temperature constraints that are automatically handled:
| Provider | Range | Auto-Handling | Special Cases |
|----------|-------|---------------|---------------|
| **OpenAI** | 0.0 - 2.0 | βœ… GPT-5-nano β†’ 1.0 | Nano models fixed |
| **Google** | 0.0 - 1.0 | βœ… Clamp to range | Strict validation |
| **Ollama** | 0.0 - 2.0 | ⚠️ Model dependent | Local processing |
| **HuggingFace** | Fixed ~0.7 | ❌ API ignores setting | Read-only |
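The rules in this table can be collapsed into a single normalization helper. A minimal sketch; the function name and provider strings are illustrative, not the project's actual API:

```python
# Sketch: normalize a requested temperature per provider, mirroring the
# table above. Names here are illustrative assumptions.
def normalize_temperature(provider: str, model: str, requested: float) -> float:
    if provider == "openai":
        if "gpt-5-nano" in model.lower():
            return 1.0  # nano models accept only 1.0
        return max(0.0, min(2.0, requested))
    if provider == "google":
        return max(0.0, min(1.0, requested))  # strict 0.0 - 1.0 range
    if provider == "ollama":
        return max(0.0, min(2.0, requested))  # widest range, model dependent
    if provider == "huggingface":
        return 0.7  # Inference API typically ignores the setting
    raise ValueError(f"Unknown provider: {provider}")
```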
## πŸ› οΈ Provider Configuration Details
### OpenAI Configuration
#### Environment Variables
```bash
# Core settings
OPENAI_API_KEY=sk-proj-xxxxx
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000
# Advanced parameters (optional)
OPENAI_TOP_P=1.0
OPENAI_FREQUENCY_PENALTY=0.0
OPENAI_PRESENCE_PENALTY=0.0
```
#### Automatic Temperature Override
```python
# Implemented in services/llm_service.py
if "gpt-5-nano" in model_name.lower():
    temperature = 1.0  # Only supported value
    logger.info(f"Auto-adjusting temperature to 1.0 for {model_name}")
```
#### Parameter Validation
- **Temperature**: `0.0 - 2.0` (except nano models: fixed `1.0`)
- **Max Tokens**: `1 - 4096` (model-dependent)
- **Top P**: `0.0 - 1.0`
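These checks can be expressed as one validation function. A hedged sketch, assuming out-of-range values should raise `ValueError` (the function name is illustrative, not the project's actual API):

```python
# Sketch: validate OpenAI request parameters against the ranges listed above.
def validate_openai_params(model: str, temperature: float,
                           max_tokens: int, top_p: float = 1.0) -> None:
    if "nano" in model.lower():
        if temperature != 1.0:
            raise ValueError(f"{model} only supports temperature=1.0")
    elif not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in 0.0 - 2.0")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in 1 - 4096 (model-dependent)")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in 0.0 - 1.0")
```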
### Google (Gemini) Configuration
#### Environment Variables
```bash
# Core settings
GOOGLE_API_KEY=AIzaSyxxxxx
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000
# Advanced parameters (optional)
GOOGLE_TOP_P=0.95
GOOGLE_TOP_K=40
```
#### Temperature Clamping
```python
# Auto-clamping to Google's range
google_temp = max(0.0, min(1.0, configured_temperature))
if google_temp != configured_temperature:
    logger.info(f"Clamping temperature from {configured_temperature} to {google_temp}")
```
#### Parameter Constraints
- **Temperature**: `0.0 - 1.0` (strictly enforced)
- **Max Tokens**: `1 - 8192`
- **Top K**: `1 - 40`
### Ollama Configuration
#### Environment Variables
```bash
# Core settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
OLLAMA_MAX_TOKENS=1000
# Connection settings
OLLAMA_TIMEOUT=30
OLLAMA_KEEP_ALIVE=5m
```
#### Service Management
```bash
# Start Ollama service
ollama serve &
# Verify service status
curl http://localhost:11434/api/version
# Model management
ollama pull llama3.1:8b
ollama list
ollama rm unused_model
```
#### Parameter Flexibility
- **Temperature**: `0.0 - 2.0` (widest range)
- **Context Length**: Model-dependent (2K - 128K)
- **Custom Parameters**: Model-specific options available
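Before the first request, it can help to verify that the Ollama service is reachable and the configured model is pulled. A stdlib-only sketch against the documented `/api/version` and `/api/tags` endpoints (the function name and defaults are assumptions):

```python
# Sketch: check Ollama availability and model presence before first use.
import json
import urllib.error
import urllib.request

def ollama_ready(base_url: str = "http://localhost:11434",
                 model: str = "llama3.1:8b") -> bool:
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=5):
            pass  # service responded
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            tags = json.load(resp)
        # /api/tags lists locally available models
        return any(m.get("name") == model for m in tags.get("models", []))
    except (urllib.error.URLError, OSError):
        return False  # service not running or unreachable
```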
### HuggingFace Configuration
#### Environment Variables
```bash
# Core settings
HUGGINGFACE_API_KEY=hf_xxxxx
HUGGINGFACE_MODEL=microsoft/DialoGPT-medium
HUGGINGFACE_TEMPERATURE=0.7 # Often ignored
HUGGINGFACE_MAX_TOKENS=500
# API settings
HUGGINGFACE_WAIT_FOR_MODEL=true
HUGGINGFACE_USE_CACHE=true
```
#### API Limitations
```python
# Note: the Inference API often ignores the temperature setting, so the
# service logs a warning and reports the API default instead.
# (The function name below is illustrative, not the project's actual API.)
def effective_hf_temperature(model_name: str) -> float:
    logger.warning(f"HuggingFace model {model_name} may ignore temperature setting")
    return 0.7  # API typically uses this default
```
## βš™οΈ Advanced Configuration
### Dynamic Provider Switching
```python
# config/settings.py implementation
import os

def get_llm_config():
    provider = os.getenv("LLM_PROVIDER", "openai").lower()
    fallback = os.getenv("LLM_FALLBACK_PROVIDER", "google").lower()
    return {
        "provider": provider,
        "fallback_provider": fallback,
        **get_provider_config(provider),
    }

def get_provider_config(provider):
    """Get provider-specific configuration."""
    configs = {
        "openai": {
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
            "temperature": float(os.getenv("OPENAI_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("OPENAI_MAX_TOKENS", "1000")),
        },
        "google": {
            "api_key": os.getenv("GOOGLE_API_KEY"),
            "model": os.getenv("GOOGLE_MODEL", "gemini-2.5-flash"),
            "temperature": float(os.getenv("GOOGLE_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("GOOGLE_MAX_TOKENS", "1000")),
        },
        # ... other providers
    }
    return configs.get(provider, {})
```
### Fallback Configuration
```python
# Automatic fallback on provider failure
def get_llm_response(message):
    try:
        return primary_provider.chat_completion(message)
    except Exception as e:
        logger.warning(f"Primary provider failed: {e}")
        return fallback_provider.chat_completion(message)
```
### Environment-Specific Configs
#### Development (.env.development)
```bash
# Fast, free/cheap for testing
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.8 # More creative for testing
LLM_FALLBACK_PROVIDER=ollama
```
#### Production (.env.production)
```bash
# Reliable, consistent for production
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7 # Consistent responses
LLM_FALLBACK_PROVIDER=google
```
#### Local Development (.env.local)
```bash
# Self-hosted for offline development
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
# No fallback - fully local
```
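One way to wire these files in is to load the matching `.env` file before `settings.py` reads `os.environ`. A minimal stdlib sketch; the `APP_ENV` variable and the simplified `KEY=VALUE` parsing are assumptions (a library such as python-dotenv would handle quoting and edge cases more robustly):

```python
# Sketch: load the environment-specific .env file into os.environ.
# APP_ENV selecting the file is an assumed convention, not the project's API.
import os

def load_env_file(path: str) -> None:
    """Minimal .env parser: KEY=VALUE lines, '#' comment lines, no quoting."""
    if not os.path.exists(path):
        return  # missing file is not an error
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Real env vars win over file values
            os.environ.setdefault(key.strip(), value.strip())

env = os.getenv("APP_ENV", "development")
load_env_file(f".env.{env}")
```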
## 🚨 Configuration Troubleshooting
### Issue: GPT-5-nano Temperature Error
**Error**: `Temperature must be 1.0 for gpt-5-nano`
**Status**: βœ… Auto-fixed in `services/llm_service.py`
**Verification**:
```bash
python -c "
import os
os.environ['OPENAI_MODEL'] = 'gpt-5-nano'
os.environ['OPENAI_TEMPERATURE'] = '0.5'
from services.llm_service import LLMService
LLMService() # Should log temperature override
"
```
### Issue: Google Temperature Out of Range
**Error**: `Temperature must be between 0.0 and 1.0`
**Solution**: Automatic clamping implemented
**Test**:
```bash
python -c "
import os
os.environ['LLM_PROVIDER'] = 'google'
os.environ['GOOGLE_TEMPERATURE'] = '1.5'
from services.llm_service import LLMService
LLMService() # Should clamp to 1.0
"
```
### Issue: Ollama Connection Failed
**Error**: `ConnectionError: Could not connect to Ollama`
**Diagnosis**:
```bash
# Check if Ollama is running
curl -f http://localhost:11434/api/version || echo "Ollama not running"
# Check if model exists
ollama list | grep "llama3.1:8b" || echo "Model not found"
# Check system resources
free -h # RAM usage
df -h # Disk space
```
**Fix**:
```bash
# Start Ollama service
ollama serve &
# Pull required model
ollama pull llama3.1:8b
# Test connection
curl -d '{"model":"llama3.1:8b","prompt":"test","stream":false}' \
http://localhost:11434/api/generate
```
### Issue: HuggingFace Temperature Ignored
**Issue**: Temperature settings have no visible effect on responses
**Explanation**: This is expected behavior: the HuggingFace Inference API typically ignores the temperature parameter
**Workaround**: Use different models or providers for temperature control
### Issue: Missing API Keys
**Error**: `AuthenticationError: Invalid API key`
**Diagnosis**:
```bash
# Check environment variables
echo "OpenAI: ${OPENAI_API_KEY:0:10}..."
echo "Google: ${GOOGLE_API_KEY:0:10}..."
echo "HuggingFace: ${HUGGINGFACE_API_KEY:0:10}..."
# Test API key validity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/models | jq '.data[0].id' || echo "Invalid OpenAI key"
```
## πŸ” Configuration Validation
### Automated Configuration Check
```bash
# Run comprehensive configuration validation
python -c "
from config.settings import get_llm_config
from services.llm_service import LLMService

print('πŸ”§ Configuration Validation')
print('=' * 40)

# Load configuration
try:
    config = get_llm_config()
    print('βœ… Configuration loaded successfully')
    print(f'Provider: {config.get(\"provider\")}')
    print(f'Model: {config.get(\"model\")}')
    print(f'Temperature: {config.get(\"temperature\")}')
except Exception as e:
    print(f'❌ Configuration error: {e}')
    exit(1)

# Test service initialization
try:
    service = LLMService()
    print('βœ… LLM Service initialized')
except Exception as e:
    print(f'❌ Service initialization failed: {e}')
    exit(1)

# Test simple completion
try:
    response = service.simple_chat_completion('Test message')
    print('βœ… Chat completion successful')
    print(f'Response length: {len(response)} characters')
except Exception as e:
    print(f'❌ Chat completion failed: {e}')
    exit(1)

print('πŸŽ‰ All configuration checks passed!')
"
```
### Provider-Specific Health Checks
```bash
# OpenAI health check
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/models | jq '.data | length'
# Google health check
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY" | jq '.models | length'
# Ollama health check
curl http://localhost:11434/api/tags | jq '.models | length'
# HuggingFace health check
curl -H "Authorization: Bearer $HUGGINGFACE_API_KEY" \
https://huggingface.co/api/whoami | jq '.name'
```
### Configuration Diff Tool
```bash
# Compare current config with defaults
python -c "
from config.settings import get_llm_config

defaults = {
    'openai': {'temperature': 0.7, 'max_tokens': 1000},
    'google': {'temperature': 0.7, 'max_tokens': 1000},
    'ollama': {'temperature': 0.7, 'max_tokens': 1000},
}
current = get_llm_config()
provider = current.get('provider')
default = defaults.get(provider, {})

print(f'Configuration for {provider}:')
for key, default_val in default.items():
    current_val = current.get(key)
    status = 'βœ…' if current_val == default_val else '⚠️'
    print(f'{status} {key}: {current_val} (default: {default_val})')
"
```
## πŸ“‹ Configuration Templates
### Minimal Setup (Single Provider)
```bash
# .env.minimal
LLM_PROVIDER=google
GOOGLE_API_KEY=your_api_key
GOOGLE_MODEL=gemini-2.5-flash
```
### Robust Setup (Primary + Fallback)
```bash
# .env.robust
LLM_PROVIDER=openai
OPENAI_API_KEY=your_primary_key
OPENAI_MODEL=gpt-4o-mini
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_fallback_key
GOOGLE_MODEL=gemini-2.5-flash
```
### Local-First Setup
```bash
# .env.local-first
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_cloud_backup_key
```
### Budget-Conscious Setup
```bash
# .env.budget
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=1.0 # Fixed for nano
OPENAI_MAX_TOKENS=500 # Reduce costs
```
## πŸ” Security Best Practices
### API Key Management
```bash
# Use environment variables
export OPENAI_API_KEY="sk-..."
# Never commit keys to git (keep the tracked .env.example template)
echo ".env*" >> .gitignore
echo "!.env.example" >> .gitignore
# Use different keys for different environments
cp .env.example .env.development
cp .env.example .env.production
```
### Rate Limiting Configuration
```python
# Add to config/settings.py
RATE_LIMITS = {
"openai": {"rpm": 500, "tpm": 40000},
"google": {"rpm": 60, "tpm": 32000},
"ollama": {"rpm": None, "tpm": None}, # Local = unlimited
}
```
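A requests-per-minute gate built on this table might look like the following sketch (rpm only; token-per-minute accounting is omitted, and the class name is illustrative):

```python
# Sketch: per-provider requests-per-minute limiter using the RATE_LIMITS
# table above. A sliding window of call timestamps gates each request.
import time
from collections import deque

RATE_LIMITS = {
    "openai": {"rpm": 500, "tpm": 40000},
    "google": {"rpm": 60, "tpm": 32000},
    "ollama": {"rpm": None, "tpm": None},  # Local = unlimited
}

class RpmLimiter:
    def __init__(self, provider: str):
        self.rpm = RATE_LIMITS.get(provider, {}).get("rpm")
        self.calls = deque()  # timestamps of requests in the last minute

    def acquire(self) -> None:
        if self.rpm is None:
            return  # unlimited (local provider)
        now = time.monotonic()
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()  # drop entries older than one minute
        if len(self.calls) >= self.rpm:
            # Wait until the oldest call ages out of the window
            time.sleep(60 - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```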
### Error Handling Strategy
```python
# Graceful degradation configuration
FALLBACK_CHAIN = [
"primary_provider",
"fallback_provider",
"local_provider",
"cached_response"
]
```
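Walking this chain until one tier succeeds can be sketched as follows; the provider callables and the dictionary wiring are stand-ins for whatever the project actually registers:

```python
# Sketch: try each tier of FALLBACK_CHAIN in order until one succeeds.
import logging

logger = logging.getLogger(__name__)

FALLBACK_CHAIN = [
    "primary_provider",
    "fallback_provider",
    "local_provider",
    "cached_response",
]

def chat_with_fallback(providers: dict, message: str) -> str:
    last_error = None
    for name in FALLBACK_CHAIN:
        provider = providers.get(name)
        if provider is None:
            continue  # this tier is not configured
        try:
            return provider(message)
        except Exception as e:
            last_error = e
            logger.warning("%s failed: %s", name, e)
    raise RuntimeError(f"All providers failed: {last_error}")
```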
## πŸ§ͺ Testing Configuration Changes
### Unit Tests for Configuration
```bash
# Test temperature overrides
python -m pytest tests/test_llm_temperature.py -v
# Test provider fallbacks
python -m pytest tests/test_llm_fallback.py -v
# Test API key validation
python -m pytest tests/test_api_keys.py -v
```
### Integration Tests
```bash
# Test each provider individually
python -c "
import os
from services.llm_service import LLMService

providers = ['openai', 'google', 'ollama']
for provider in providers:
    os.environ['LLM_PROVIDER'] = provider
    try:
        service = LLMService()
        response = service.simple_chat_completion('Test')
        print(f'βœ… {provider}: {len(response)} chars')
    except Exception as e:
        print(f'❌ {provider}: {e}')
"
```
### Performance Benchmarks
```bash
# Measure response times
python -c "
import time
from services.llm_service import LLMService
service = LLMService()
start = time.time()
response = service.simple_chat_completion('Quick recipe suggestion')
elapsed = time.time() - start
print(f'Response time: {elapsed:.2f}s')
print(f'Response length: {len(response)} characters')
print(f'Words per second: {len(response.split()) / elapsed:.1f}')
"
```
## πŸ”„ Configuration Migration
### Upgrading from Old Configuration
```bash
# Migrate old environment variables
# Old format β†’ New format
mv .env .env.backup
# Update variable names
sed 's/LLM_MODEL=/OPENAI_MODEL=/' .env.backup > .env
sed -i 's/LLM_TEMPERATURE=/OPENAI_TEMPERATURE=/' .env
sed -i 's/LLM_MAX_TOKENS=/OPENAI_MAX_TOKENS=/' .env
echo "LLM_PROVIDER=openai" >> .env
```
### Version Compatibility Check
```python
# Check if the configuration uses the new format
import os

def check_config_version():
    required_vars = ["LLM_PROVIDER"]
    legacy_vars = ["LLM_MODEL", "LLM_TEMPERATURE"]
    has_new = all(os.getenv(var) for var in required_vars)
    has_legacy = any(os.getenv(var) for var in legacy_vars)
    if has_legacy and not has_new:
        raise ValueError("Legacy configuration detected. Please migrate to new format.")
    return has_new
```
---
πŸ’‘ **Next Steps**: After configuring your providers, see the [Model Selection Guide](./model-selection-guide.md) for choosing the best models for your use case.