plg4-dev-server/backend/docs/model-configuration-guide.md
Jesse Johnson
New commit for backend deployment: 2025-09-25_13-24-03
c59d808

Model Configuration Guide

This guide focuses on the technical configuration, settings management, parameter handling, and troubleshooting for LLM providers in the Recipe Chatbot project.

📚 Looking for model recommendations? See the Model Selection Guide for detailed model comparisons and use case recommendations.

🔧 Configuration System Overview

Settings Architecture

The project uses a centralized configuration system in config/settings.py with environment variable overrides:

# Configuration loading flow
Environment Variables (.env) → settings.py → LLM Service → Provider APIs
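The flow above boils down to "code default, unless an environment variable overrides it." A minimal sketch of that chain (illustrative only; the real settings.py has more keys, and python-dotenv, if used, loads .env into the process environment before this runs):

```python
import os

# Code defaults sit at the bottom of the override chain.
DEFAULTS = {"provider": "openai", "temperature": 0.7, "max_tokens": 1000}

def load_setting(name, cast=str):
    """Return the LLM_* environment override if present, else the code default."""
    raw = os.getenv(f"LLM_{name.upper()}")
    return cast(raw) if raw is not None else DEFAULTS[name]

settings = {
    "provider": load_setting("provider"),
    "temperature": load_setting("temperature", float),
    "max_tokens": load_setting("max_tokens", int),
}
```

The same pattern extends to any provider-prefixed variable (OPENAI_*, GOOGLE_*, and so on) by changing the prefix.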

Temperature Management

Each provider has different temperature constraints that are automatically handled:

| Provider    | Range      | Auto-Handling          | Special Cases     |
|-------------|------------|------------------------|-------------------|
| OpenAI      | 0.0 - 2.0  | ✅ GPT-5-nano → 1.0    | Nano models fixed |
| Google      | 0.0 - 1.0  | ✅ Clamp to range      | Strict validation |
| Ollama      | 0.0 - 2.0  | ⚠️ Model dependent     | Local processing  |
| HuggingFace | Fixed ~0.7 | ❌ API ignores setting | Read-only         |
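The rules in the table compose into one normalization step. This is an illustrative sketch, not the project's actual services/llm_service.py code:

```python
def normalize_temperature(provider, model, requested):
    """Apply the per-provider temperature rules from the table above."""
    provider = provider.lower()
    if provider == "openai":
        if "gpt-5-nano" in model.lower():
            return 1.0                            # nano models only accept 1.0
        return max(0.0, min(2.0, requested))
    if provider == "google":
        return max(0.0, min(1.0, requested))      # strict 0.0 - 1.0 clamp
    if provider == "ollama":
        return max(0.0, min(2.0, requested))      # widest range, model dependent
    if provider == "huggingface":
        return 0.7                                # Inference API ignores the setting
    raise ValueError(f"Unknown provider: {provider}")
```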

πŸ› οΈ Provider Configuration Details

OpenAI Configuration

Environment Variables

# Core settings
OPENAI_API_KEY=sk-proj-xxxxx
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Advanced parameters (optional)
OPENAI_TOP_P=1.0
OPENAI_FREQUENCY_PENALTY=0.0
OPENAI_PRESENCE_PENALTY=0.0

Automatic Temperature Override

# Implemented in services/llm_service.py
if "gpt-5-nano" in model_name.lower():
    temperature = 1.0  # Only supported value
    logger.info(f"Auto-adjusting temperature to 1.0 for {model_name}")

Parameter Validation

  • Temperature: 0.0 - 2.0 (except nano models: fixed 1.0)
  • Max Tokens: 1 - 4096 (model-dependent)
  • Top P: 0.0 - 1.0
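A validation pass over these bounds can be sketched as follows (illustrative only, not the project's actual code; real max-token ceilings vary by model, so 4096 here is just the documented upper bound):

```python
def validate_openai_params(model, temperature, max_tokens, top_p=1.0):
    """Raise ValueError if a parameter is outside the documented bounds."""
    if "gpt-5-nano" in model.lower():
        if temperature != 1.0:
            raise ValueError("gpt-5-nano requires temperature == 1.0")
    elif not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature {temperature} outside 0.0 - 2.0")
    if not 1 <= max_tokens <= 4096:
        raise ValueError(f"max_tokens {max_tokens} outside 1 - 4096")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError(f"top_p {top_p} outside 0.0 - 1.0")
    return True
```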

Google (Gemini) Configuration

Environment Variables

# Core settings
GOOGLE_API_KEY=AIzaSyxxxxx
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000

# Advanced parameters (optional)
GOOGLE_TOP_P=0.95
GOOGLE_TOP_K=40

Temperature Clamping

# Auto-clamping to Google's range
google_temp = max(0.0, min(1.0, configured_temperature))
if google_temp != configured_temperature:
    logger.info(f"Clamping temperature from {configured_temperature} to {google_temp}")

Parameter Constraints

  • Temperature: 0.0 - 1.0 (strictly enforced)
  • Max Tokens: 1 - 8192
  • Top K: 1 - 40

Ollama Configuration

Environment Variables

# Core settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
OLLAMA_MAX_TOKENS=1000

# Connection settings
OLLAMA_TIMEOUT=30
OLLAMA_KEEP_ALIVE=5m

Service Management

# Start Ollama service
ollama serve &

# Verify service status
curl http://localhost:11434/api/version

# Model management
ollama pull llama3.1:8b
ollama list
ollama rm unused_model

Parameter Flexibility

  • Temperature: 0.0 - 2.0 (widest range)
  • Context Length: Model-dependent (2K - 128K)
  • Custom Parameters: Model-specific options available
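When the service calls Ollama's /api/generate endpoint, these settings travel in the request body's options object. A sketch of the payload construction (temperature, num_predict, and num_ctx are standard Ollama options; the clamping mirrors the range above):

```python
def build_ollama_payload(model, prompt, temperature=0.7, max_tokens=1000, num_ctx=None):
    """Build a JSON-serializable body for Ollama's /api/generate endpoint."""
    options = {
        "temperature": max(0.0, min(2.0, temperature)),
        "num_predict": max_tokens,        # Ollama's name for the output token limit
    }
    if num_ctx is not None:
        options["num_ctx"] = num_ctx      # context window, model dependent
    return {"model": model, "prompt": prompt, "stream": False, "options": options}
```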

HuggingFace Configuration

Environment Variables

# Core settings
HUGGINGFACE_API_KEY=hf_xxxxx
HUGGINGFACE_MODEL=microsoft/DialoGPT-medium
HUGGINGFACE_TEMPERATURE=0.7  # Often ignored
HUGGINGFACE_MAX_TOKENS=500

# API settings
HUGGINGFACE_WAIT_FOR_MODEL=true
HUGGINGFACE_USE_CACHE=true

API Limitations

# Note: Temperature is often ignored by Inference API
logger.warning(f"HuggingFace model {model_name} may ignore temperature setting")
return 0.7  # API typically uses this default

βš™οΈ Advanced Configuration

Dynamic Provider Switching

# config/settings.py implementation
import os

def get_llm_config():
    provider = os.getenv("LLM_PROVIDER", "openai").lower()
    fallback = os.getenv("LLM_FALLBACK_PROVIDER", "google").lower()
    
    return {
        "provider": provider,
        "fallback_provider": fallback,
        **get_provider_config(provider)
    }

def get_provider_config(provider):
    """Get provider-specific configuration."""
    configs = {
        "openai": {
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
            "temperature": float(os.getenv("OPENAI_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("OPENAI_MAX_TOKENS", "1000")),
        },
        "google": {
            "api_key": os.getenv("GOOGLE_API_KEY"),
            "model": os.getenv("GOOGLE_MODEL", "gemini-2.5-flash"),
            "temperature": float(os.getenv("GOOGLE_TEMPERATURE", "0.7")),
            "max_tokens": int(os.getenv("GOOGLE_MAX_TOKENS", "1000")),
        },
        # ... other providers
    }
    return configs.get(provider, {})

Fallback Configuration

# Automatic fallback on provider failure
def get_llm_response(message):
    try:
        return primary_provider.chat_completion(message)
    except Exception as e:
        logger.warning(f"Primary provider failed: {e}")
        return fallback_provider.chat_completion(message)

Environment-Specific Configs

Development (.env.development)

# Fast, free/cheap for testing
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.8  # More creative for testing
LLM_FALLBACK_PROVIDER=ollama

Production (.env.production)

# Reliable, consistent for production
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
OPENAI_TEMPERATURE=0.7  # Consistent responses
LLM_FALLBACK_PROVIDER=google

Local Development (.env.local)

# Self-hosted for offline development
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TEMPERATURE=0.7
# No fallback - fully local
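One way to wire these files together is to select the env file from a single variable at startup and load it before anything reads configuration. A sketch assuming the python-dotenv package; APP_ENV and the mapping below are hypothetical names for illustration, not something the project defines:

```python
import os

# Hypothetical mapping from APP_ENV to the env files shown above.
ENV_FILES = {
    "development": ".env.development",
    "production": ".env.production",
    "local": ".env.local",
}

def pick_env_file(app_env=None):
    """Return the env file for the given environment name, defaulting to .env."""
    app_env = app_env or os.getenv("APP_ENV", "")
    return ENV_FILES.get(app_env.lower(), ".env")

# Typical startup (requires python-dotenv):
#   from dotenv import load_dotenv
#   load_dotenv(pick_env_file())
```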

🚨 Configuration Troubleshooting

Issue: GPT-5-nano Temperature Error

Error: Temperature must be 1.0 for gpt-5-nano
Status: ✅ Auto-fixed in services/llm_service.py
Verification:

python -c "
import os
os.environ['OPENAI_MODEL'] = 'gpt-5-nano'
os.environ['OPENAI_TEMPERATURE'] = '0.5'
from services.llm_service import LLMService
LLMService()  # Should log temperature override
"

Issue: Google Temperature Out of Range

Error: Temperature must be between 0.0 and 1.0
Solution: Automatic clamping implemented
Test:

python -c "
import os
os.environ['LLM_PROVIDER'] = 'google'
os.environ['GOOGLE_TEMPERATURE'] = '1.5'
from services.llm_service import LLMService
LLMService()  # Should clamp to 1.0
"

Issue: Ollama Connection Failed

Error: ConnectionError: Could not connect to Ollama
Diagnosis:

# Check if Ollama is running
curl -f http://localhost:11434/api/version || echo "Ollama not running"

# Check if model exists
ollama list | grep "llama3.1:8b" || echo "Model not found"

# Check system resources
free -h  # RAM usage
df -h    # Disk space

Fix:

# Start Ollama service
ollama serve &

# Pull required model
ollama pull llama3.1:8b

# Test connection
curl -d '{"model":"llama3.1:8b","prompt":"test","stream":false}' \
     http://localhost:11434/api/generate

Issue: HuggingFace Temperature Ignored

Issue: Settings have no effect on responses
Explanation: This is expected behavior; the HuggingFace Inference API typically ignores the temperature setting.
Workaround: Use a different model or provider when you need temperature control.

Issue: Missing API Keys

Error: AuthenticationError: Invalid API key
Diagnosis:

# Check environment variables
echo "OpenAI: ${OPENAI_API_KEY:0:10}..." 
echo "Google: ${GOOGLE_API_KEY:0:10}..."
echo "HuggingFace: ${HUGGINGFACE_API_KEY:0:10}..."

# Test API key validity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
     https://api.openai.com/v1/models | jq '.data[0].id' || echo "Invalid OpenAI key"

πŸ” Configuration Validation

Automated Configuration Check

# Run comprehensive configuration validation
python -c "
from config.settings import get_llm_config
from services.llm_service import LLMService
import json

print('🔧 Configuration Validation')
print('=' * 40)

# Load configuration
try:
    config = get_llm_config()
    print('✅ Configuration loaded successfully')
    print(f'Provider: {config.get(\"provider\")}')
    print(f'Model: {config.get(\"model\")}')
    print(f'Temperature: {config.get(\"temperature\")}')
except Exception as e:
    print(f'❌ Configuration error: {e}')
    exit(1)

# Test service initialization
try:
    service = LLMService()
    print('✅ LLM Service initialized')
except Exception as e:
    print(f'❌ Service initialization failed: {e}')
    exit(1)

# Test simple completion
try:
    response = service.simple_chat_completion('Test message')
    print('✅ Chat completion successful')
    print(f'Response length: {len(response)} characters')
except Exception as e:
    print(f'❌ Chat completion failed: {e}')
    exit(1)

print('🎉 All configuration checks passed!')
"

Provider-Specific Health Checks

# OpenAI health check
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
     https://api.openai.com/v1/models | jq '.data | length'

# Google health check  
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY" | jq '.models | length'

# Ollama health check
curl http://localhost:11434/api/tags | jq '.models | length'

# HuggingFace health check
curl -H "Authorization: Bearer $HUGGINGFACE_API_KEY" \
     https://huggingface.co/api/whoami | jq '.name'

Configuration Diff Tool

# Compare current config with defaults
python -c "
import os
from config.settings import get_llm_config

defaults = {
    'openai': {'temperature': 0.7, 'max_tokens': 1000},
    'google': {'temperature': 0.7, 'max_tokens': 1000},
    'ollama': {'temperature': 0.7, 'max_tokens': 1000},
}

current = get_llm_config()
provider = current.get('provider')
default = defaults.get(provider, {})

print(f'Configuration for {provider}:')
for key, default_val in default.items():
    current_val = current.get(key)
    status = '✅' if current_val == default_val else '⚠️'
    print(f'{status} {key}: {current_val} (default: {default_val})')
"

📋 Configuration Templates

Minimal Setup (Single Provider)

# .env.minimal
LLM_PROVIDER=google
GOOGLE_API_KEY=your_api_key
GOOGLE_MODEL=gemini-2.5-flash

Robust Setup (Primary + Fallback)

# .env.robust  
LLM_PROVIDER=openai
OPENAI_API_KEY=your_primary_key
OPENAI_MODEL=gpt-4o-mini
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_fallback_key
GOOGLE_MODEL=gemini-2.5-flash

Local-First Setup

# .env.local-first
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
LLM_FALLBACK_PROVIDER=google
GOOGLE_API_KEY=your_cloud_backup_key

Budget-Conscious Setup

# .env.budget
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=1.0  # Fixed for nano
OPENAI_MAX_TOKENS=500   # Reduce costs

πŸ” Security Best Practices

API Key Management

# Use environment variables
export OPENAI_API_KEY="sk-..."

# Never commit keys to git (keep .env.example committed as a template)
echo ".env*" >> .gitignore
echo "!.env.example" >> .gitignore

# Use different keys for different environments
cp .env.example .env.development
cp .env.example .env.production

Rate Limiting Configuration

# Add to config/settings.py
RATE_LIMITS = {
    "openai": {"rpm": 500, "tpm": 40000},
    "google": {"rpm": 60, "tpm": 32000},
    "ollama": {"rpm": None, "tpm": None},  # Local = unlimited
}
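A minimal sliding-window limiter that enforces the rpm entries above might look like this (a sketch, not the project's code; the clock is injectable so the behavior can be tested without waiting a real minute):

```python
import time
from collections import deque

class RequestRateLimiter:
    """Allow at most `rpm` requests in any 60-second window (None = unlimited)."""

    def __init__(self, rpm, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock
        self.calls = deque()          # timestamps of recent allowed calls

    def allow(self):
        if self.rpm is None:          # local providers such as Ollama: no limit
            return True
        now = self.clock()
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()      # drop calls that left the 60s window
        if len(self.calls) < self.rpm:
            self.calls.append(now)
            return True
        return False
```

A token-per-minute (tpm) budget could be tracked the same way by accumulating token counts instead of call counts.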

Error Handling Strategy

# Graceful degradation configuration
FALLBACK_CHAIN = [
    "primary_provider",
    "fallback_provider", 
    "local_provider",
    "cached_response"
]
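Walking that chain amounts to trying each stage in order and returning the first success. A sketch with callables standing in for the providers (the stage names here are illustrative):

```python
def run_fallback_chain(message, stages):
    """Try each (name, callable) stage in order; return the first success."""
    errors = []
    for name, stage in stages:
        try:
            return name, stage(message)
        except Exception as exc:      # degrade gracefully to the next stage
            errors.append((name, exc))
    raise RuntimeError(f"All stages failed: {errors}")
```

In the real service each stage would wrap a provider's chat_completion call, with cached_response as a final stage that never raises.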

🧪 Testing Configuration Changes

Unit Tests for Configuration

# Test temperature overrides
python -m pytest tests/test_llm_temperature.py -v

# Test provider fallbacks
python -m pytest tests/test_llm_fallback.py -v

# Test API key validation
python -m pytest tests/test_api_keys.py -v

Integration Tests

# Test each provider individually
python -c "
import os
providers = ['openai', 'google', 'ollama']

for provider in providers:
    os.environ['LLM_PROVIDER'] = provider
    try:
        from services.llm_service import LLMService
        service = LLMService()
        response = service.simple_chat_completion('Test')
        print(f'βœ… {provider}: {len(response)} chars')
    except Exception as e:
        print(f'❌ {provider}: {e}')
"

Performance Benchmarks

# Measure response times
python -c "
import time
from services.llm_service import LLMService

service = LLMService()
start = time.time()
response = service.simple_chat_completion('Quick recipe suggestion')
elapsed = time.time() - start

print(f'Response time: {elapsed:.2f}s')
print(f'Response length: {len(response)} characters')
print(f'Words per second: {len(response.split()) / elapsed:.1f}')
"

🔄 Configuration Migration

Upgrading from Old Configuration

# Migrate old environment variables
# Old format β†’ New format
mv .env .env.backup

# Update variable names
sed 's/LLM_MODEL=/OPENAI_MODEL=/' .env.backup > .env
sed -i 's/LLM_TEMPERATURE=/OPENAI_TEMPERATURE=/' .env
sed -i 's/LLM_MAX_TOKENS=/OPENAI_MAX_TOKENS=/' .env

echo "LLM_PROVIDER=openai" >> .env

Version Compatibility Check

# Check if configuration is compatible
import os

def check_config_version():
    required_vars = ["LLM_PROVIDER"]
    legacy_vars = ["LLM_MODEL", "LLM_TEMPERATURE"]
    
    has_new = all(os.getenv(var) for var in required_vars)
    has_legacy = any(os.getenv(var) for var in legacy_vars)
    
    if has_legacy and not has_new:
        raise ValueError("Legacy configuration detected. Please migrate to new format.")
    
    return has_new

💡 Next Steps: After configuring your providers, see the Model Selection Guide for choosing the best models for your use case.