Jesse Johnson
New commit for backend deployment: 2025-09-25_13-24-03
c59d808
# Model Selection Guide
## 🎯 At-a-Glance Recommendations
| Priority | Best Choice | Provider | Monthly Cost* | Setup Time | Quality Score | Why Choose This |
|----------|-------------|----------|---------------|------------|---------------|-----------------|
| **Ease of Use** | Gemini 2.5 Flash | Google | Free - $2 | 2 min | 90% | Excellent free tier |
| **Best Value** | GPT-5-nano | OpenAI | $1.00 | 2 min | 88% | Modern GPT-5 at nano price |
| **Premium Quality** | Claude 3 Opus | Anthropic | $225 | 2 min | 95% | Highest reasoning quality |
| **Self-Hosted** | Llama 3.1:8b | Ollama | Free | 10 min | 82% | Perfect balance |
| **High-End Local** | DeepSeek-R1:7b | Ollama | Free | 15 min | 88% | Best reasoning model |
| **Budget Cloud** | Claude 3.5 Haiku | Anthropic | $12 | 2 min | 87% | Fast and affordable |
| **Alternative Local** | CodeQwen1.5:7b | Ollama | Free | 10 min | 85% | Excellent for structured data |
*Based on 30,000 queries/month
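
The monthly figures can be approximated from per-token prices. The sketch below shows one way to do that arithmetic. The per-query token counts (200 input, 60 output) are assumptions chosen for illustration and are not stated in this guide, and `estimate_monthly_cost` is a hypothetical helper, not part of the backend.

```python
def estimate_monthly_cost(
    input_price_per_m: float,
    output_price_per_m: float,
    queries_per_month: int = 30_000,
    input_tokens_per_query: int = 200,   # assumed average, not from this guide
    output_tokens_per_query: int = 60,   # assumed average, not from this guide
) -> float:
    """Return the estimated monthly cost in USD given per-1M-token prices."""
    cost_per_query = (
        input_tokens_per_query * input_price_per_m
        + output_tokens_per_query * output_price_per_m
    ) / 1_000_000
    return queries_per_month * cost_per_query


# Claude 3.5 Sonnet prices listed in this guide: $3 input, $15 output per 1M tokens
print(f"${estimate_monthly_cost(3.00, 15.00):.2f}")  # prints $45.00
```

With these assumed token counts the result lines up with the ~$45 listed for Claude 3.5 Sonnet. Real workloads with longer prompts or responses will cost more, so treat the output as a rough estimate.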
---
## 🏒 Cloud Models (Closed Source)
### OpenAI Models
#### GPT-5 (Latest Flagship) ⭐ **NEW**
```bash
OPENAI_MODEL=gpt-5
```
- **Pricing**: $20/month (ChatGPT Plus plan, unlimited with guardrails); API usage via `OPENAI_MODEL` is billed separately per token
- **Capabilities**: Advanced reasoning, thinking, code execution
- **Best For**: Premium applications requiring cutting-edge AI
- **Recipe Quality**: Outstanding (96%) - Best culinary understanding
- **Context**: 196K tokens (reasoning mode)
#### GPT-5-nano (Ultra Budget) ⭐ **MISSED GEM**
```bash
OPENAI_MODEL=gpt-5-nano
```
- **Pricing**: $0.05/1M input, $0.40/1M output tokens
- **Monthly Cost**: ~$1.00 for 30K queries
- **Best For**: Budget-conscious deployments with modern capabilities
- **Recipe Quality**: Very Good (88%)
- **Speed**: Very Fast
- **Features**: GPT-5 architecture at nano pricing
#### GPT-4o-mini (Proven Budget Choice)
```bash
OPENAI_MODEL=gpt-4o-mini
```
- **Pricing**: $0.15/1M input, $0.60/1M output tokens
- **Monthly Cost**: ~$4 for 30K queries
- **Best For**: Cost-effective production deployments
- **Recipe Quality**: Very Good (86%)
- **Speed**: Very Fast
### Google AI (Gemini) Models
#### Gemini 2.5 Flash ⭐ **RECOMMENDED**
```bash
GOOGLE_MODEL=gemini-2.5-flash
```
- **Pricing**: Free tier, then $0.30/1M input, $2.50/1M output
- **Monthly Cost**: Free - $2 for most usage patterns
- **Best For**: Development and cost-conscious production
- **Recipe Quality**: Excellent (90%)
- **Features**: Thinking budgets, 1M context window
#### Gemini 2.5 Pro (High-End)
```bash
GOOGLE_MODEL=gemini-2.5-pro
```
- **Pricing**: $1.25/1M input, $10/1M output (≤200K context)
- **Monthly Cost**: ~$25 for 30K queries
- **Best For**: Premium applications requiring best Google AI
- **Recipe Quality**: Excellent (92%)
#### Gemini 2.0 Flash-Lite (Ultra Budget)
```bash
GOOGLE_MODEL=gemini-2.0-flash-lite
```
- **Pricing**: $0.075/1M input, $0.30/1M output
- **Monthly Cost**: ~$0.90 for 30K queries
- **Best For**: High-volume, cost-sensitive applications
- **Recipe Quality**: Very Good (85%)
## 🔓 Open Source Models (Self-Hosted)
### Ollama Models (Latest Releases)
#### DeepSeek-R1:7b ⭐ **BREAKTHROUGH MODEL**
```bash
OLLAMA_MODEL=deepseek-r1:7b
```
- **Parameters**: 7B
- **Download**: ~4.7GB
- **RAM Required**: 8GB
- **Best For**: Advanced reasoning tasks on modest hardware
- **Recipe Quality**: Very Good (88%)
- **Special**: Chain-of-thought reasoning; this 7B tag is a distilled variant of the full DeepSeek-R1, which is the model reported at o1 level
#### Gemma 3:27b ⭐ **NEW FLAGSHIP**
```bash
OLLAMA_MODEL=gemma3:27b
```
- **Parameters**: 27B
- **Download**: ~17GB
- **RAM Required**: 32GB
- **Best For**: Highest quality open source experience
- **Recipe Quality**: Very Good (89%)
- **Features**: Vision capabilities, state-of-the-art performance
#### Llama 3.1:8b (Proven Choice)
```bash
OLLAMA_MODEL=llama3.1:8b
```
- **Parameters**: 8B
- **Download**: ~4.7GB
- **RAM Required**: 8GB
- **Best For**: Balanced production deployment
- **Recipe Quality**: Very Good (82%)
- **Status**: Your current choice - excellent balance!
#### Qwen 3:8b ⭐ **NEW RELEASE**
```bash
OLLAMA_MODEL=qwen3:8b
```
- **Parameters**: 8B
- **Download**: ~4.4GB
- **RAM Required**: 8GB
- **Best For**: Multilingual support, latest technology
- **Recipe Quality**: Very Good (84%)
- **Features**: Tool use, thinking capabilities
#### Phi 4:14b ⭐ **MICROSOFT'S LATEST**
```bash
OLLAMA_MODEL=phi4:14b
```
- **Parameters**: 14B
- **Download**: ~9.1GB
- **RAM Required**: 16GB
- **Best For**: Reasoning and math tasks
- **Recipe Quality**: Very Good (85%)
- **Features**: State-of-the-art efficiency
#### Gemma 3:4b (Efficient Choice)
```bash
OLLAMA_MODEL=gemma3:4b
```
- **Parameters**: 4B
- **Download**: ~3.3GB
- **RAM Required**: 6GB
- **Best For**: Resource-constrained deployments
- **Recipe Quality**: Good (78%)
- **Features**: Excellent for size, runs on modest hardware
### Hugging Face Models (Downloadable for Local Use)
#### CodeQwen1.5:7b ⭐ **ALIBABA'S CODE MODEL**
```bash
OLLAMA_MODEL=codeqwen:7b
```
- **Parameters**: 7B
- **Download**: ~4.2GB
- **RAM Required**: 8GB
- **Best For**: Recipe parsing, ingredient analysis, structured data
- **Recipe Quality**: Very Good (85%)
- **Features**: Excellent at understanding structured recipe formats
#### Mistral-Nemo:12b ⭐ **BALANCED CHOICE**
```bash
OLLAMA_MODEL=mistral-nemo:12b
```
- **Parameters**: 12B
- **Download**: ~7GB
- **RAM Required**: 12GB
- **Best For**: General conversation with good reasoning
- **Recipe Quality**: Very Good (84%)
- **Features**: Multilingual, efficient, well-balanced
#### Nous-Hermes2:10.7b ⭐ **FINE-TUNED EXCELLENCE**
```bash
OLLAMA_MODEL=nous-hermes2:10.7b
```
- **Parameters**: 10.7B
- **Download**: ~6.4GB
- **RAM Required**: 12GB
- **Best For**: Instruction following, detailed responses
- **Recipe Quality**: Very Good (83%)
- **Features**: Excellent instruction following, helpful responses
#### OpenHermes2.5-Mistral:7b ⭐ **COMMUNITY FAVORITE**
```bash
OLLAMA_MODEL=openhermes2.5-mistral:7b
```
- **Parameters**: 7B
- **Download**: ~4.1GB
- **RAM Required**: 8GB
- **Best For**: Creative recipe suggestions, conversational AI
- **Recipe Quality**: Good (81%)
- **Features**: Creative, conversational, reliable
#### Solar:10.7b ⭐ **UPSTAGE'S MODEL**
```bash
OLLAMA_MODEL=solar:10.7b
```
- **Parameters**: 10.7B
- **Download**: ~6.1GB
- **RAM Required**: 12GB
- **Best For**: Analytical tasks, recipe modifications
- **Recipe Quality**: Very Good (83%)
- **Features**: Strong analytical capabilities, detailed explanations
### Anthropic Claude Models (Cloud, Closed Source)
#### Claude 3.5 Sonnet (Production Standard)
```bash
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```
- **Pricing**: $3/1M input, $15/1M output tokens
- **Monthly Cost**: ~$45 for 30K queries
- **Best For**: Balanced performance and reasoning
- **Recipe Quality**: Outstanding (94%)
- **Features**: Advanced analysis, code understanding
#### Claude 3.5 Haiku (Speed Focused)
```bash
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
```
- **Pricing**: $0.80/1M input, $4/1M output tokens
- **Monthly Cost**: ~$12 for 30K queries
- **Best For**: Fast, cost-effective responses
- **Recipe Quality**: Very Good (87%)
- **Features**: Lightning fast, good quality
#### Claude 3 Opus (Premium Reasoning)
```bash
ANTHROPIC_MODEL=claude-3-opus-20240229
```
- **Pricing**: $15/1M input, $75/1M output tokens
- **Monthly Cost**: ~$225 for 30K queries
- **Best For**: Complex reasoning, highest quality
- **Recipe Quality**: Outstanding (95%)
- **Features**: Top-tier reasoning, complex tasks
---
## 🎯 Scenario-Based Recommendations
### πŸ‘¨β€πŸ’» **Development & Testing**
**Choice**: Gemini 2.5 Flash
```bash
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
```
- Free tier covers most development
- Excellent quality for testing
- Easy setup and integration
### 🚀 **Small to Medium Production**
**Choice**: Gemini 2.5 Flash or GPT-4o-mini
```bash
# Cost-focused
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
# Quality-focused
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
```
### 🏠 **Self-Hosted**
**Choice**: Llama 3.1:8b or upgrade to DeepSeek-R1:7b
```bash
# Your current setup (an excellent choice)
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
# Upgrade option (better reasoning)
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b
```
### 💰 **Budget/Free**
**Choice**: Local models or GPT-5-nano
```bash
# Best local alternative
LLM_PROVIDER=ollama
OLLAMA_MODEL=codeqwen:7b
# Best budget paid option
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
# Quality budget cloud
LLM_PROVIDER=anthropic
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
```
### 🔒 **Privacy/Offline**
**Choice**: DeepSeek-R1:7b or Gemma 3:4b
```bash
# Best reasoning
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b
# Resource-efficient
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma3:4b
```
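
The scenario recommendations above all follow the same pattern: set `LLM_PROVIDER`, then set the matching `<PROVIDER>_MODEL` variable. The sketch below captures that mapping in a small lookup table. The scenario keys and the `env_lines` helper are hypothetical names introduced for illustration; only the provider names, variable names, and model tags come from this guide.

```python
# Scenario -> (provider, model) pairs, taken from the recommendations in this guide.
# The scenario keys themselves are illustrative labels.
SCENARIOS = {
    "development": ("google", "gemini-2.5-flash"),
    "production-cost": ("google", "gemini-2.5-flash"),
    "production-quality": ("openai", "gpt-4o-mini"),
    "self-hosted": ("ollama", "llama3.1:8b"),
    "privacy-reasoning": ("ollama", "deepseek-r1:7b"),
    "privacy-efficient": ("ollama", "gemma3:4b"),
}


def env_lines(scenario: str) -> list[str]:
    """Render the two .env lines for a scenario."""
    provider, model = SCENARIOS[scenario]
    # Variable names follow the guide's convention: GOOGLE_MODEL, OLLAMA_MODEL, ...
    return [f"LLM_PROVIDER={provider}", f"{provider.upper()}_MODEL={model}"]


print("\n".join(env_lines("privacy-efficient")))
# LLM_PROVIDER=ollama
# OLLAMA_MODEL=gemma3:4b
```

Keeping the mapping in one place makes it easier to switch scenarios without leaving two conflicting `LLM_PROVIDER` lines in the same `.env` file.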
---
## ⚑ Quick Setup Commands
### Cloud Models (Instant Setup)
#### Gemini 2.5 Flash (Recommended)
```bash
# Update .env
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000
# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemini 2.5 Flash ready!')
response = service.simple_chat_completion('Suggest a quick pasta recipe')
print(f'Response: {response[:100]}...')
"
```
#### CodeQwen1.5:7b (Structured Data Expert, Self-Hosted via Ollama)
```bash
# Pull model
ollama pull codeqwen:7b
# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=codeqwen:7b
OLLAMA_TEMPERATURE=0.7
# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ CodeQwen 1.5:7b ready!')
response = service.simple_chat_completion('Parse this recipe: 2 cups flour, 1 egg, 1 cup milk')
print(f'Response: {response[:100]}...')
"
```
#### Mistral-Nemo:12b (Balanced Performance, Self-Hosted via Ollama)
```bash
# Pull model
ollama pull mistral-nemo:12b
# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral-nemo:12b
OLLAMA_TEMPERATURE=0.7
# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Mistral-Nemo ready!')
response = service.simple_chat_completion('Suggest a Mediterranean dinner menu')
print(f'Response: {response[:100]}...')
"
```
#### Claude 3.5 Haiku (Speed + Quality)
```bash
# Update .env
LLM_PROVIDER=anthropic
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
ANTHROPIC_TEMPERATURE=0.7
ANTHROPIC_MAX_TOKENS=1000
# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Claude 3.5 Haiku ready!')
response = service.simple_chat_completion('Quick dinner ideas with vegetables')
print(f'Response: {response[:100]}...')
"
```
#### GPT-5-nano (Budget Winner)
```bash
# Update .env
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000
# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5-nano ready!')
response = service.simple_chat_completion('Quick healthy breakfast ideas')
print(f'Response: {response[:100]}...')
"
```
#### GPT-5 (Premium)
```bash
# Update .env
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000
# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5 ready!')
response = service.simple_chat_completion('Create a healthy meal plan')
print(f'Response: {response[:100]}...')
"
```
### Self-Hosted Models
#### DeepSeek-R1:7b (Latest Breakthrough)
```bash
# Start Ollama (the server must be running before `ollama pull`)
ollama serve &
# Pull model
ollama pull deepseek-r1:7b
# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b
OLLAMA_TEMPERATURE=0.7
# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ DeepSeek-R1 ready!')
response = service.simple_chat_completion('Explain the science behind sourdough fermentation')
print(f'Response: {response[:100]}...')
"
```
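
Before running the test above, it can help to confirm that the Ollama server is reachable and that the model has actually been pulled. The sketch below uses Ollama's `GET /api/tags` endpoint, which lists locally available models. The helper names (`fetch_tags`, `model_is_pulled`) are hypothetical, and the URL assumes Ollama's default address on `localhost:11434`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default address; adjust if yours differs


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models it has (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_is_pulled(tags: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags response."""
    return any(entry.get("name") == model for entry in tags.get("models", []))


# Example with a hand-written payload shaped like an /api/tags response
sample = {"models": [{"name": "deepseek-r1:7b"}, {"name": "llama3.1:8b"}]}
print(model_is_pulled(sample, "deepseek-r1:7b"))  # True
print(model_is_pulled(sample, "gemma3:4b"))       # False
```

Against a live server, call `model_is_pulled(fetch_tags(), "deepseek-r1:7b")`. A connection error from `fetch_tags` usually means `ollama serve` is not running.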
#### Gemma 3:4b (Efficient)
```bash
# Pull model
ollama pull gemma3:4b
# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma3:4b
OLLAMA_TEMPERATURE=0.7
# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemma 3:4b ready!')
response = service.simple_chat_completion('Quick chicken recipes for weeknight dinners')
print(f'Response: {response[:100]}...')
"
```
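
Each setup block above relies on `LLM_PROVIDER` agreeing with the provider-specific variables. A small preflight check can catch a mismatch before the service starts. This is a minimal sketch under the assumption that the variable names follow the pattern used in this guide; `check_env` is a hypothetical helper, and the real `LLMService` may validate its settings differently.

```python
# Model variable per provider, following the names used in this guide.
MODEL_VARS = {
    "google": "GOOGLE_MODEL",
    "openai": "OPENAI_MODEL",
    "anthropic": "ANTHROPIC_MODEL",
    "ollama": "OLLAMA_MODEL",
}


def check_env(env: dict[str, str]) -> list[str]:
    """Return a list of problems found in the LLM settings (empty means OK)."""
    provider = env.get("LLM_PROVIDER", "")
    if provider not in MODEL_VARS:
        return [f"LLM_PROVIDER must be one of {sorted(MODEL_VARS)}, got {provider!r}"]

    problems = []
    model_var = MODEL_VARS[provider]
    if not env.get(model_var):
        problems.append(f"{model_var} is not set")

    # The temperature is optional, but if present it should parse as a number.
    temperature_var = f"{provider.upper()}_TEMPERATURE"
    temperature = env.get(temperature_var)
    if temperature is not None:
        try:
            float(temperature)
        except ValueError:
            problems.append(f"{temperature_var} is not a number")
    return problems


print(check_env({"LLM_PROVIDER": "ollama", "OLLAMA_MODEL": "gemma3:4b"}))  # []
print(check_env({"LLM_PROVIDER": "openai"}))  # ['OPENAI_MODEL is not set']
```

To check the running process instead of a hand-written dictionary, pass `dict(os.environ)` after importing `os`.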
---
## 🔧 Hardware Requirements
### Cloud Models
- **Requirements**: Internet connection, API key
- **RAM**: Any (processing done remotely)
- **Storage**: Minimal
- **Best For**: Instant setup, no hardware constraints
### Self-Hosted Requirements
| Model | Parameters | RAM Needed | Storage | GPU Beneficial | Best For |
|-------|------------|------------|---------|----------------|----------|
| `gemma3:4b` | 4B | 6GB | 3.3GB | Optional | Laptops, modest hardware |
| `codeqwen:7b` | 7B | 8GB | 4.2GB | Yes | Structured data, parsing |
| `llama3.1:8b` | 8B | 8GB | 4.7GB | Yes | Standard workstations |
| `deepseek-r1:7b` | 7B | 8GB | 4.7GB | Yes | Reasoning tasks |
| `openhermes2.5-mistral:7b` | 7B | 8GB | 4.1GB | Yes | Conversational AI |
| `nous-hermes2:10.7b` | 10.7B | 12GB | 6.4GB | Recommended | Instruction following |
| `mistral-nemo:12b` | 12B | 12GB | 7GB | Recommended | Balanced performance |
| `phi4:14b` | 14B | 16GB | 9.1GB | Recommended | High-end workstations |
| `gemma3:27b` | 27B | 32GB | 17GB | Required | Powerful servers |
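
The table can also be read as a simple filter: given the RAM on a machine, which models fit? The sketch below encodes the RAM column and returns the matching tags. `models_that_fit` is a hypothetical helper, and it treats the listed figure as a hard threshold, which is an assumption; in practice you may want headroom for the operating system and other processes.

```python
# (tag, RAM needed in GB) pairs taken from the table above.
SELF_HOSTED = [
    ("gemma3:4b", 6),
    ("codeqwen:7b", 8),
    ("llama3.1:8b", 8),
    ("deepseek-r1:7b", 8),
    ("openhermes2.5-mistral:7b", 8),
    ("nous-hermes2:10.7b", 12),
    ("mistral-nemo:12b", 12),
    ("phi4:14b", 16),
    ("gemma3:27b", 32),
]


def models_that_fit(available_ram_gb: float) -> list[str]:
    """Return the tags whose listed RAM requirement fits in the given RAM."""
    return [tag for tag, ram_gb in SELF_HOSTED if ram_gb <= available_ram_gb]


print(models_that_fit(6))   # ['gemma3:4b']
print(models_that_fit(16))  # every model except gemma3:27b
```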
---