plg4-dev-server/backend/docs/model-selection-guide.md
Jesse Johnson
New commit for backend deployment: 2025-09-25_13-24-03
c59d808

Model Selection Guide

🎯 At-a-Glance Recommendations

| Priority | Best Choice | Provider | Monthly Cost* | Setup Time | Quality Score | Why Choose This |
| --- | --- | --- | --- | --- | --- | --- |
| Ease of Use | Gemini 2.5 Flash | Google | Free - $2 | 2 min | 90% | Excellent free tier |
| Best Value | GPT-5-nano | OpenAI | $1.00 | 2 min | 88% | Modern GPT-5 at nano price |
| Premium Quality | Claude 3 Opus | Anthropic | $225 | 2 min | 95% | Highest reasoning quality |
| Self-Hosted | Llama 3.1:8b | Ollama | Free | 10 min | 82% | Perfect balance |
| High-End Local | DeepSeek-R1:7b | Ollama | Free | 15 min | 88% | Best reasoning model |
| Budget Cloud | Claude 3.5 Haiku | Anthropic | $4 | 2 min | 87% | Fast and affordable |
| Alternative Local | CodeQwen1.5:7b | Ollama | Free | 10 min | 85% | Excellent for structured data |

*Based on 30,000 queries/month
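
The table's cloud figures come from simple per-token arithmetic. A minimal sketch of that math, assuming roughly 400 input and 120 output tokens per query (an assumption for illustration; the guide does not state the token model behind its figures, so results only approximate the table):

```python
# Rough monthly-cost estimate from $/1M-token pricing.
# Per-query token counts below are assumptions, not from this guide.

def monthly_cost(in_price_per_m: float, out_price_per_m: float,
                 queries: int = 30_000,
                 in_tokens: int = 400, out_tokens: int = 120) -> float:
    """Estimated monthly USD cost given per-1M-token prices."""
    input_cost = queries * in_tokens / 1_000_000 * in_price_per_m
    output_cost = queries * out_tokens / 1_000_000 * out_price_per_m
    return input_cost + output_cost

# GPT-4o-mini at $0.15/1M input, $0.60/1M output:
print(f"${monthly_cost(0.15, 0.60):.2f}")  # → $3.96, close to the ~$4 in the table
```

Output pricing usually dominates, so trimming response length (via the `*_MAX_TOKENS` settings shown later) is the easiest cost lever.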


🏒 Cloud Models (Closed Source)

OpenAI Models

GPT-5 (Latest Flagship) ⭐ NEW

OPENAI_MODEL=gpt-5
  • Pricing: API access is billed per token (see OpenAI's current rates); the $20/month Plus plan is ChatGPT's consumer subscription, not API pricing
  • Capabilities: Advanced reasoning, thinking, code execution
  • Best For: Premium applications requiring cutting-edge AI
  • Recipe Quality: Outstanding (96%) - Best culinary understanding
  • Context: 196K tokens (reasoning mode)

GPT-5-nano (Ultra Budget) ⭐ HIDDEN GEM

OPENAI_MODEL=gpt-5-nano
  • Pricing: $0.05/1M input, $0.40/1M output tokens
  • Monthly Cost: ~$1.00 for 30K queries
  • Best For: Budget-conscious deployments with modern capabilities
  • Recipe Quality: Very Good (88%)
  • Speed: Very Fast
  • Features: GPT-5 architecture at nano pricing

GPT-4o-mini (Proven Budget Choice)

OPENAI_MODEL=gpt-4o-mini
  • Pricing: $0.15/1M input, $0.60/1M output tokens
  • Monthly Cost: ~$4 for 30K queries
  • Best For: Cost-effective production deployments
  • Recipe Quality: Very Good (86%)
  • Speed: Very Fast

Google AI (Gemini) Models

Gemini 2.5 Flash ⭐ RECOMMENDED

GOOGLE_MODEL=gemini-2.5-flash
  • Pricing: Free tier, then $0.30/1M input, $2.50/1M output
  • Monthly Cost: Free - $2 for most usage patterns
  • Best For: Development and cost-conscious production
  • Recipe Quality: Excellent (90%)
  • Features: Thinking budgets, 1M context window

Gemini 2.5 Pro (High-End)

GOOGLE_MODEL=gemini-2.5-pro
  • Pricing: $1.25/1M input, $10/1M output (≤200K context)
  • Monthly Cost: ~$25 for 30K queries
  • Best For: Premium applications requiring best Google AI
  • Recipe Quality: Excellent (92%)

Gemini 2.0 Flash-Lite (Ultra Budget)

GOOGLE_MODEL=gemini-2.0-flash-lite
  • Pricing: $0.075/1M input, $0.30/1M output
  • Monthly Cost: ~$0.90 for 30K queries
  • Best For: High-volume, cost-sensitive applications
  • Recipe Quality: Good (85%)

🔓 Open Source Models (Self-Hosted)

Ollama Models (Latest Releases)

DeepSeek-R1:7b ⭐ BREAKTHROUGH MODEL

OLLAMA_MODEL=deepseek-r1:7b
  • Parameters: 7B
  • Download: ~4.7GB
  • RAM Required: 8GB
  • Best For: Advanced reasoning tasks with o1-style chain-of-thought
  • Recipe Quality: Outstanding (88%)
  • Special: Chain-of-thought reasoning, approaching GPT-4 performance
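
One practical note on the chain-of-thought behavior: DeepSeek-R1 models emit their reasoning inside `<think>...</think>` tags before the final answer, which you usually want to strip before showing a recipe response to users. A minimal sketch (the helper name is illustrative):

```python
import re

def strip_reasoning(text: str) -> str:
    """Remove DeepSeek-R1's <think>...</think> reasoning block,
    keeping only the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>The user wants something fast...</think>Try a 10-minute aglio e olio."
print(strip_reasoning(raw))  # Try a 10-minute aglio e olio.
```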

Gemma 3:27b ⭐ NEW FLAGSHIP

OLLAMA_MODEL=gemma3:27b
  • Parameters: 27B
  • Download: ~17GB
  • RAM Required: 32GB
  • Best For: Highest quality open source experience
  • Recipe Quality: Outstanding (89%)
  • Features: Vision capabilities, state-of-the-art performance

Llama 3.1:8b (Proven Choice)

OLLAMA_MODEL=llama3.1:8b
  • Parameters: 8B
  • Download: ~4.7GB
  • RAM Required: 8GB
  • Best For: Balanced production deployment
  • Recipe Quality: Very Good (82%)
  • Status: Your current choice - excellent balance!

Qwen 3:8b ⭐ NEW RELEASE

OLLAMA_MODEL=qwen3:8b
  • Parameters: 8B
  • Download: ~4.4GB
  • RAM Required: 8GB
  • Best For: Multilingual support, latest technology
  • Recipe Quality: Very Good (84%)
  • Features: Tool use, thinking capabilities

Phi 4:14b ⭐ MICROSOFT'S LATEST

OLLAMA_MODEL=phi4:14b
  • Parameters: 14B
  • Download: ~9.1GB
  • RAM Required: 16GB
  • Best For: Reasoning and math tasks
  • Recipe Quality: Very Good (85%)
  • Features: State-of-the-art efficiency

Gemma 3:4b (Efficient Choice)

OLLAMA_MODEL=gemma3:4b
  • Parameters: 4B
  • Download: ~3.3GB
  • RAM Required: 6GB
  • Best For: Resource-constrained deployments
  • Recipe Quality: Good (78%)
  • Features: Excellent for size, runs on modest hardware
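
Whichever of the models above you pull, Ollama serves them all through the same local HTTP API (`POST /api/generate` on port 11434 by default). A standard-library-only sketch; the helper names are illustrative, not part of this project:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "llama3.1:8b") -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt: str, model: str = "llama3.1:8b",
                    host: str = "http://localhost:11434") -> str:
    """POST a generate request to a local Ollama server, return the text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(f"{host}/api/generate", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the endpoint is identical for every model, swapping `OLLAMA_MODEL` in `.env` is the only change needed to try a different one.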

Community Models from HuggingFace (run locally via Ollama)

CodeQwen1.5:7b ⭐ ALIBABA'S CODE MODEL

OLLAMA_MODEL=codeqwen:7b
  • Parameters: 7B
  • Download: ~4.2GB
  • RAM Required: 8GB
  • Best For: Recipe parsing, ingredient analysis, structured data
  • Recipe Quality: Very Good (85%)
  • Features: Excellent at understanding structured recipe formats

Mistral-Nemo:12b ⭐ BALANCED CHOICE

OLLAMA_MODEL=mistral-nemo:12b
  • Parameters: 12B
  • Download: ~7GB
  • RAM Required: 12GB
  • Best For: General conversation with good reasoning
  • Recipe Quality: Very Good (84%)
  • Features: Multilingual, efficient, well-balanced

Nous-Hermes2:10.7b ⭐ FINE-TUNED EXCELLENCE

OLLAMA_MODEL=nous-hermes2:10.7b
  • Parameters: 10.7B
  • Download: ~6.4GB
  • RAM Required: 12GB
  • Best For: Instruction following, detailed responses
  • Recipe Quality: Very Good (83%)
  • Features: Excellent instruction following, helpful responses

OpenHermes2.5-Mistral:7b ⭐ COMMUNITY FAVORITE

OLLAMA_MODEL=openhermes2.5-mistral:7b
  • Parameters: 7B
  • Download: ~4.1GB
  • RAM Required: 8GB
  • Best For: Creative recipe suggestions, conversational AI
  • Recipe Quality: Good (81%)
  • Features: Creative, conversational, reliable

Solar:10.7b ⭐ UPSTAGE'S MODEL

OLLAMA_MODEL=solar:10.7b
  • Parameters: 10.7B
  • Download: ~6.1GB
  • RAM Required: 12GB
  • Best For: Analytical tasks, recipe modifications
  • Recipe Quality: Very Good (83%)
  • Features: Strong analytical capabilities, detailed explanations

Anthropic Claude Models

Claude 3.5 Sonnet (Production Standard)

ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
  • Pricing: $3/1M input, $15/1M output tokens
  • Monthly Cost: ~$45 for 30K queries
  • Best For: Balanced performance and reasoning
  • Recipe Quality: Outstanding (94%)
  • Features: Advanced analysis, code understanding

Claude 3.5 Haiku (Speed Focused)

ANTHROPIC_MODEL=claude-3-5-haiku-20241022
  • Pricing: $0.25/1M input, $1.25/1M output tokens
  • Monthly Cost: ~$4 for 30K queries
  • Best For: Fast, cost-effective responses
  • Recipe Quality: Very Good (87%)
  • Features: Lightning fast, good quality

Claude 3 Opus (Premium Reasoning)

ANTHROPIC_MODEL=claude-3-opus-20240229
  • Pricing: $15/1M input, $75/1M output tokens
  • Monthly Cost: ~$225 for 30K queries
  • Best For: Complex reasoning, highest quality
  • Recipe Quality: Outstanding (95%)
  • Features: Top-tier reasoning, complex tasks

🎯 Scenario-Based Recommendations

πŸ‘¨β€πŸ’» Development & Testing

Choice: Gemini 2.5 Flash

LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
  • Free tier covers most development
  • Excellent quality for testing
  • Easy setup and integration

🚀 Small to Medium Production

Choice: Gemini 2.5 Flash or GPT-4o-mini

# Cost-focused
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash

# Quality-focused
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini

🏠 Self-Hosted

Choice: Llama 3.1:8b or upgrade to DeepSeek-R1:7b

# Your current (excellent choice)
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b

# Upgrade option (better reasoning)
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b

💰 Budget/Free

Choice: Local models or GPT-5-nano

# Best local alternative
LLM_PROVIDER=ollama
OLLAMA_MODEL=codeqwen:7b

# Best budget paid option
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano

# Quality budget cloud
LLM_PROVIDER=anthropic
ANTHROPIC_MODEL=claude-3-5-haiku-20241022

🔒 Privacy/Offline

Choice: DeepSeek-R1:7b or Gemma 3:4b

# Best reasoning
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b

# Resource-efficient
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma3:4b
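
All of the scenarios above switch behavior with just `LLM_PROVIDER` plus one provider-specific model variable. A sketch of how that resolution might look (the env var names come from this guide; the function itself is illustrative, not the project's actual `LLMService` logic):

```python
import os

# Provider → model env var, matching the .env pairs used throughout this guide.
MODEL_VARS = {
    "openai": "OPENAI_MODEL",
    "google": "GOOGLE_MODEL",
    "anthropic": "ANTHROPIC_MODEL",
    "ollama": "OLLAMA_MODEL",
}

def resolve_model(env=None):
    """Return (provider, model) from the environment, with basic validation."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama")
    var = MODEL_VARS.get(provider)
    if var is None:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    model = env.get(var)
    if not model:
        raise ValueError(f"{var} must be set when LLM_PROVIDER={provider}")
    return provider, model
```

Failing fast on a missing model variable makes misconfigured `.env` files obvious at startup rather than at the first chat request.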

⚑ Quick Setup Commands

Cloud Models (Instant Setup)

Gemini 2.5 Flash (Recommended)

# Update .env
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemini 2.5 Flash ready!')
response = service.simple_chat_completion('Suggest a quick pasta recipe')
print(f'Response: {response[:100]}...')
"

CodeQwen1.5:7b (Structured Data Expert)

# Pull model
ollama pull codeqwen:7b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=codeqwen:7b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ CodeQwen 1.5:7b ready!')
response = service.simple_chat_completion('Parse this recipe: 2 cups flour, 1 egg, 1 cup milk')
print(f'Response: {response[:100]}...')
"

Mistral-Nemo:12b (Balanced Performance)

# Pull model
ollama pull mistral-nemo:12b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral-nemo:12b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Mistral-Nemo ready!')
response = service.simple_chat_completion('Suggest a Mediterranean dinner menu')
print(f'Response: {response[:100]}...')
"

Claude 3.5 Haiku (Speed + Quality)

# Update .env
LLM_PROVIDER=anthropic
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
ANTHROPIC_TEMPERATURE=0.7
ANTHROPIC_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Claude 3.5 Haiku ready!')
response = service.simple_chat_completion('Quick dinner ideas with vegetables')
print(f'Response: {response[:100]}...')
"

GPT-5-nano (Budget Winner)

# Update .env
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5-nano ready!')
response = service.simple_chat_completion('Quick healthy breakfast ideas')
print(f'Response: {response[:100]}...')
"

GPT-5 (Premium)

# Update .env
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5 ready!')
response = service.simple_chat_completion('Create a healthy meal plan')
print(f'Response: {response[:100]}...')
"

Self-Hosted Models

DeepSeek-R1:7b (Latest Breakthrough)

# Pull model
ollama pull deepseek-r1:7b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b
OLLAMA_TEMPERATURE=0.7

# Start Ollama
ollama serve &

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ DeepSeek-R1 ready!')
response = service.simple_chat_completion('Explain the science behind sourdough fermentation')
print(f'Response: {response[:100]}...')
"

Gemma 3:4b (Efficient)

# Pull model
ollama pull gemma3:4b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma3:4b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemma 3:4b ready!')
response = service.simple_chat_completion('Quick chicken recipes for weeknight dinners')
print(f'Response: {response[:100]}...')
"

🔧 Hardware Requirements

Cloud Models

  • Requirements: Internet connection, API key
  • RAM: Any (processing done remotely)
  • Storage: Minimal
  • Best For: Instant setup, no hardware constraints

Self-Hosted Requirements

| Model | Parameters | RAM Needed | Storage | GPU Beneficial | Best For |
| --- | --- | --- | --- | --- | --- |
| gemma3:4b | 4B | 6GB | 3.3GB | Optional | Laptops, modest hardware |
| codeqwen:7b | 7B | 8GB | 4.2GB | Yes | Structured data, parsing |
| llama3.1:8b | 8B | 8GB | 4.7GB | Yes | Standard workstations |
| deepseek-r1:7b | 7B | 8GB | 4.7GB | Yes | Reasoning tasks |
| openhermes2.5-mistral:7b | 7B | 8GB | 4.1GB | Yes | Conversational AI |
| nous-hermes2:10.7b | 10.7B | 12GB | 6.4GB | Recommended | Instruction following |
| mistral-nemo:12b | 12B | 12GB | 7GB | Recommended | Balanced performance |
| phi4:14b | 14B | 16GB | 9.1GB | Recommended | High-end workstations |
| gemma3:27b | 27B | 32GB | 17GB | Required | Powerful servers |
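
The RAM column above can double as a quick "what fits on this machine" check. A sketch with the table's figures hard-coded (the helper is illustrative, not part of the backend):

```python
# RAM requirements (GB) taken from the hardware table above.
RAM_NEEDED = {
    "gemma3:4b": 6,
    "codeqwen:7b": 8,
    "llama3.1:8b": 8,
    "deepseek-r1:7b": 8,
    "openhermes2.5-mistral:7b": 8,
    "nous-hermes2:10.7b": 12,
    "mistral-nemo:12b": 12,
    "phi4:14b": 16,
    "gemma3:27b": 32,
}

def models_that_fit(ram_gb: float) -> list[str]:
    """Models whose listed RAM requirement fits within ram_gb, largest first."""
    fits = [m for m, need in RAM_NEEDED.items() if need <= ram_gb]
    return sorted(fits, key=RAM_NEEDED.get, reverse=True)

print(models_that_fit(16))
```

Treat the listed RAM as a floor, not a target: leave headroom for the OS and the backend process itself, especially on 8GB machines.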