# Model Selection Guide


## 🎯 At-a-Glance Recommendations

| Priority | Best Choice | Provider | Monthly Cost* | Setup Time | Quality Score | Why Choose This |
|---|---|---|---|---|---|---|
| Ease of Use | Gemini 2.5 Flash | Google | Free - $2 | 2 min | 90% | Excellent free tier |
| Best Value | GPT-5-nano | OpenAI | $1.00 | 2 min | 88% | Modern GPT-5 at nano price |
| Premium Quality | Claude 3 Opus | Anthropic | $225 | 2 min | 95% | Highest reasoning quality |
| Self-Hosted | Llama 3.1:8b | Ollama | Free | 10 min | 82% | Perfect balance |
| High-End Local | DeepSeek-R1:7b | Ollama | Free | 15 min | 88% | Best reasoning model |
| Budget Cloud | Claude 3.5 Haiku | Anthropic | $13 | 2 min | 87% | Fast and affordable |
| Alternative Local | CodeQwen1.5:7b | Ollama | Free | 10 min | 85% | Excellent for structured data |
*Based on 30,000 queries/month; the sketch below shows how estimates like these can be reproduced.
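
Monthly cost scales linearly with token prices and with how many tokens each query consumes, which this guide leaves unstated. A minimal sketch of the arithmetic; the per-query token counts are assumptions, chosen so the result lands near the GPT-5-nano figure above:

```python
# Rough monthly-cost estimator for the figures in this guide.
# NOTE: in_tokens/out_tokens are assumptions, not numbers from the guide;
# actual cost depends entirely on your prompts and response lengths.
def monthly_cost(input_price_per_m, output_price_per_m,
                 queries=30_000, in_tokens=300, out_tokens=50):
    """Estimated USD/month given per-1M-token prices."""
    per_query = (in_tokens * input_price_per_m
                 + out_tokens * output_price_per_m) / 1_000_000
    return per_query * queries

# Example: GPT-5-nano at $0.05/1M input, $0.40/1M output
print(f"${monthly_cost(0.05, 0.40):.2f}/month")  # ~$1.05, close to the ~$1.00 above
```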

## 🏢 Cloud Models (Closed Source)

### OpenAI Models

#### GPT-5 (Latest Flagship) ⭐ NEW

```bash
OPENAI_MODEL=gpt-5
```

- Pricing: $1.25/1M input, $10/1M output tokens (API; the $20/month ChatGPT Plus subscription is separate from API usage)
- Monthly Cost: ~$25 for 30K queries
- Capabilities: Advanced reasoning, thinking, code execution
- Best For: Premium applications requiring cutting-edge AI
- Recipe Quality: Outstanding (96%) - Best culinary understanding
- Context: 400K tokens (API)

#### GPT-5-nano (Ultra Budget) ⭐ HIDDEN GEM

```bash
OPENAI_MODEL=gpt-5-nano
```

- Pricing: $0.05/1M input, $0.40/1M output tokens
- Monthly Cost: ~$1.00 for 30K queries
- Best For: Budget-conscious deployments with modern capabilities
- Recipe Quality: Very Good (88%)
- Speed: Very Fast
- Features: GPT-5 architecture at nano pricing

#### GPT-4o-mini (Proven Budget Choice)

```bash
OPENAI_MODEL=gpt-4o-mini
```

- Pricing: $0.15/1M input, $0.60/1M output tokens
- Monthly Cost: ~$4 for 30K queries
- Best For: Cost-effective production deployments
- Recipe Quality: Very Good (86%)
- Speed: Very Fast

### Google AI (Gemini) Models

#### Gemini 2.5 Flash ⭐ RECOMMENDED

```bash
GOOGLE_MODEL=gemini-2.5-flash
```

- Pricing: Free tier, then $0.30/1M input, $2.50/1M output
- Monthly Cost: Free - $2 for most usage patterns
- Best For: Development and cost-conscious production
- Recipe Quality: Excellent (90%)
- Features: Thinking budgets (see the sketch below), 1M context window
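
Thinking budgets are set per request. A minimal sketch using the google-genai SDK, assuming an API key is exported as GOOGLE_API_KEY or GEMINI_API_KEY; this project's LLMService may wrap the same option differently:

```python
# Requires: pip install google-genai, plus GOOGLE_API_KEY/GEMINI_API_KEY set.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Suggest a quick weeknight pasta recipe.",
    config=types.GenerateContentConfig(
        # thinking_budget=0 turns thinking off for cheap, fast replies;
        # raise it (in tokens) for queries that need deeper reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```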

#### Gemini 2.5 Pro (High-End)

```bash
GOOGLE_MODEL=gemini-2.5-pro
```

- Pricing: $1.25/1M input, $10/1M output (≤200K context)
- Monthly Cost: ~$25 for 30K queries
- Best For: Premium applications requiring best Google AI
- Recipe Quality: Excellent (92%)

#### Gemini 2.0 Flash-Lite (Ultra Budget)

```bash
GOOGLE_MODEL=gemini-2.0-flash-lite
```

- Pricing: $0.075/1M input, $0.30/1M output
- Monthly Cost: ~$0.90 for 30K queries
- Best For: High-volume, cost-sensitive applications
- Recipe Quality: Good (85%)

### Anthropic Claude Models

#### Claude 3.5 Sonnet (Production Standard)

```bash
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```

- Pricing: $3/1M input, $15/1M output tokens
- Monthly Cost: ~$45 for 30K queries
- Best For: Balanced performance and reasoning
- Recipe Quality: Outstanding (94%)
- Features: Advanced analysis, code understanding

#### Claude 3.5 Haiku (Speed Focused)

```bash
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
```

- Pricing: $0.80/1M input, $4/1M output tokens
- Monthly Cost: ~$13 for 30K queries
- Best For: Fast, cost-effective responses
- Recipe Quality: Very Good (87%)
- Features: Lightning fast, good quality

#### Claude 3 Opus (Premium Reasoning)

```bash
ANTHROPIC_MODEL=claude-3-opus-20240229
```

- Pricing: $15/1M input, $75/1M output tokens
- Monthly Cost: ~$225 for 30K queries
- Best For: Complex reasoning, highest quality
- Recipe Quality: Outstanding (95%)
- Features: Top-tier reasoning, complex tasks

## 🔓 Open Source Models (Self-Hosted)

### Ollama Models (Latest Releases)

#### DeepSeek-R1:7b ⭐ BREAKTHROUGH MODEL

```bash
OLLAMA_MODEL=deepseek-r1:7b
```

- Parameters: 7B
- Download: ~4.7GB
- RAM Required: 8GB
- Best For: Advanced reasoning tasks, o1-level performance
- Recipe Quality: Outstanding (88%)
- Special: Chain-of-thought reasoning, approaching GPT-4 performance; emits visible `<think>` traces (see the sketch below)
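
deepseek-r1 models expose their chain of thought as a `<think>...</think>` block at the start of each reply. If the service returns raw model text, a small helper (hypothetical, not part of this repo) keeps the trace out of user-facing answers:

```python
import re

def strip_think(raw: str) -> str:
    """Drop the <think>...</think> reasoning trace from a deepseek-r1 reply."""
    return re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()

raw = "<think>User wants something fast...</think>Try a 15-minute aglio e olio."
print(strip_think(raw))  # -> Try a 15-minute aglio e olio.
```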

#### Gemma 3:27b ⭐ NEW FLAGSHIP

```bash
OLLAMA_MODEL=gemma3:27b
```

- Parameters: 27B
- Download: ~17GB
- RAM Required: 32GB
- Best For: Highest quality open source experience
- Recipe Quality: Outstanding (89%)
- Features: Vision capabilities, state-of-the-art performance

#### Llama 3.1:8b (Proven Choice)

```bash
OLLAMA_MODEL=llama3.1:8b
```

- Parameters: 8B
- Download: ~4.7GB
- RAM Required: 8GB
- Best For: Balanced production deployment
- Recipe Quality: Very Good (82%)
- Status: Your current choice - excellent balance!

#### Qwen 3:8b ⭐ NEW RELEASE

```bash
OLLAMA_MODEL=qwen3:8b
```

- Parameters: 8B
- Download: ~4.4GB
- RAM Required: 8GB
- Best For: Multilingual support, latest technology
- Recipe Quality: Very Good (84%)
- Features: Tool use, thinking capabilities

#### Phi 4:14b ⭐ MICROSOFT'S LATEST

```bash
OLLAMA_MODEL=phi4:14b
```

- Parameters: 14B
- Download: ~9.1GB
- RAM Required: 16GB
- Best For: Reasoning and math tasks
- Recipe Quality: Very Good (85%)
- Features: State-of-the-art efficiency

#### Gemma 3:4b (Efficient Choice)

```bash
OLLAMA_MODEL=gemma3:4b
```

- Parameters: 4B
- Download: ~3.3GB
- RAM Required: 6GB
- Best For: Resource-constrained deployments
- Recipe Quality: Good (78%)
- Features: Excellent for size, runs on modest hardware
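
Before wiring any of these models into the app, it is worth confirming that the Ollama daemon is up and the tag is actually pulled. A minimal sketch against Ollama's documented REST endpoints (`/api/tags`, `/api/generate`); the model name is just an example:

```python
import requests

OLLAMA_URL = "http://localhost:11434"

def model_available(name: str) -> bool:
    """True if `name` is in the local Ollama model list (GET /api/tags)."""
    tags = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5).json()
    return any(m["name"].startswith(name) for m in tags.get("models", []))

if model_available("llama3.1:8b"):
    reply = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": "llama3.1:8b", "prompt": "Name one pantry staple.",
              "stream": False},
        timeout=120,
    ).json()
    print(reply["response"])
else:
    print("Model missing - run: ollama pull llama3.1:8b")
```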

### HuggingFace Community Models (Pulled via Ollama)

#### CodeQwen1.5:7b ⭐ ALIBABA'S CODE MODEL

```bash
OLLAMA_MODEL=codeqwen:7b
```

- Parameters: 7B
- Download: ~4.2GB
- RAM Required: 8GB
- Best For: Recipe parsing, ingredient analysis, structured data
- Recipe Quality: Very Good (85%)
- Features: Excellent at understanding structured recipe formats (see the parsing sketch below)
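
A hedged example of that structured-output strength, using the `simple_chat_completion` call from this guide's own test snippets; the JSON schema in the prompt is illustrative, not something the app defines:

```python
import json
from services.llm_service import LLMService

service = LLMService()
prompt = (
    "Parse this recipe into JSON with keys 'ingredients' "
    "(list of {item, quantity, unit}) and 'steps' (list of strings). "
    "Return only JSON. Recipe: 2 cups flour, 1 egg, 1 cup milk; whisk and fry."
)
raw = service.simple_chat_completion(prompt)

try:
    recipe = json.loads(raw)
    print(recipe["ingredients"])
except json.JSONDecodeError:
    print("Non-JSON reply - retry or tighten the prompt.")
```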

#### Mistral-Nemo:12b ⭐ BALANCED CHOICE

```bash
OLLAMA_MODEL=mistral-nemo:12b
```

- Parameters: 12B
- Download: ~7GB
- RAM Required: 12GB
- Best For: General conversation with good reasoning
- Recipe Quality: Very Good (84%)
- Features: Multilingual, efficient, well-balanced

#### Nous-Hermes2:10.7b ⭐ FINE-TUNED EXCELLENCE

```bash
OLLAMA_MODEL=nous-hermes2:10.7b
```

- Parameters: 10.7B
- Download: ~6.4GB
- RAM Required: 12GB
- Best For: Instruction following, detailed responses
- Recipe Quality: Very Good (83%)
- Features: Excellent instruction following, helpful responses

#### OpenHermes2.5-Mistral:7b ⭐ COMMUNITY FAVORITE

```bash
OLLAMA_MODEL=openhermes2.5-mistral:7b
```

- Parameters: 7B
- Download: ~4.1GB
- RAM Required: 8GB
- Best For: Creative recipe suggestions, conversational AI
- Recipe Quality: Good (81%)
- Features: Creative, conversational, reliable

#### Solar:10.7b ⭐ UPSTAGE'S MODEL

```bash
OLLAMA_MODEL=solar:10.7b
```

- Parameters: 10.7B
- Download: ~6.1GB
- RAM Required: 12GB
- Best For: Analytical tasks, recipe modifications
- Recipe Quality: Very Good (83%)
- Features: Strong analytical capabilities, detailed explanations

## 🎯 Scenario-Based Recommendations

### 👨‍💻 Development & Testing

Choice: Gemini 2.5 Flash

```bash
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
```

- Free tier covers most development
- Excellent quality for testing
- Easy setup and integration

### 🚀 Small to Medium Production

Choice: Gemini 2.5 Flash or GPT-4o-mini

```bash
# Cost-focused
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash

# Quality-focused
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
```

### 🏠 Self-Hosted

Choice: Llama 3.1:8b or upgrade to DeepSeek-R1:7b

```bash
# Your current (excellent choice)
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b

# Upgrade option (better reasoning)
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b
```

### 💰 Budget/Free

Choice: Local models or GPT-5-nano

```bash
# Best local alternative
LLM_PROVIDER=ollama
OLLAMA_MODEL=codeqwen:7b

# Best budget paid option
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano

# Quality budget cloud
LLM_PROVIDER=anthropic
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
```

### 🔒 Privacy/Offline

Choice: DeepSeek-R1:7b or Gemma 3:4b

```bash
# Best reasoning
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b

# Resource-efficient
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma3:4b
```
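
All five scenarios above differ only in two or three environment variables. A minimal sketch of the selection contract they imply; the real `services.llm_service` may resolve things differently, so this is an illustration rather than the repo's code:

```python
import os

# Which env var names the model for each provider, per the snippets above.
PROVIDER_MODEL_VARS = {
    "openai": "OPENAI_MODEL",
    "google": "GOOGLE_MODEL",
    "anthropic": "ANTHROPIC_MODEL",
    "ollama": "OLLAMA_MODEL",
}

def resolve_model() -> tuple[str, str]:
    """Return (provider, model) from the environment, failing loudly."""
    provider = os.environ.get("LLM_PROVIDER", "ollama").lower()
    var = PROVIDER_MODEL_VARS.get(provider)
    if var is None:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    model = os.environ.get(var)
    if not model:
        raise ValueError(f"{var} must be set when LLM_PROVIDER={provider}")
    return provider, model

print(resolve_model())  # e.g. ('ollama', 'deepseek-r1:7b')
```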

## ⚡ Quick Setup Commands

### Cloud Models (Instant Setup)

#### Gemini 2.5 Flash (Recommended)

```bash
# Update .env
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemini 2.5 Flash ready!')
response = service.simple_chat_completion('Suggest a quick pasta recipe')
print(f'Response: {response[:100]}...')
"
```

#### Claude 3.5 Haiku (Speed + Quality)

```bash
# Update .env
LLM_PROVIDER=anthropic
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
ANTHROPIC_TEMPERATURE=0.7
ANTHROPIC_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Claude 3.5 Haiku ready!')
response = service.simple_chat_completion('Quick dinner ideas with vegetables')
print(f'Response: {response[:100]}...')
"
```

#### GPT-5-nano (Budget Winner)

```bash
# Update .env
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5-nano ready!')
response = service.simple_chat_completion('Quick healthy breakfast ideas')
print(f'Response: {response[:100]}...')
"
```

#### GPT-5 (Premium)

```bash
# Update .env
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5 ready!')
response = service.simple_chat_completion('Create a healthy meal plan')
print(f'Response: {response[:100]}...')
"
```

### Self-Hosted Models

#### DeepSeek-R1:7b (Latest Breakthrough)

```bash
# Pull model
ollama pull deepseek-r1:7b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b
OLLAMA_TEMPERATURE=0.7

# Start Ollama
ollama serve &

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ DeepSeek-R1 ready!')
response = service.simple_chat_completion('Explain the science behind sourdough fermentation')
print(f'Response: {response[:100]}...')
"
```

#### Gemma 3:4b (Efficient)

```bash
# Pull model
ollama pull gemma3:4b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma3:4b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemma 3:4b ready!')
response = service.simple_chat_completion('Quick chicken recipes for weeknight dinners')
print(f'Response: {response[:100]}...')
"
```

#### CodeQwen1.5:7b (Structured Data Expert)

```bash
# Pull model
ollama pull codeqwen:7b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=codeqwen:7b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ CodeQwen 1.5:7b ready!')
response = service.simple_chat_completion('Parse this recipe: 2 cups flour, 1 egg, 1 cup milk')
print(f'Response: {response[:100]}...')
"
```

#### Mistral-Nemo:12b (Balanced Performance)

```bash
# Pull model
ollama pull mistral-nemo:12b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral-nemo:12b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Mistral-Nemo ready!')
response = service.simple_chat_completion('Suggest a Mediterranean dinner menu')
print(f'Response: {response[:100]}...')
"
```

## 🔧 Hardware Requirements

### Cloud Models

- Requirements: Internet connection, API key
- RAM: Any (processing done remotely)
- Storage: Minimal
- Best For: Instant setup, no hardware constraints

### Self-Hosted Requirements

| Model | Parameters | RAM Needed | Storage | GPU Beneficial | Best For |
|---|---|---|---|---|---|
| `gemma3:4b` | 4B | 6GB | 3.3GB | Optional | Laptops, modest hardware |
| `codeqwen:7b` | 7B | 8GB | 4.2GB | Yes | Structured data, parsing |
| `llama3.1:8b` | 8B | 8GB | 4.7GB | Yes | Standard workstations |
| `deepseek-r1:7b` | 7B | 8GB | 4.7GB | Yes | Reasoning tasks |
| `openhermes2.5-mistral:7b` | 7B | 8GB | 4.1GB | Yes | Conversational AI |
| `nous-hermes2:10.7b` | 10.7B | 12GB | 6.4GB | Recommended | Instruction following |
| `mistral-nemo:12b` | 12B | 12GB | 7GB | Recommended | Balanced performance |
| `phi4:14b` | 14B | 16GB | 9.1GB | Recommended | High-end workstations |
| `gemma3:27b` | 27B | 32GB | 17GB | Required | Powerful servers |
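
The RAM column roughly tracks weight size at Ollama's default ~4-bit quantization plus runtime headroom. A back-of-envelope sketch; the 2x overhead factor is an assumption covering KV cache and runtime, and real requirements grow with context length:

```python
def estimated_ram_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 2.0) -> float:
    """Weights take params * bits/8 bytes; scale up for KV cache and runtime."""
    weights_gb = params_billion * bits_per_weight / 8  # e.g. 8B * 0.5 B/param = 4 GB
    return round(weights_gb * overhead, 1)

for name, params in [("gemma3:4b", 4), ("llama3.1:8b", 8), ("gemma3:27b", 27)]:
    print(f"{name}: ~{estimated_ram_gb(params)} GB")  # ~4.0, 8.0, 27.0 GB
```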