plg4-dev-server/backend/docs/model-selection-guide.md
Jesse Johnson
New commit for backend deployment: 2025-09-25_13-24-03
c59d808

Model Selection Guide

🎯 At-a-Glance Recommendations

| Priority | Best Choice | Provider | Monthly Cost* | Setup Time | Quality Score | Why Choose This |
| --- | --- | --- | --- | --- | --- | --- |
| Ease of Use | Gemini 2.5 Flash | Google | Free - $2 | 2 min | 90% | Excellent free tier |
| Best Value | GPT-5-nano | OpenAI | $1.00 | 2 min | 88% | Modern GPT-5 at nano price |
| Premium Quality | Claude 3 Opus | Anthropic | $225 | 2 min | 95% | Highest reasoning quality |
| Self-Hosted | Llama 3.1:8b | Ollama | Free | 10 min | 82% | Perfect balance |
| High-End Local | DeepSeek-R1:7b | Ollama | Free | 15 min | 88% | Best reasoning model |
| Budget Cloud | Claude 3.5 Haiku | Anthropic | $4 | 2 min | 87% | Fast and affordable |
| Alternative Local | CodeQwen1.5:7b | Ollama | Free | 10 min | 85% | Excellent for structured data |

*Based on 30,000 queries/month
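
The table's cloud figures come from simple per-token arithmetic. A minimal sketch of that math, assuming roughly 400 input and 120 output tokens per query (an assumption for illustration; the guide does not state the token model behind its figures, so results only approximate the table):

```python
# Rough monthly-cost estimate from $/1M-token pricing.
# Per-query token counts below are assumptions, not from this guide.

def monthly_cost(in_price_per_m: float, out_price_per_m: float,
                 queries: int = 30_000,
                 in_tokens: int = 400, out_tokens: int = 120) -> float:
    """Estimated monthly USD cost given per-1M-token prices."""
    input_cost = queries * in_tokens / 1_000_000 * in_price_per_m
    output_cost = queries * out_tokens / 1_000_000 * out_price_per_m
    return input_cost + output_cost

# GPT-4o-mini at $0.15/1M input, $0.60/1M output:
print(f"${monthly_cost(0.15, 0.60):.2f}")  # → $3.96, close to the ~$4 in the table
```

Output pricing usually dominates, so trimming response length (via the `*_MAX_TOKENS` settings shown later) is the easiest cost lever.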


🏒 Cloud Models (Closed Source)

OpenAI Models

GPT-5 (Latest Flagship) ⭐ NEW

OPENAI_MODEL=gpt-5
  • Pricing: API access is billed per token (see OpenAI's current rates); the $20/month Plus plan is ChatGPT's consumer subscription, not API pricing
  • Capabilities: Advanced reasoning, thinking, code execution
  • Best For: Premium applications requiring cutting-edge AI
  • Recipe Quality: Outstanding (96%) - Best culinary understanding
  • Context: 196K tokens (reasoning mode)

GPT-5-nano (Ultra Budget) ⭐ HIDDEN GEM

OPENAI_MODEL=gpt-5-nano
  • Pricing: $0.05/1M input, $0.40/1M output tokens
  • Monthly Cost: ~$1.00 for 30K queries
  • Best For: Budget-conscious deployments with modern capabilities
  • Recipe Quality: Very Good (88%)
  • Speed: Very Fast
  • Features: GPT-5 architecture at nano pricing

GPT-4o-mini (Proven Budget Choice)

OPENAI_MODEL=gpt-4o-mini
  • Pricing: $0.15/1M input, $0.60/1M output tokens
  • Monthly Cost: ~$4 for 30K queries
  • Best For: Cost-effective production deployments
  • Recipe Quality: Very Good (86%)
  • Speed: Very Fast

Google AI (Gemini) Models

Gemini 2.5 Flash ⭐ RECOMMENDED

GOOGLE_MODEL=gemini-2.5-flash
  • Pricing: Free tier, then $0.30/1M input, $2.50/1M output
  • Monthly Cost: Free - $2 for most usage patterns
  • Best For: Development and cost-conscious production
  • Recipe Quality: Excellent (90%)
  • Features: Thinking budgets, 1M context window

Gemini 2.5 Pro (High-End)

GOOGLE_MODEL=gemini-2.5-pro
  • Pricing: $1.25/1M input, $10/1M output (≤200K context)
  • Monthly Cost: ~$25 for 30K queries
  • Best For: Premium applications requiring best Google AI
  • Recipe Quality: Excellent (92%)

Gemini 2.0 Flash-Lite (Ultra Budget)

GOOGLE_MODEL=gemini-2.0-flash-lite
  • Pricing: $0.075/1M input, $0.30/1M output
  • Monthly Cost: ~$0.90 for 30K queries
  • Best For: High-volume, cost-sensitive applications
  • Recipe Quality: Good (85%)

🔓 Open Source Models (Self-Hosted)

Ollama Models (Latest Releases)

DeepSeek-R1:7b ⭐ BREAKTHROUGH MODEL

OLLAMA_MODEL=deepseek-r1:7b
  • Parameters: 7B
  • Download: ~4.7GB
  • RAM Required: 8GB
  • Best For: Advanced reasoning tasks with o1-style chain-of-thought
  • Recipe Quality: Outstanding (88%)
  • Special: Chain-of-thought reasoning, approaching GPT-4 performance
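
One practical note on the chain-of-thought behavior: DeepSeek-R1 models emit their reasoning inside `<think>...</think>` tags before the final answer, which you usually want to strip before showing a recipe response to users. A minimal sketch (the helper name is illustrative):

```python
import re

def strip_reasoning(text: str) -> str:
    """Remove DeepSeek-R1's <think>...</think> reasoning block,
    keeping only the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>The user wants something fast...</think>Try a 10-minute aglio e olio."
print(strip_reasoning(raw))  # Try a 10-minute aglio e olio.
```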

Gemma 3:27b ⭐ NEW FLAGSHIP

OLLAMA_MODEL=gemma3:27b
  • Parameters: 27B
  • Download: ~17GB
  • RAM Required: 32GB
  • Best For: Highest quality open source experience
  • Recipe Quality: Outstanding (89%)
  • Features: Vision capabilities, state-of-the-art performance

Llama 3.1:8b (Proven Choice)

OLLAMA_MODEL=llama3.1:8b
  • Parameters: 8B
  • Download: ~4.7GB
  • RAM Required: 8GB
  • Best For: Balanced production deployment
  • Recipe Quality: Very Good (82%)
  • Status: Your current choice - excellent balance!

Qwen 3:8b ⭐ NEW RELEASE

OLLAMA_MODEL=qwen3:8b
  • Parameters: 8B
  • Download: ~4.4GB
  • RAM Required: 8GB
  • Best For: Multilingual support, latest technology
  • Recipe Quality: Very Good (84%)
  • Features: Tool use, thinking capabilities

Phi 4:14b ⭐ MICROSOFT'S LATEST

OLLAMA_MODEL=phi4:14b
  • Parameters: 14B
  • Download: ~9.1GB
  • RAM Required: 16GB
  • Best For: Reasoning and math tasks
  • Recipe Quality: Very Good (85%)
  • Features: State-of-the-art efficiency

Gemma 3:4b (Efficient Choice)

OLLAMA_MODEL=gemma3:4b
  • Parameters: 4B
  • Download: ~3.3GB
  • RAM Required: 6GB
  • Best For: Resource-constrained deployments
  • Recipe Quality: Good (78%)
  • Features: Excellent for size, runs on modest hardware
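
Whichever of the models above you pull, Ollama serves them all through the same local HTTP API (`POST /api/generate` on port 11434 by default). A standard-library-only sketch; the helper names are illustrative, not part of this project:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "llama3.1:8b") -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt: str, model: str = "llama3.1:8b",
                    host: str = "http://localhost:11434") -> str:
    """POST a generate request to a local Ollama server, return the text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(f"{host}/api/generate", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the endpoint is identical for every model, swapping `OLLAMA_MODEL` in `.env` is the only change needed to try a different one.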

Community Models from HuggingFace (run locally via Ollama)

CodeQwen1.5:7b ⭐ ALIBABA'S CODE MODEL

OLLAMA_MODEL=codeqwen:7b
  • Parameters: 7B
  • Download: ~4.2GB
  • RAM Required: 8GB
  • Best For: Recipe parsing, ingredient analysis, structured data
  • Recipe Quality: Very Good (85%)
  • Features: Excellent at understanding structured recipe formats

Mistral-Nemo:12b ⭐ BALANCED CHOICE

OLLAMA_MODEL=mistral-nemo:12b
  • Parameters: 12B
  • Download: ~7GB
  • RAM Required: 12GB
  • Best For: General conversation with good reasoning
  • Recipe Quality: Very Good (84%)
  • Features: Multilingual, efficient, well-balanced

Nous-Hermes2:10.7b ⭐ FINE-TUNED EXCELLENCE

OLLAMA_MODEL=nous-hermes2:10.7b
  • Parameters: 10.7B
  • Download: ~6.4GB
  • RAM Required: 12GB
  • Best For: Instruction following, detailed responses
  • Recipe Quality: Very Good (83%)
  • Features: Excellent instruction following, helpful responses

OpenHermes2.5-Mistral:7b ⭐ COMMUNITY FAVORITE

OLLAMA_MODEL=openhermes2.5-mistral:7b
  • Parameters: 7B
  • Download: ~4.1GB
  • RAM Required: 8GB
  • Best For: Creative recipe suggestions, conversational AI
  • Recipe Quality: Good (81%)
  • Features: Creative, conversational, reliable

Solar:10.7b ⭐ UPSTAGE'S MODEL

OLLAMA_MODEL=solar:10.7b
  • Parameters: 10.7B
  • Download: ~6.1GB
  • RAM Required: 12GB
  • Best For: Analytical tasks, recipe modifications
  • Recipe Quality: Very Good (83%)
  • Features: Strong analytical capabilities, detailed explanations

Anthropic Claude Models

Claude 3.5 Sonnet (Production Standard)

ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
  • Pricing: $3/1M input, $15/1M output tokens
  • Monthly Cost: ~$45 for 30K queries
  • Best For: Balanced performance and reasoning
  • Recipe Quality: Outstanding (94%)
  • Features: Advanced analysis, code understanding

Claude 3.5 Haiku (Speed Focused)

ANTHROPIC_MODEL=claude-3-5-haiku-20241022
  • Pricing: $0.25/1M input, $1.25/1M output tokens
  • Monthly Cost: ~$4 for 30K queries
  • Best For: Fast, cost-effective responses
  • Recipe Quality: Very Good (87%)
  • Features: Lightning fast, good quality

Claude 3 Opus (Premium Reasoning)

ANTHROPIC_MODEL=claude-3-opus-20240229
  • Pricing: $15/1M input, $75/1M output tokens
  • Monthly Cost: ~$225 for 30K queries
  • Best For: Complex reasoning, highest quality
  • Recipe Quality: Outstanding (95%)
  • Features: Top-tier reasoning, complex tasks

🎯 Scenario-Based Recommendations

πŸ‘¨β€πŸ’» Development & Testing

Choice: Gemini 2.5 Flash

LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
  • Free tier covers most development
  • Excellent quality for testing
  • Easy setup and integration

🚀 Small to Medium Production

Choice: Gemini 2.5 Flash or GPT-4o-mini

# Cost-focused
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash

# Quality-focused
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini

🏠 Self-Hosted

Choice: Llama 3.1:8b or upgrade to DeepSeek-R1:7b

# Your current (excellent choice)
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b

# Upgrade option (better reasoning)
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b

💰 Budget/Free

Choice: Local models or GPT-5-nano

# Best local alternative
LLM_PROVIDER=ollama
OLLAMA_MODEL=codeqwen:7b

# Best budget paid option
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano

# Quality budget cloud
LLM_PROVIDER=anthropic
ANTHROPIC_MODEL=claude-3-5-haiku-20241022

🔒 Privacy/Offline

Choice: DeepSeek-R1:7b or Gemma 3:4b

# Best reasoning
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b

# Resource-efficient
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma3:4b
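
All of the scenarios above switch behavior with just `LLM_PROVIDER` plus one provider-specific model variable. A sketch of how that resolution might look (the env var names come from this guide; the function itself is illustrative, not the project's actual `LLMService` logic):

```python
import os

# Provider → model env var, matching the .env pairs used throughout this guide.
MODEL_VARS = {
    "openai": "OPENAI_MODEL",
    "google": "GOOGLE_MODEL",
    "anthropic": "ANTHROPIC_MODEL",
    "ollama": "OLLAMA_MODEL",
}

def resolve_model(env=None):
    """Return (provider, model) from the environment, with basic validation."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama")
    var = MODEL_VARS.get(provider)
    if var is None:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    model = env.get(var)
    if not model:
        raise ValueError(f"{var} must be set when LLM_PROVIDER={provider}")
    return provider, model
```

Failing fast on a missing model variable makes misconfigured `.env` files obvious at startup rather than at the first chat request.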

⚑ Quick Setup Commands

Cloud Models (Instant Setup)

Gemini 2.5 Flash (Recommended)

# Update .env
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemini 2.5 Flash ready!')
response = service.simple_chat_completion('Suggest a quick pasta recipe')
print(f'Response: {response[:100]}...')
"

CodeQwen1.5:7b (Structured Data Expert)

# Pull model
ollama pull codeqwen:7b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=codeqwen:7b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ CodeQwen 1.5:7b ready!')
response = service.simple_chat_completion('Parse this recipe: 2 cups flour, 1 egg, 1 cup milk')
print(f'Response: {response[:100]}...')
"

Mistral-Nemo:12b (Balanced Performance)

# Pull model
ollama pull mistral-nemo:12b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral-nemo:12b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Mistral-Nemo ready!')
response = service.simple_chat_completion('Suggest a Mediterranean dinner menu')
print(f'Response: {response[:100]}...')
"

Claude 3.5 Haiku (Speed + Quality)

# Update .env
LLM_PROVIDER=anthropic
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
ANTHROPIC_TEMPERATURE=0.7
ANTHROPIC_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Claude 3.5 Haiku ready!')
response = service.simple_chat_completion('Quick dinner ideas with vegetables')
print(f'Response: {response[:100]}...')
"

GPT-5-nano (Budget Winner)

# Update .env
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5-nano ready!')
response = service.simple_chat_completion('Quick healthy breakfast ideas')
print(f'Response: {response[:100]}...')
"

GPT-5 (Premium)

# Update .env
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5 ready!')
response = service.simple_chat_completion('Create a healthy meal plan')
print(f'Response: {response[:100]}...')
"

Self-Hosted Models

DeepSeek-R1:7b (Latest Breakthrough)

# Pull model
ollama pull deepseek-r1:7b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b
OLLAMA_TEMPERATURE=0.7

# Start Ollama
ollama serve &

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ DeepSeek-R1 ready!')
response = service.simple_chat_completion('Explain the science behind sourdough fermentation')
print(f'Response: {response[:100]}...')
"

Gemma 3:4b (Efficient)

# Pull model
ollama pull gemma3:4b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma3:4b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemma 3:4b ready!')
response = service.simple_chat_completion('Quick chicken recipes for weeknight dinners')
print(f'Response: {response[:100]}...')
"

🔧 Hardware Requirements

Cloud Models

  • Requirements: Internet connection, API key
  • RAM: Any (processing done remotely)
  • Storage: Minimal
  • Best For: Instant setup, no hardware constraints

Self-Hosted Requirements

| Model | Parameters | RAM Needed | Storage | GPU Beneficial | Best For |
| --- | --- | --- | --- | --- | --- |
| gemma3:4b | 4B | 6GB | 3.3GB | Optional | Laptops, modest hardware |
| codeqwen:7b | 7B | 8GB | 4.2GB | Yes | Structured data, parsing |
| llama3.1:8b | 8B | 8GB | 4.7GB | Yes | Standard workstations |
| deepseek-r1:7b | 7B | 8GB | 4.7GB | Yes | Reasoning tasks |
| openhermes2.5-mistral:7b | 7B | 8GB | 4.1GB | Yes | Conversational AI |
| nous-hermes2:10.7b | 10.7B | 12GB | 6.4GB | Recommended | Instruction following |
| mistral-nemo:12b | 12B | 12GB | 7GB | Recommended | Balanced performance |
| phi4:14b | 14B | 16GB | 9.1GB | Recommended | High-end workstations |
| gemma3:27b | 27B | 32GB | 17GB | Required | Powerful servers |
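
The RAM column above can double as a quick "what fits on this machine" check. A sketch with the table's figures hard-coded (the helper is illustrative, not part of the backend):

```python
# RAM requirements (GB) taken from the hardware table above.
RAM_NEEDED = {
    "gemma3:4b": 6,
    "codeqwen:7b": 8,
    "llama3.1:8b": 8,
    "deepseek-r1:7b": 8,
    "openhermes2.5-mistral:7b": 8,
    "nous-hermes2:10.7b": 12,
    "mistral-nemo:12b": 12,
    "phi4:14b": 16,
    "gemma3:27b": 32,
}

def models_that_fit(ram_gb: float) -> list[str]:
    """Models whose listed RAM requirement fits within ram_gb, largest first."""
    fits = [m for m, need in RAM_NEEDED.items() if need <= ram_gb]
    return sorted(fits, key=RAM_NEEDED.get, reverse=True)

print(models_that_fit(16))
```

Treat the listed RAM as a floor, not a target: leave headroom for the OS and the backend process itself, especially on 8GB machines.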