Jesse Johnson

New commit for backend deployment: 2025-09-25_13-24-03

c59d808 5 months ago

	# Model Selection Guide

	## 🎯 At-a-Glance Recommendations

| Priority | Best Choice | Provider | Monthly Cost* | Setup Time | Quality Score | Why Choose This |
|----------|-------------|----------|---------------|------------|---------------|-----------------|
| Ease of Use | Gemini 2.5 Flash | Google | Free - $2 | 2 min | 90% | Excellent free tier |
| Best Value | GPT-5-nano | OpenAI | $1.00 | 2 min | 88% | Modern GPT-5 at nano price |
| Premium Quality | Claude 3 Opus | Anthropic | $225 | 2 min | 95% | Highest reasoning quality |
| Self-Hosted | Llama 3.1:8b | Ollama | Free | 10 min | 82% | Perfect balance |
| High-End Local | DeepSeek-R1:7b | Ollama | Free | 15 min | 88% | Best reasoning model |
| Budget Cloud | Claude 3.5 Haiku | Anthropic | ~$13 | 2 min | 87% | Fast and affordable |
| Alternative Local | CodeQwen1.5:7b | Ollama | Free | 10 min | 85% | Excellent for structured data |

	*Based on 30,000 queries/month
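The monthly figures depend on how many tokens each query sends and receives, which the guide does not state. A minimal sketch of the arithmetic, assuming a hypothetical average of 500 input and 300 output tokens per query (replace these with averages measured from your own traffic); the per-1M-token prices are the ones listed later in this guide:

```python
# Hypothetical per-query averages; measure your own traffic and adjust.
AVG_INPUT_TOKENS = 500
AVG_OUTPUT_TOKENS = 300

# (input $ per 1M tokens, output $ per 1M tokens), as listed in this guide
PRICES = {
    "gpt-5-nano": (0.05, 0.40),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.0-flash-lite": (0.075, 0.30),
}


def monthly_cost(model: str, queries: int = 30_000,
                 input_tokens: int = AVG_INPUT_TOKENS,
                 output_tokens: int = AVG_OUTPUT_TOKENS) -> float:
    """Estimated monthly spend in USD (free tiers and discounts are not modeled)."""
    in_price, out_price = PRICES[model]
    total_in = queries * input_tokens
    total_out = queries * output_tokens
    return (total_in * in_price + total_out * out_price) / 1_000_000


for name in PRICES:
    print(f"{name}: ${monthly_cost(name):.2f}/month")
```

Cost is linear in both token counts, so real per-query averages from your logs matter more than small price differences between models.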

	---

	## 🏢 Cloud Models (Closed Source)

	### OpenAI Models

	#### GPT-5 (Latest Flagship) ⭐ NEW
	```bash
	OPENAI_MODEL=gpt-5
	```
- Pricing: $1.25/1M input, $10/1M output tokens via the API (a $20/month ChatGPT Plus subscription does not cover API usage)
- Capabilities: Advanced reasoning, thinking, code execution
- Best For: Premium applications requiring cutting-edge AI
- Recipe Quality: Outstanding (96%) - Best culinary understanding
- Context: 400K tokens via the API (196K in ChatGPT's thinking mode)
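GPT-5-family models are reasoning models. In OpenAI's Chat Completions API they accept only the default `temperature` (1) and take `max_completion_tokens` rather than `max_tokens`, and their hidden reasoning tokens count toward that limit. Whether this affects this project depends on how `LLMService` builds its requests; the helper below is a hypothetical sketch of adjusting the parameters per model, not the project's code:

```python
def build_openai_params(model: str, temperature: float, max_tokens: int) -> dict:
    """Chat Completions keyword arguments for `model` (hypothetical helper).

    Assumption: every model name starting with "gpt-5" behaves like the
    GPT-5 reasoning models; older models such as gpt-4o-mini do not.
    """
    params = {"model": model}
    if model.startswith("gpt-5"):
        # Leave temperature unset so the API applies its default of 1.
        # Reasoning tokens also count toward this limit, so keep it generous.
        params["max_completion_tokens"] = max_tokens
    else:
        params["temperature"] = temperature
        params["max_tokens"] = max_tokens
    return params


print(build_openai_params("gpt-5-nano", 0.7, 1000))
print(build_openai_params("gpt-4o-mini", 0.7, 1000))
```

With the official `openai` Python SDK, the result would be passed as `client.chat.completions.create(messages=messages, **params)`.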


	#### GPT-5-nano (Ultra Budget) ⭐ MISSED GEM
	```bash
	OPENAI_MODEL=gpt-5-nano
	```
	- Pricing: $0.05/1M input, $0.40/1M output tokens
	- Monthly Cost: ~$1.00 for 30K queries
	- Best For: Budget-conscious deployments with modern capabilities
	- Recipe Quality: Very Good (88%)
	- Speed: Very Fast
	- Features: GPT-5 architecture at nano pricing


	#### GPT-4o-mini (Proven Budget Choice)
	```bash
	OPENAI_MODEL=gpt-4o-mini
	```
	- Pricing: $0.15/1M input, $0.60/1M output tokens
	- Monthly Cost: ~$4 for 30K queries
	- Best For: Cost-effective production deployments
	- Recipe Quality: Very Good (86%)
	- Speed: Very Fast


	### Google AI (Gemini) Models

	#### Gemini 2.5 Flash ⭐ RECOMMENDED
	```bash
	GOOGLE_MODEL=gemini-2.5-flash
	```
	- Pricing: Free tier, then $0.30/1M input, $2.50/1M output
	- Monthly Cost: Free - $2 for most usage patterns
	- Best For: Development and cost-conscious production
	- Recipe Quality: Excellent (90%)
	- Features: Thinking budgets, 1M context window

	#### Gemini 2.5 Pro (High-End)
	```bash
	GOOGLE_MODEL=gemini-2.5-pro
	```
	- Pricing: $1.25/1M input, $10/1M output (≤200K context)
	- Monthly Cost: ~$25 for 30K queries
	- Best For: Premium applications requiring best Google AI
	- Recipe Quality: Excellent (92%)
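Gemini 2.5 Pro's rates depend on prompt length: the figures above apply to prompts of at most 200K tokens, and Google's published rates at the time of writing rise to $2.50/1M input and $15/1M output above that threshold. A sketch of the per-request arithmetic, with both tiers passed as parameters so they are easy to update if pricing changes:

```python
def gemini_25_pro_request_cost(prompt_tokens: int, output_tokens: int,
                               threshold: int = 200_000,
                               low: tuple[float, float] = (1.25, 10.00),
                               high: tuple[float, float] = (2.50, 15.00)) -> float:
    """USD cost of one Gemini 2.5 Pro request.

    The tier is chosen by prompt size alone; output (including thinking
    tokens) is billed at that tier's output rate.
    """
    in_rate, out_rate = low if prompt_tokens <= threshold else high
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000


print(gemini_25_pro_request_cost(1_000, 500))    # prompt within the lower tier
print(gemini_25_pro_request_cost(250_000, 500))  # prompt above 200K tokens
```

Requests with very long prompts (for example, an entire cookbook pasted in) move to the higher tier for both input and output.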

	#### Gemini 2.0 Flash-Lite (Ultra Budget)
	```bash
	GOOGLE_MODEL=gemini-2.0-flash-lite
	```
	- Pricing: $0.075/1M input, $0.30/1M output
	- Monthly Cost: ~$0.90 for 30K queries
	- Best For: High-volume, cost-sensitive applications
	- Recipe Quality: Good (85%)


	## 🔓 Open Source Models (Self-Hosted)

	### Ollama Models (Latest Releases)

	#### DeepSeek-R1:7b ⭐ BREAKTHROUGH MODEL
	```bash
	OLLAMA_MODEL=deepseek-r1:7b
	```
	- Parameters: 7B
	- Download: ~4.7GB
	- RAM Required: 8GB
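- Best For: Advanced reasoning tasks (this 7B model is distilled from DeepSeek-R1 into a Qwen base, so expect less than the o1-level results reported for the full-size R1)
- Recipe Quality: Outstanding (88%)
- Special: Writes out its chain-of-thought reasoning before the final answer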
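DeepSeek-R1 distills such as `deepseek-r1:7b` wrap their reasoning in `<think>...</think>` tags ahead of the answer. Depending on the Ollama version and whether thinking output is separated, those tags can arrive inline in the response text, so a recipe UI may want to drop them before display. A minimal sketch (the function name is hypothetical):

```python
import re

# Non-greedy match across newlines, plus any whitespace that follows the block.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove complete <think>...</think> blocks and return the visible answer."""
    return _THINK_BLOCK.sub("", text).strip()


raw = "<think>\nFlour first, then eggs...\n</think>\n\nMix 2 cups flour with 1 egg."
print(strip_reasoning(raw))
```

An unterminated block (for example, when a token limit cuts the reasoning short) is left untouched, which makes the truncation visible rather than silently returning an empty answer.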

	#### Gemma 3:27b ⭐ NEW FLAGSHIP
	```bash
	OLLAMA_MODEL=gemma3:27b
	```
	- Parameters: 27B
	- Download: ~17GB
	- RAM Required: 32GB
	- Best For: Highest quality open source experience
	- Recipe Quality: Outstanding (89%)
	- Features: Vision capabilities, state-of-the-art performance

	#### Llama 3.1:8b (Proven Choice)
	```bash
	OLLAMA_MODEL=llama3.1:8b
	```
	- Parameters: 8B
	- Download: ~4.7GB
	- RAM Required: 8GB
	- Best For: Balanced production deployment
	- Recipe Quality: Very Good (82%)
	- Status: Your current choice - excellent balance!

	#### Qwen 3:8b ⭐ NEW RELEASE
	```bash
	OLLAMA_MODEL=qwen3:8b
	```
	- Parameters: 8B
	- Download: ~4.4GB
	- RAM Required: 8GB
	- Best For: Multilingual support, latest technology
	- Recipe Quality: Very Good (84%)
	- Features: Tool use, thinking capabilities

	#### Phi 4:14b ⭐ MICROSOFT'S LATEST
	```bash
	OLLAMA_MODEL=phi4:14b
	```
	- Parameters: 14B
	- Download: ~9.1GB
	- RAM Required: 16GB
	- Best For: Reasoning and math tasks
	- Recipe Quality: Very Good (85%)
	- Features: State-of-the-art efficiency

	#### Gemma 3:4b (Efficient Choice)
	```bash
	OLLAMA_MODEL=gemma3:4b
	```
	- Parameters: 4B
	- Download: ~3.3GB
	- RAM Required: 6GB
	- Best For: Resource-constrained deployments
	- Recipe Quality: Good (78%)
	- Features: Excellent for size, runs on modest hardware

### Additional Ollama Models (Hugging Face Releases)

	#### CodeQwen1.5:7b ⭐ ALIBABA'S CODE MODEL
	```bash
	OLLAMA_MODEL=codeqwen:7b
	```
	- Parameters: 7B
	- Download: ~4.2GB
	- RAM Required: 8GB
	- Best For: Recipe parsing, ingredient analysis, structured data
	- Recipe Quality: Very Good (85%)
	- Features: Excellent at understanding structured recipe formats

	#### Mistral-Nemo:12b ⭐ BALANCED CHOICE
	```bash
	OLLAMA_MODEL=mistral-nemo:12b
	```
	- Parameters: 12B
	- Download: ~7GB
	- RAM Required: 12GB
	- Best For: General conversation with good reasoning
	- Recipe Quality: Very Good (84%)
	- Features: Multilingual, efficient, well-balanced

	#### Nous-Hermes2:10.7b ⭐ FINE-TUNED EXCELLENCE
	```bash
	OLLAMA_MODEL=nous-hermes2:10.7b
	```
	- Parameters: 10.7B
	- Download: ~6.4GB
	- RAM Required: 12GB
	- Best For: Instruction following, detailed responses
	- Recipe Quality: Very Good (83%)
	- Features: Excellent instruction following, helpful responses

	#### OpenHermes2.5-Mistral:7b ⭐ COMMUNITY FAVORITE
	```bash
	OLLAMA_MODEL=openhermes2.5-mistral:7b
	```
	- Parameters: 7B
	- Download: ~4.1GB
	- RAM Required: 8GB
	- Best For: Creative recipe suggestions, conversational AI
	- Recipe Quality: Good (81%)
	- Features: Creative, conversational, reliable

	#### Solar:10.7b ⭐ UPSTAGE'S MODEL
	```bash
	OLLAMA_MODEL=solar:10.7b
	```
	- Parameters: 10.7B
	- Download: ~6.1GB
	- RAM Required: 12GB
	- Best For: Analytical tasks, recipe modifications
	- Recipe Quality: Very Good (83%)
	- Features: Strong analytical capabilities, detailed explanations


## 🏢 Cloud Models (Closed Source), Continued

### Anthropic Claude Models

	#### Claude 3.5 Sonnet (Production Standard)
	```bash
	ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
	```
	- Pricing: $3/1M input, $15/1M output tokens
	- Monthly Cost: ~$45 for 30K queries
	- Best For: Balanced performance and reasoning
	- Recipe Quality: Outstanding (94%)
	- Features: Advanced analysis, code understanding

	#### Claude 3.5 Haiku (Speed Focused)
	```bash
	ANTHROPIC_MODEL=claude-3-5-haiku-20241022
	```
- Pricing: $0.80/1M input, $4/1M output tokens
- Monthly Cost: ~$13 for 30K queries
	- Best For: Fast, cost-effective responses
	- Recipe Quality: Very Good (87%)
	- Features: Lightning fast, good quality

	#### Claude 3 Opus (Premium Reasoning)
	```bash
	ANTHROPIC_MODEL=claude-3-opus-20240229
	```
	- Pricing: $15/1M input, $75/1M output tokens
	- Monthly Cost: ~$225 for 30K queries
	- Best For: Complex reasoning, highest quality
	- Recipe Quality: Outstanding (95%)
	- Features: Top-tier reasoning, complex tasks

	---


	## 🎯 Scenario-Based Recommendations

	### 👨‍💻 Development & Testing
	Choice: Gemini 2.5 Flash
	```bash
	LLM_PROVIDER=google
	GOOGLE_MODEL=gemini-2.5-flash
	```
	- Free tier covers most development
	- Excellent quality for testing
	- Easy setup and integration

	### 🚀 Small to Medium Production
	Choice: Gemini 2.5 Flash or GPT-4o-mini
	```bash
	# Cost-focused
	LLM_PROVIDER=google
	GOOGLE_MODEL=gemini-2.5-flash

# Proven OpenAI alternative
	LLM_PROVIDER=openai
	OPENAI_MODEL=gpt-4o-mini
	```

	### 🏠 Self-Hosted
	Choice: Llama 3.1:8b or upgrade to DeepSeek-R1:7b
	```bash
# Your current setup (excellent choice)
	LLM_PROVIDER=ollama
	OLLAMA_MODEL=llama3.1:8b

	# Upgrade option (better reasoning)
	LLM_PROVIDER=ollama
	OLLAMA_MODEL=deepseek-r1:7b
	```

	### 💰 Budget/Free
	Choice: Local models or GPT-5-nano
	```bash
	# Best local alternative
	LLM_PROVIDER=ollama
	OLLAMA_MODEL=codeqwen:7b

	# Best budget paid option
	LLM_PROVIDER=openai
	OPENAI_MODEL=gpt-5-nano

	# Quality budget cloud
	LLM_PROVIDER=anthropic
	ANTHROPIC_MODEL=claude-3-5-haiku-20241022
	```

	### 🔒 Privacy/Offline
	Choice: DeepSeek-R1:7b or Gemma 3:4b
	```bash
	# Best reasoning
	LLM_PROVIDER=ollama
	OLLAMA_MODEL=deepseek-r1:7b

	# Resource-efficient
	LLM_PROVIDER=ollama
	OLLAMA_MODEL=gemma3:4b
	```
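The scenario blocks list alternatives side by side, but a working `.env` should keep only one `LLM_PROVIDER` line plus the model variable that matches it. A small checker sketch, assuming (as the examples suggest) that each provider reads its model from `<PROVIDER>_MODEL`; the mapping and function names are illustrative, not taken from `LLMService`:

```python
# Provider -> model variable, inferred from this guide's examples (an assumption).
MODEL_VAR = {
    "openai": "OPENAI_MODEL",
    "google": "GOOGLE_MODEL",
    "anthropic": "ANTHROPIC_MODEL",
    "ollama": "OLLAMA_MODEL",
}


def parse_env(text: str) -> tuple[dict[str, str], list[str]]:
    """Parse plain KEY=VALUE lines; return the values and keys seen more than once."""
    values: dict[str, str] = {}
    duplicates: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and non-assignments
        key, value = (part.strip() for part in line.split("=", 1))
        if key in values and key not in duplicates:
            duplicates.append(key)
        values[key] = value  # in this sketch, later lines overwrite earlier ones
    return values, duplicates


def check_env(text: str) -> list[str]:
    """Return problems found; an empty list means the settings look consistent."""
    values, duplicates = parse_env(text)
    problems = [f"{key} is set more than once" for key in duplicates]
    provider = values.get("LLM_PROVIDER", "").lower()
    if provider not in MODEL_VAR:
        problems.append(f"unknown or missing LLM_PROVIDER: {provider!r}")
    elif not values.get(MODEL_VAR[provider]):
        problems.append(f"{MODEL_VAR[provider]} is missing for provider {provider}")
    return problems


print(check_env("LLM_PROVIDER=ollama\nOLLAMA_MODEL=deepseek-r1:7b\n"))
```

The parser ignores quoting and `export` prefixes; a full parser such as python-dotenv handles those cases.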

	---

	## ⚡ Quick Setup Commands

	### Cloud Models (Instant Setup)

	#### Gemini 2.5 Flash (Recommended)
	```bash
	# Update .env
	LLM_PROVIDER=google
	GOOGLE_MODEL=gemini-2.5-flash
	GOOGLE_TEMPERATURE=0.7
	GOOGLE_MAX_TOKENS=1000

# Test (quoted heredoc: the shell passes the script to Python unchanged)
python - <<'EOF'
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemini 2.5 Flash ready!')
response = service.simple_chat_completion('Suggest a quick pasta recipe')
print(f'Response: {response[:100]}...')
EOF
	```

#### CodeQwen1.5:7b (Self-Hosted, Structured Data Expert)
```bash
# Pull model (requires a running Ollama server)
	ollama pull codeqwen:7b

	# Update .env
	LLM_PROVIDER=ollama
	OLLAMA_MODEL=codeqwen:7b
	OLLAMA_TEMPERATURE=0.7

	# Test
python - <<'EOF'
from services.llm_service import LLMService
service = LLMService()
print('✅ CodeQwen 1.5:7b ready!')
response = service.simple_chat_completion('Parse this recipe: 2 cups flour, 1 egg, 1 cup milk')
print(f'Response: {response[:100]}...')
EOF
	```

#### Mistral-Nemo:12b (Self-Hosted, Balanced Performance)
```bash
# Pull model (requires a running Ollama server)
	ollama pull mistral-nemo:12b

	# Update .env
	LLM_PROVIDER=ollama
	OLLAMA_MODEL=mistral-nemo:12b
	OLLAMA_TEMPERATURE=0.7

	# Test
python - <<'EOF'
from services.llm_service import LLMService
service = LLMService()
print('✅ Mistral-Nemo ready!')
response = service.simple_chat_completion('Suggest a Mediterranean dinner menu')
print(f'Response: {response[:100]}...')
EOF
	```

	#### Claude 3.5 Haiku (Speed + Quality)
	```bash
	# Update .env
	LLM_PROVIDER=anthropic
	ANTHROPIC_MODEL=claude-3-5-haiku-20241022
	ANTHROPIC_TEMPERATURE=0.7
	ANTHROPIC_MAX_TOKENS=1000

	# Test
python - <<'EOF'
from services.llm_service import LLMService
service = LLMService()
print('✅ Claude 3.5 Haiku ready!')
response = service.simple_chat_completion('Quick dinner ideas with vegetables')
print(f'Response: {response[:100]}...')
EOF
	```

	#### GPT-5-nano (Budget Winner)
	```bash
	# Update .env
	LLM_PROVIDER=openai
	OPENAI_MODEL=gpt-5-nano
# GPT-5-family models accept only the default temperature (1)
OPENAI_TEMPERATURE=1
	OPENAI_MAX_TOKENS=1000

	# Test
python - <<'EOF'
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5-nano ready!')
response = service.simple_chat_completion('Quick healthy breakfast ideas')
print(f'Response: {response[:100]}...')
EOF
	```

	#### GPT-5 (Premium)
	```bash
	# Update .env
	LLM_PROVIDER=openai
	OPENAI_MODEL=gpt-5
# GPT-5-family models accept only the default temperature (1)
OPENAI_TEMPERATURE=1
	OPENAI_MAX_TOKENS=1000

	# Test
python - <<'EOF'
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5 ready!')
response = service.simple_chat_completion('Create a healthy meal plan')
print(f'Response: {response[:100]}...')
EOF
	```

	### Self-Hosted Models

	#### DeepSeek-R1:7b (Latest Breakthrough)
	```bash
# Start Ollama first (skip if it already runs as a background service)
ollama serve &
sleep 2  # give the server a moment to start

# Pull model (needs the server running)
ollama pull deepseek-r1:7b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b
OLLAMA_TEMPERATURE=0.7

# Test
python - <<'EOF'
from services.llm_service import LLMService
service = LLMService()
print('✅ DeepSeek-R1 ready!')
response = service.simple_chat_completion('Explain the science behind sourdough fermentation')
print(f'Response: {response[:100]}...')
EOF
	```

	#### Gemma 3:4b (Efficient)
	```bash
# Pull model (requires a running Ollama server)
	ollama pull gemma3:4b

	# Update .env
	LLM_PROVIDER=ollama
	OLLAMA_MODEL=gemma3:4b
	OLLAMA_TEMPERATURE=0.7

	# Test
python - <<'EOF'
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemma 3:4b ready!')
response = service.simple_chat_completion('Quick chicken recipes for weeknight dinners')
print(f'Response: {response[:100]}...')
EOF
	```
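Before running the test snippets for a local model, it can help to confirm that Ollama is reachable and the model has been pulled. Ollama's REST API lists local models at `GET /api/tags`; the sketch below assumes the default address `http://localhost:11434`, and the helper names are hypothetical:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; adjust if changed


def installed_models(tags_json: str) -> set[str]:
    """Model names (e.g. 'gemma3:4b') from the body of a GET /api/tags response."""
    return {m["name"] for m in json.loads(tags_json).get("models", [])}


def check_ollama_model(model: str, base_url: str = OLLAMA_URL) -> str:
    """Short status message for `model` on an Ollama server."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            names = installed_models(resp.read().decode("utf-8"))
    except OSError as exc:  # URLError, refused connections and timeouts are OSErrors
        return f"Ollama not reachable at {base_url} ({exc}); try `ollama serve`"
    if model in names:
        return f"{model} is installed"
    return f"{model} not found; run `ollama pull {model}`"


if __name__ == "__main__":
    print(check_ollama_model("gemma3:4b"))
```

Splitting the JSON parsing from the HTTP call keeps the name lookup testable without a running server.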

	---

	## 🔧 Hardware Requirements

	### Cloud Models
	- Requirements: Internet connection, API key
	- RAM: Any (processing done remotely)
	- Storage: Minimal
	- Best For: Instant setup, no hardware constraints

	### Self-Hosted Requirements

| Model | Parameters | RAM Needed | Storage | GPU Beneficial | Best For |
|-------|------------|------------|---------|----------------|----------|
| `gemma3:4b` | 4B | 6GB | 3.3GB | Optional | Laptops, modest hardware |
| `codeqwen:7b` | 7B | 8GB | 4.2GB | Yes | Structured data, parsing |
| `llama3.1:8b` | 8B | 8GB | 4.7GB | Yes | Standard workstations |
| `deepseek-r1:7b` | 7B | 8GB | 4.7GB | Yes | Reasoning tasks |
| `openhermes2.5-mistral:7b` | 7B | 8GB | 4.1GB | Yes | Conversational AI |
| `nous-hermes2:10.7b` | 10.7B | 12GB | 6.4GB | Recommended | Instruction following |
| `mistral-nemo:12b` | 12B | 12GB | 7GB | Recommended | Balanced performance |
| `phi4:14b` | 14B | 16GB | 9.1GB | Recommended | High-end workstations |
| `gemma3:27b` | 27B | 32GB | 17GB | Strongly recommended | Powerful servers |
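The table can also be read as a lookup: list the models whose RAM figure fits the machine. A sketch using the RAM column above as data; the function name and the optional headroom reserve are illustrative assumptions, and real memory use also varies with context length and quantization:

```python
# (model tag, RAM needed in GB), copied from the table above
RAM_NEEDS_GB = [
    ("gemma3:4b", 6),
    ("codeqwen:7b", 8),
    ("llama3.1:8b", 8),
    ("deepseek-r1:7b", 8),
    ("openhermes2.5-mistral:7b", 8),
    ("nous-hermes2:10.7b", 12),
    ("mistral-nemo:12b", 12),
    ("phi4:14b", 16),
    ("gemma3:27b", 32),
]


def models_that_fit(available_gb: float, headroom_gb: float = 0.0) -> list[str]:
    """Models whose listed RAM fits, optionally reserving headroom for other processes."""
    budget = available_gb - headroom_gb
    return [tag for tag, need in RAM_NEEDS_GB if need <= budget]


print(models_that_fit(16))
```

Because the list is ordered roughly from lightest to heaviest, the last entry returned is the most demanding model the machine should handle.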

	---