# Model Selection Guide

## 🎯 At-a-Glance Recommendations

| Priority | Best Choice | Provider | Monthly Cost* | Setup Time | Quality Score | Why Choose This |
|----------|-------------|----------|---------------|------------|---------------|-----------------|
| **Ease of Use** | Gemini 2.5 Flash | Google | Free - $2 | 2 min | 90% | Excellent free tier |
| **Best Value** | GPT-5-nano | OpenAI | $1.00 | 2 min | 88% | Modern GPT-5 at nano price |
| **Premium Quality** | Claude 3 Opus | Anthropic | $225 | 2 min | 95% | Highest reasoning quality |
| **Self-Hosted** | Llama 3.1:8b | Ollama | Free | 10 min | 82% | Perfect balance |
| **High-End Local** | DeepSeek-R1:7b | Ollama | Free | 15 min | 88% | Best reasoning model |
| **Budget Cloud** | Claude 3.5 Haiku | Anthropic | $4 | 2 min | 87% | Fast and affordable |
| **Alternative Local** | CodeQwen1.5:7b | Ollama | Free | 10 min | 85% | Excellent for structured data |

*Based on 30,000 queries/month

---

## 🏢 Cloud Models (Closed Source)

### OpenAI Models

#### GPT-5 (Latest Flagship) ⭐ **NEW**

```bash
OPENAI_MODEL=gpt-5
```

- **Pricing**: $20/month (Plus plan) - Unlimited with guardrails
- **Capabilities**: Advanced reasoning, thinking, code execution
- **Best For**: Premium applications requiring cutting-edge AI
- **Recipe Quality**: Outstanding (96%) - Best culinary understanding
- **Context**: 196K tokens (reasoning mode)

#### GPT-5-nano (Ultra Budget) ⭐ **HIDDEN GEM**

```bash
OPENAI_MODEL=gpt-5-nano
```

- **Pricing**: $0.05/1M input, $0.40/1M output tokens
- **Monthly Cost**: ~$1.00 for 30K queries
- **Best For**: Budget-conscious deployments with modern capabilities
- **Recipe Quality**: Very Good (88%)
- **Speed**: Very Fast
- **Features**: GPT-5 architecture at nano pricing

#### GPT-4o-mini (Proven Budget Choice)

```bash
OPENAI_MODEL=gpt-4o-mini
```

- **Pricing**: $0.15/1M input, $0.60/1M output tokens
- **Monthly Cost**: ~$4 for 30K queries
- **Best For**: Cost-effective production deployments
- **Recipe Quality**: Very Good (86%)
- **Speed**: Very Fast

### Google AI (Gemini) Models

#### Gemini 2.5 Flash ⭐ **RECOMMENDED**

```bash
GOOGLE_MODEL=gemini-2.5-flash
```

- **Pricing**: Free tier, then $0.30/1M input, $2.50/1M output
- **Monthly Cost**: Free - $2 for most usage patterns
- **Best For**: Development and cost-conscious production
- **Recipe Quality**: Excellent (90%)
- **Features**: Thinking budgets, 1M context window

#### Gemini 2.5 Pro (High-End)

```bash
GOOGLE_MODEL=gemini-2.5-pro
```

- **Pricing**: $1.25/1M input, $10/1M output (≤200K context)
- **Monthly Cost**: ~$25 for 30K queries
- **Best For**: Premium applications requiring best Google AI
- **Recipe Quality**: Excellent (92%)

#### Gemini 2.0 Flash-Lite (Ultra Budget)

```bash
GOOGLE_MODEL=gemini-2.0-flash-lite
```

- **Pricing**: $0.075/1M input, $0.30/1M output
- **Monthly Cost**: ~$0.90 for 30K queries
- **Best For**: High-volume, cost-sensitive applications
- **Recipe Quality**: Good (85%)

---

## 🔓 Open Source Models (Self-Hosted)

### Ollama Models (Latest Releases)

#### DeepSeek-R1:7b ⭐ **BREAKTHROUGH MODEL**

```bash
OLLAMA_MODEL=deepseek-r1:7b
```

- **Parameters**: 7B
- **Download**: ~4.7GB
- **RAM Required**: 8GB
- **Best For**: Advanced reasoning tasks, o1-level performance
- **Recipe Quality**: Outstanding (88%)
- **Special**: Chain-of-thought reasoning, approaching GPT-4 performance

#### Gemma 3:27b ⭐ **NEW FLAGSHIP**

```bash
OLLAMA_MODEL=gemma3:27b
```

- **Parameters**: 27B
- **Download**: ~17GB
- **RAM Required**: 32GB
- **Best For**: Highest quality open source experience
- **Recipe Quality**: Outstanding (89%)
- **Features**: Vision capabilities, state-of-the-art performance

#### Llama 3.1:8b (Proven Choice)

```bash
OLLAMA_MODEL=llama3.1:8b
```

- **Parameters**: 8B
- **Download**: ~4.7GB
- **RAM Required**: 8GB
- **Best For**: Balanced production deployment
- **Recipe Quality**: Very Good (82%)
- **Status**: Your current choice - excellent balance!
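The RAM figures listed for each local model can double as a quick pre-flight check before you `ollama pull` anything. A minimal sketch - the `MODEL_RAM_GB` table and `pick_largest_fitting` helper are illustrative, not part of any Ollama API; the numbers are copied from the model sections in this guide:

```python
# RAM needed (GB) for a few of the local models covered in this guide.
MODEL_RAM_GB = {
    "gemma3:4b": 6,
    "deepseek-r1:7b": 8,
    "llama3.1:8b": 8,
    "gemma3:27b": 32,
}

def pick_largest_fitting(available_gb):
    """Return the most RAM-hungry model that still fits, or None."""
    candidates = [m for m, ram in MODEL_RAM_GB.items() if ram <= available_gb]
    return max(candidates, key=MODEL_RAM_GB.get) if candidates else None

print(pick_largest_fitting(32))  # → gemma3:27b
print(pick_largest_fitting(4))   # → None
```

Leave a few GB of headroom beyond the listed requirement - the figures here are minimums, and the OS and your application need memory too.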
#### Qwen 3:8b ⭐ **NEW RELEASE**

```bash
OLLAMA_MODEL=qwen3:8b
```

- **Parameters**: 8B
- **Download**: ~4.4GB
- **RAM Required**: 8GB
- **Best For**: Multilingual support, latest technology
- **Recipe Quality**: Very Good (84%)
- **Features**: Tool use, thinking capabilities

#### Phi 4:14b ⭐ **MICROSOFT'S LATEST**

```bash
OLLAMA_MODEL=phi4:14b
```

- **Parameters**: 14B
- **Download**: ~9.1GB
- **RAM Required**: 16GB
- **Best For**: Reasoning and math tasks
- **Recipe Quality**: Very Good (85%)
- **Features**: State-of-the-art efficiency

#### Gemma 3:4b (Efficient Choice)

```bash
OLLAMA_MODEL=gemma3:4b
```

- **Parameters**: 4B
- **Download**: ~3.3GB
- **RAM Required**: 6GB
- **Best For**: Resource-constrained deployments
- **Recipe Quality**: Good (78%)
- **Features**: Excellent for size, runs on modest hardware

### HuggingFace Models (Downloadable for Local Use via Ollama)

#### CodeQwen1.5:7b ⭐ **ALIBABA'S CODE MODEL**

```bash
OLLAMA_MODEL=codeqwen:7b
```

- **Parameters**: 7B
- **Download**: ~4.2GB
- **RAM Required**: 8GB
- **Best For**: Recipe parsing, ingredient analysis, structured data
- **Recipe Quality**: Very Good (85%)
- **Features**: Excellent at understanding structured recipe formats

#### Mistral-Nemo:12b ⭐ **BALANCED CHOICE**

```bash
OLLAMA_MODEL=mistral-nemo:12b
```

- **Parameters**: 12B
- **Download**: ~7GB
- **RAM Required**: 12GB
- **Best For**: General conversation with good reasoning
- **Recipe Quality**: Very Good (84%)
- **Features**: Multilingual, efficient, well-balanced

#### Nous-Hermes2:10.7b ⭐ **FINE-TUNED EXCELLENCE**

```bash
OLLAMA_MODEL=nous-hermes2:10.7b
```

- **Parameters**: 10.7B
- **Download**: ~6.4GB
- **RAM Required**: 12GB
- **Best For**: Instruction following, detailed responses
- **Recipe Quality**: Very Good (83%)
- **Features**: Excellent instruction following, helpful responses

#### OpenHermes2.5-Mistral:7b ⭐ **COMMUNITY FAVORITE**

```bash
OLLAMA_MODEL=openhermes2.5-mistral:7b
```

- **Parameters**: 7B
- **Download**: ~4.1GB
- **RAM Required**: 8GB
- **Best For**: Creative recipe suggestions, conversational AI
- **Recipe Quality**: Good (81%)
- **Features**: Creative, conversational, reliable

#### Solar:10.7b ⭐ **UPSTAGE'S MODEL**

```bash
OLLAMA_MODEL=solar:10.7b
```

- **Parameters**: 10.7B
- **Download**: ~6.1GB
- **RAM Required**: 12GB
- **Best For**: Analytical tasks, recipe modifications
- **Recipe Quality**: Very Good (83%)
- **Features**: Strong analytical capabilities, detailed explanations

---

## 🏢 Anthropic Claude Models (Cloud, Closed Source)

#### Claude 3.5 Sonnet (Production Standard)

```bash
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```

- **Pricing**: $3/1M input, $15/1M output tokens
- **Monthly Cost**: ~$45 for 30K queries
- **Best For**: Balanced performance and reasoning
- **Recipe Quality**: Outstanding (94%)
- **Features**: Advanced analysis, code understanding

#### Claude 3.5 Haiku (Speed Focused)

```bash
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
```

- **Pricing**: $0.25/1M input, $1.25/1M output tokens
- **Monthly Cost**: ~$4 for 30K queries
- **Best For**: Fast, cost-effective responses
- **Recipe Quality**: Very Good (87%)
- **Features**: Lightning fast, good quality

#### Claude 3 Opus (Premium Reasoning)

```bash
ANTHROPIC_MODEL=claude-3-opus-20240229
```

- **Pricing**: $15/1M input, $75/1M output tokens
- **Monthly Cost**: ~$225 for 30K queries
- **Best For**: Complex reasoning, highest quality
- **Recipe Quality**: Outstanding (95%)
- **Features**: Top-tier reasoning, complex tasks

---

## 🎯 Scenario-Based Recommendations

### 👨‍💻 **Development & Testing**

**Choice**: Gemini 2.5 Flash

```bash
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
```

- Free tier covers most development
- Excellent quality for testing
- Easy setup and integration

### 🚀 **Small to Medium Production**

**Choice**: Gemini 2.5 Flash or GPT-4o-mini

```bash
# Cost-focused
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash

# Quality-focused
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
```

### 🏠 **Self-Hosted**

**Choice**: Llama
3.1:8b or upgrade to DeepSeek-R1:7b

```bash
# Your current (excellent choice)
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b

# Upgrade option (better reasoning)
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b
```

### 💰 **Budget/Free**

**Choice**: Local models or GPT-5-nano

```bash
# Best local alternative
LLM_PROVIDER=ollama
OLLAMA_MODEL=codeqwen:7b

# Best budget paid option
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano

# Quality budget cloud
LLM_PROVIDER=anthropic
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
```

### 🔒 **Privacy/Offline**

**Choice**: DeepSeek-R1:7b or Gemma 3:4b

```bash
# Best reasoning
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b

# Resource-efficient
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma3:4b
```

---

## ⚡ Quick Setup Commands

### Cloud Models (Instant Setup)

#### Gemini 2.5 Flash (Recommended)

```bash
# Update .env
LLM_PROVIDER=google
GOOGLE_MODEL=gemini-2.5-flash
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemini 2.5 Flash ready!')
response = service.simple_chat_completion('Suggest a quick pasta recipe')
print(f'Response: {response[:100]}...')
"
```

#### Claude 3.5 Haiku (Speed + Quality)

```bash
# Update .env
LLM_PROVIDER=anthropic
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
ANTHROPIC_TEMPERATURE=0.7
ANTHROPIC_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Claude 3.5 Haiku ready!')
response = service.simple_chat_completion('Quick dinner ideas with vegetables')
print(f'Response: {response[:100]}...')
"
```

#### GPT-5-nano (Budget Winner)

```bash
# Update .env
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5-nano
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5-nano ready!')
response = service.simple_chat_completion('Quick healthy breakfast ideas')
print(f'Response: {response[:100]}...')
"
```

#### GPT-5 (Premium)

```bash
# Update .env
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-5
OPENAI_TEMPERATURE=0.7
OPENAI_MAX_TOKENS=1000

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ GPT-5 ready!')
response = service.simple_chat_completion('Create a healthy meal plan')
print(f'Response: {response[:100]}...')
"
```

### Self-Hosted Models

#### CodeQwen1.5:7b (Structured Data Expert)

```bash
# Pull model
ollama pull codeqwen:7b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=codeqwen:7b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ CodeQwen 1.5:7b ready!')
response = service.simple_chat_completion('Parse this recipe: 2 cups flour, 1 egg, 1 cup milk')
print(f'Response: {response[:100]}...')
"
```

#### Mistral-Nemo:12b (Balanced Performance)

```bash
# Pull model
ollama pull mistral-nemo:12b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral-nemo:12b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Mistral-Nemo ready!')
response = service.simple_chat_completion('Suggest a Mediterranean dinner menu')
print(f'Response: {response[:100]}...')
"
```

#### DeepSeek-R1:7b (Latest Breakthrough)

```bash
# Pull model
ollama pull deepseek-r1:7b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=deepseek-r1:7b
OLLAMA_TEMPERATURE=0.7

# Start Ollama
ollama serve &

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ DeepSeek-R1 ready!')
response = service.simple_chat_completion('Explain the science behind sourdough fermentation')
print(f'Response: {response[:100]}...')
"
```

#### Gemma 3:4b (Efficient)

```bash
# Pull model
ollama pull gemma3:4b

# Update .env
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma3:4b
OLLAMA_TEMPERATURE=0.7

# Test
python -c "
from services.llm_service import LLMService
service = LLMService()
print('✅ Gemma 3:4b ready!')
response = service.simple_chat_completion('Quick chicken recipes for weeknight dinners')
print(f'Response: {response[:100]}...')
"
```

---

## 🔧 Hardware Requirements

### Cloud Models

- **Requirements**: Internet connection, API key
- **RAM**: Any (processing done remotely)
- **Storage**: Minimal
- **Best For**: Instant setup, no hardware constraints

### Self-Hosted Requirements

| Model | Parameters | RAM Needed | Storage | GPU Beneficial | Best For |
|-------|------------|------------|---------|----------------|----------|
| `gemma3:4b` | 4B | 6GB | 3.3GB | Optional | Laptops, modest hardware |
| `codeqwen:7b` | 7B | 8GB | 4.2GB | Yes | Structured data, parsing |
| `llama3.1:8b` | 8B | 8GB | 4.7GB | Yes | Standard workstations |
| `deepseek-r1:7b` | 7B | 8GB | 4.7GB | Yes | Reasoning tasks |
| `openhermes2.5-mistral:7b` | 7B | 8GB | 4.1GB | Yes | Conversational AI |
| `nous-hermes2:10.7b` | 10.7B | 12GB | 6.4GB | Recommended | Instruction following |
| `mistral-nemo:12b` | 12B | 12GB | 7GB | Recommended | Balanced performance |
| `phi4:14b` | 14B | 16GB | 9.1GB | Recommended | High-end workstations |
| `gemma3:27b` | 27B | 32GB | 17GB | Required | Powerful servers |

---
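The monthly cost figures quoted for the cloud models follow directly from the per-1M-token prices. A minimal sketch of the arithmetic - the per-query token counts (roughly 300 input and 50 output tokens) are an assumption for illustration, and real usage varies:

```python
def monthly_cost(queries, in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Estimated monthly spend (USD) from per-1M-token prices."""
    input_cost = queries * in_tokens / 1_000_000 * in_price_per_m
    output_cost = queries * out_tokens / 1_000_000 * out_price_per_m
    return input_cost + output_cost

# GPT-5-nano at $0.05/1M input, $0.40/1M output, 30K queries/month:
print(f"${monthly_cost(30_000, 300, 50, 0.05, 0.40):.2f}")  # → $1.05
```

Plugging in Claude 3 Opus's $15/$75 prices with the same assumptions gives about $247.50 per month, in the same ballpark as the ~$225 quoted above; rerun the estimate with your own average token counts before committing to a provider.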