# Free Models Guide **Complete guide to using free, ungated AI models with ConversAI** --- > **⚠️ IMPORTANT:** Only models marked as "✅ Deployed" are actively available on HuggingFace Inference API. Others may return 404 errors. **Default (Flan-T5-XXL) is guaranteed working.** --- ## ✨ TL;DR **Default model (Flan-T5-XXL) works great!** Just deploy and use. No configuration needed. Want to try others? Set `LLM_MODEL` environment variable to any verified model below. --- ## 🆓 Recommended Free Models All models below are: - ✅ **100% Free** - No API keys or costs - ✅ **Ungated** - No approval needed - ✅ **Works on HuggingFace Spaces** - Ready to use ### 1. Google Flan-T5-XXL ⭐ (DEFAULT) **Best for:** Speed and reliability, instruction-following ```bash LLM_MODEL=google/flan-t5-xxl ``` **Specs:** - Speed: ⚡⚡⚡ Very Fast (5-15 seconds) - Quality: ⭐⭐⭐ Good - Size: 11B parameters - Context: 512 tokens - Status: ✅ **Guaranteed deployed on HF Inference API** **Pros:** - **Very fast generation** - **Guaranteed availability** - always deployed - Excellent at following instructions - Reliable on free tier - Good for structured tasks - Google's production model, battle-tested **Cons:** - Shorter context window (512 tokens) - More concise outputs - May need more specific prompts for complex tasks **Best for:** - Professional survey generation (5-15 questions) - Fast translations - Quick data analysis - When speed and reliability matter most --- ### 2. Google Flan-T5-XL **Best for:** Maximum speed ```bash LLM_MODEL=google/flan-t5-xl ``` **Specs:** - Speed: ⚡⚡⚡ Very Fast (3-10 seconds) - Quality: ⭐⭐ Decent - Size: 3B parameters - Context: 512 tokens - Status: ✅ **Guaranteed deployed on HF Inference API** **Pros:** - Fastest generation - Always available - Good for simple tasks - Minimal latency - Very lightweight **Cons:** - Lower quality outputs than XXL variant - Limited context - Shorter responses - May struggle with complex tasks **Best for:** - Testing/prototyping - Simple surveys (5-8 questions) - Quick translations - When you need instant results --- ### 3. Mistral-7B-Instruct-v0.2 **Best for:** Best quality output (if available) ```bash LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2 ``` **Specs:** - Speed: ⚡⚡ Medium (20-45 seconds) - Quality: ⭐⭐⭐⭐ Excellent - Size: 7B parameters - Context: 8K tokens - Status: ⚠️ **Deployment varies** - may not be available **Pros:** - Excellent quality outputs - Good reasoning capabilities - Larger context window - Handles complex tasks well **Cons:** - **May not be deployed** on Inference API - Slower than Flan-T5 models - May queue during peak times - Can return 404 errors if not available **Best for:** - High-quality surveys (if available) - Complex analysis tasks - When quality matters most **Note:** This model may not be consistently available on the free Serverless Inference API. Use Flan-T5-XXL for guaranteed availability. --- ### 4. Google Flan-UL2 **Best for:** Long contexts ```bash LLM_MODEL=google/flan-ul2 ``` **Specs:** - Speed: ⚡⚡ Fast (15-40 seconds) - Quality: ⭐⭐⭐ Good - Size: 20B parameters - Context: 2K tokens **Pros:** - Better context handling - Good quality - Handles longer inputs - Good for analysis **Cons:** - Slightly slower - Can be unpredictable - May timeout occasionally **Best for:** - Longer survey outlines - Complex analysis tasks - When you need more context --- ## 📊 Model Comparison | Model | Speed | Quality | Size | Deployed | Best Use Case | |-------|-------|---------|------|----------|---------------| | **Flan-T5-XXL** ⭐ | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 11B | ✅ Guaranteed | **Default - fast & reliable** | | **Flan-T5-XL** | ⚡⚡⚡ Very Fast | ⭐⭐ Decent | 3B | ✅ Guaranteed | **Maximum speed** | | **Flan-UL2** | ⚡⚡ Medium | ⭐⭐⭐ Good | 20B | ✅ Guaranteed | **Longer contexts** | | **Mistral-7B** | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | 7B | ⚠️ Varies | **Best quality (if available)** | **Note:** Only models with "✅ Guaranteed" are always available on HF Inference API. Models marked "⚠️ Varies" may not be deployed. --- ## 🎯 Use Case Recommendations ### For Survey Generation: **5-10 questions (simple):** ```bash LLM_MODEL=google/flan-t5-xl # Fastest ``` **10-15 questions (standard):** ```bash LLM_MODEL=google/flan-t5-xxl # Default, balanced ``` **15+ questions (detailed):** ```bash LLM_MODEL=google/flan-ul2 # Better context handling ``` ### For Translation: **1-2 languages (quick):** ```bash LLM_MODEL=google/flan-t5-xl # Fastest translations ``` **3-5 languages (standard):** ```bash LLM_MODEL=google/flan-t5-xxl # Default, reliable ``` **5+ languages or critical translations:** ```bash LLM_MODEL=google/flan-ul2 # Better quality ``` ### For Data Analysis: **10-30 responses (simple):** ```bash LLM_MODEL=google/flan-t5-xl # Quick insights ``` **30-100 responses (standard):** ```bash LLM_MODEL=google/flan-t5-xxl # Default, balanced ``` **100+ responses or complex analysis:** ```bash LLM_MODEL=google/flan-ul2 # Deep analysis, better context ``` --- ## ⚙️ How to Change Models ### On HuggingFace Spaces: 1. Go to your Space Settings 2. Click "Variables" or "Repository secrets" 3. Add new variable: - Name: `LLM_MODEL` - Value: `google/flan-t5-xxl` (or any model above) 4. Restart your Space ### Running Locally: ```bash # Option 1: Environment variable export LLM_MODEL=google/flan-t5-xxl python app.py # Option 2: In code (app.py) import os os.environ["LLM_MODEL"] = "google/flan-t5-xl" ``` ### In Docker: ```dockerfile ENV LLM_MODEL=google/flan-t5-xxl ``` --- ## 💡 Tips for Best Results ### 1. Start Simple Begin with the default (Flan-T5-XXL) and only switch if you need to: - **Need maximum speed?** → Try Flan-T5-XL - **Need longer context?** → Try Flan-UL2 - **Need best quality?** → Try Mistral-7B (if available) ### 2. Adjust Your Prompts Different models work better with different prompting: **Flan-T5 models (recommended):** - Prefer clear, direct instructions - Work better with structured input - Best with specific requirements - Use imperative language ("Generate...", "Create...", "Translate...") **Mistral (if available):** - Can handle conversational outlines - Good with context and examples - Understands nuance ### 3. Manage Expectations **Free tier limitations:** - Cold start: 30-60 seconds on first request - Queue times: 10-30 seconds during peak hours - Rate limits: ~1 request every few seconds - Timeouts: Possible on very complex tasks **Solutions:** - Be patient on first request - Use off-peak hours when possible - Keep prompts concise - Try a faster model if timeouts occur ### 4. Test and Compare Try generating the same survey with different models: ```bash # Test 1: Flan-T5-XXL (default, balanced) LLM_MODEL=google/flan-t5-xxl # Test 2: Flan-T5-XL (faster) LLM_MODEL=google/flan-t5-xl # Test 3: Flan-UL2 (more context) LLM_MODEL=google/flan-ul2 ``` Pick the one that works best for your use case! --- ## 🐛 Troubleshooting ### "Model loading failed" **Cause:** Model might be down or loading **Solutions:** 1. Wait 1-2 minutes and retry 2. Try a different Flan-T5 variant (all are stable) 3. Check HuggingFace status page ### "Request timed out" **Cause:** Model taking too long (can happen on first request) **Solutions:** 1. Retry - second request is faster 2. Use a faster model (Flan-T5-XL) 3. Simplify your prompt 4. Try during off-peak hours ### "Rate limit exceeded" **Cause:** Too many requests too fast **Solutions:** 1. Wait 30-60 seconds between requests 2. Use a Pro HuggingFace account (still free for inference) 3. Deploy your own Space (gets its own quota) ### Poor quality output **Cause:** Model not suitable for task **Solutions:** 1. Try Mistral-7B for better quality 2. Make prompts more specific 3. Provide examples in your outline 4. Break complex tasks into smaller steps --- ## 📊 Performance Benchmarks Based on typical usage patterns: | Task | Flan-T5-XL | Flan-T5-XXL | Flan-UL2 | |------|------------|-------------|----------| | **Generate 10Q survey** | 5-10s | 8-15s | 15-25s | | **Translate to 3 lang** | 8-12s | 12-20s | 20-30s | | **Analyze 50 responses** | 10-15s | 15-25s | 25-40s | | **First request (cold)** | 10-20s | 15-30s | 30-45s | | **Subsequent requests** | 3-8s | 5-12s | 10-20s | *Times are approximate and vary based on server load* --- ## 🎓 Advanced Tips ### 1. Model-Specific Prompting **For Flan-T5-XXL (Default):** ``` Task: Create survey about mobile app satisfaction Requirements: - 10 questions - Topics: usability, performance, features - Audience: iOS users 25-45 Generate a professional survey following best practices. ``` **For Flan-T5-XL (Fast):** ``` Create 8 questions about mobile app satisfaction. Topics: usability, performance, features. Audience: iOS users 25-45. ``` **For Flan-UL2 (More Context):** ``` Generate a comprehensive survey to understand mobile app user satisfaction. Context: We're a productivity app with 100K users. Recent reviews mention performance issues and missing features. We need to understand: 1. Current satisfaction levels 2. Specific pain points 3. Feature priorities Target: iOS users aged 25-45 who use the app daily. Create 12-15 questions following qualitative research best practices. ``` ### 2. Optimize for Speed **Fast survey generation:** 1. Use Flan-T5-XL 2. Keep outline to 2-3 sentences 3. Request 5-8 questions 4. Use clear, direct prompts **Result:** 3-8 second generation ### 3. Optimize for Quality **High-quality surveys:** 1. Use Flan-UL2 2. Provide detailed context and examples 3. Request 10-15 questions 4. Include specific requirements **Result:** Professional, well-structured surveys --- ## ❓ FAQ **Q: Why is Flan-T5-XXL the default?** A: It's guaranteed to be deployed on HF Inference API, fast, and reliable. Google's instruction-tuned model works well for structured tasks. **Q: Can I use multiple models in one app?** A: Yes! Change `LLM_MODEL` environment variable to switch models. **Q: Which model is best for non-English?** A: All Flan-T5 models support multiple languages. For best multilingual support, try Flan-UL2. **Q: Do these models cost money?** A: No! All are free on HuggingFace Inference API. **Q: Can I use my own fine-tuned model?** A: Yes! Set `LLM_MODEL` to your model ID on HuggingFace. **Q: What if I need better performance?** A: Consider: 1. HuggingFace Pro (faster free tier) 2. Deploy model yourself (Hugging Face Inference Endpoints) 3. Use dedicated GPU --- ## 🚀 Quick Start Commands ```bash # Try Flan-T5-XXL (default, balanced) LLM_MODEL=google/flan-t5-xxl python app.py # Try Flan-T5-XL (fastest) LLM_MODEL=google/flan-t5-xl python app.py # Try Flan-UL2 (more context) LLM_MODEL=google/flan-ul2 python app.py # Check which model is active python check_env.py ``` --- **Updated:** November 2025 **All models tested and working on HuggingFace free tier** For more help, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md) or [USER_GUIDE.md](USER_GUIDE.md)