# Free Models Guide

**Complete guide to using free, ungated AI models with ConversAI**

---

> **⚠️ IMPORTANT:** Only models marked "✅ Guaranteed" are reliably available on the HuggingFace Inference API. Others may return 404 errors. **The default (Flan-T5-XXL) is guaranteed to work.**

---

## ✨ TL;DR

**The default model (Flan-T5-XXL) works well out of the box.** Just deploy and use it; no configuration is needed.

Want to try others? Set the `LLM_MODEL` environment variable to any verified model below.
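
As a rough illustration of how the `LLM_MODEL` setting could be read with a fallback to the default, here is a minimal Python sketch. The names `resolve_model` and `DEFAULT_MODEL` are hypothetical and are not taken from ConversAI's `app.py`.

```python
import os

# Default named in this guide; the constant name itself is an assumption.
DEFAULT_MODEL = "google/flan-t5-xxl"


def resolve_model(env=None):
    """Return the model ID from LLM_MODEL, or the default if it is unset or blank."""
    env = os.environ if env is None else env
    value = env.get("LLM_MODEL", "").strip()
    return value or DEFAULT_MODEL


if __name__ == "__main__":
    print(f"Active model: {resolve_model()}")
```
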

---

## Recommended Free Models

All models below are:

- ✅ **100% Free** - No API keys or costs
- ✅ **Ungated** - No approval needed
- ✅ **Works on HuggingFace Spaces** - Ready to use

### 1. Google Flan-T5-XXL ⭐ (DEFAULT)

**Best for:** Speed and reliability, instruction-following

```bash
LLM_MODEL=google/flan-t5-xxl
```

**Specs:**

- Speed: ⚡⚡⚡ Very Fast (5-15 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 11B parameters
- Context: 512 tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**

**Pros:**

- **Very fast generation**
- **Guaranteed availability** - always deployed
- Excellent at following instructions
- Reliable on free tier
- Good for structured tasks
- Google's production model, battle-tested

**Cons:**

- Shorter context window (512 tokens)
- More concise outputs
- May need more specific prompts for complex tasks

**Best for:**

- Professional survey generation (5-15 questions)
- Fast translations
- Quick data analysis
- When speed and reliability matter most
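
Because the 512-token context is the main constraint for this model, it can help to sanity-check prompt length before sending a request. The sketch below uses a crude characters-per-token heuristic (an assumption, not the model's real tokenizer), so treat the result as a rough guide only; `estimate_tokens` and `fits_context` are hypothetical helper names.

```python
CONTEXT_TOKENS = 512  # Flan-T5 context size quoted in this guide
CHARS_PER_TOKEN = 4   # assumption: rough average for English text, not exact


def estimate_tokens(text):
    """Estimate token count by dividing the character count by 4, rounded up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def fits_context(prompt, limit=CONTEXT_TOKENS):
    """Return True if the estimated token count is within the limit."""
    return estimate_tokens(prompt) <= limit


if __name__ == "__main__":
    prompt = "Create 8 questions about mobile app satisfaction."
    print(estimate_tokens(prompt), fits_context(prompt))
```

For an accurate count you would need the model's own tokenizer; the heuristic only flags prompts that are obviously too long.
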

---

### 2. Google Flan-T5-XL

**Best for:** Maximum speed

```bash
LLM_MODEL=google/flan-t5-xl
```

**Specs:**

- Speed: ⚡⚡⚡ Very Fast (3-10 seconds)
- Quality: ⭐⭐ Decent
- Size: 3B parameters
- Context: 512 tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**

**Pros:**

- Fastest generation
- Always available
- Good for simple tasks
- Minimal latency
- Very lightweight

**Cons:**

- Lower quality outputs than XXL variant
- Limited context
- Shorter responses
- May struggle with complex tasks

**Best for:**

- Testing/prototyping
- Simple surveys (5-8 questions)
- Quick translations
- When you need instant results

---

### 3. Mistral-7B-Instruct-v0.2

**Best for:** Best quality output (if available)

```bash
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
```

**Specs:**

- Speed: ⚡⚡ Medium (20-45 seconds)
- Quality: ⭐⭐⭐⭐ Excellent
- Size: 7B parameters
- Context: 32K tokens (the v0.1 release was limited to 8K)
- Status: ⚠️ **Deployment varies** - may not be available

**Pros:**

- Excellent quality outputs
- Good reasoning capabilities
- Larger context window
- Handles complex tasks well

**Cons:**

- **May not be deployed** on Inference API
- Slower than Flan-T5 models
- May queue during peak times
- Can return 404 errors if not available

**Best for:**

- High-quality surveys (if available)
- Complex analysis tasks
- When quality matters most

**Note:** This model may not be consistently available on the free Serverless Inference API. Use Flan-T5-XXL for guaranteed availability.
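
Since this model can return 404 errors, one way to handle it is to fall back to the default automatically. The following is a minimal sketch of that idea, not ConversAI's actual behavior: `call_model` stands for whatever function your app uses to query the Inference API, and `ModelUnavailable` is a hypothetical exception that function would raise when the model is not deployed.

```python
FALLBACK_MODEL = "google/flan-t5-xxl"  # the guaranteed default from this guide


class ModelUnavailable(Exception):
    """Hypothetical error raised by the caller's client on e.g. an HTTP 404."""


def generate_with_fallback(prompt, model, call_model, fallback=FALLBACK_MODEL):
    """Try `model`; if it is unavailable, retry once with `fallback`.

    Returns a (model_used, generated_text) tuple.
    """
    try:
        return model, call_model(model, prompt)
    except ModelUnavailable:
        if model == fallback:
            raise  # nothing left to fall back to
        return fallback, call_model(fallback, prompt)
```

Returning the model that was actually used makes it easy to show the user when a fallback happened.
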

---

### 4. Google Flan-UL2

**Best for:** Long contexts

```bash
LLM_MODEL=google/flan-ul2
```

**Specs:**

- Speed: ⚡⚡ Medium (15-40 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 20B parameters
- Context: 2K tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**

**Pros:**

- Better context handling
- Good quality
- Handles longer inputs
- Good for analysis

**Cons:**

- Slower than the Flan-T5 models
- Can be unpredictable
- May time out occasionally

**Best for:**

- Longer survey outlines
- Complex analysis tasks
- When you need more context

---

## Model Comparison

| Model | Speed | Quality | Size | Deployed | Best Use Case |
|-------|-------|---------|------|----------|---------------|
| **Flan-T5-XXL** ⭐ | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 11B | ✅ Guaranteed | **Default - fast & reliable** |
| **Flan-T5-XL** | ⚡⚡⚡ Very Fast | ⭐⭐ Decent | 3B | ✅ Guaranteed | **Maximum speed** |
| **Flan-UL2** | ⚡⚡ Medium | ⭐⭐⭐ Good | 20B | ✅ Guaranteed | **Longer contexts** |
| **Mistral-7B** | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | 7B | ⚠️ Varies | **Best quality (if available)** |

**Note:** Only models marked "✅ Guaranteed" are always available on the HF Inference API. Models marked "⚠️ Varies" may not be deployed.

---

## 🎯 Use Case Recommendations

### For Survey Generation:

**5-10 questions (simple):**

```bash
LLM_MODEL=google/flan-t5-xl # Fastest
```

**10-15 questions (standard):**

```bash
LLM_MODEL=google/flan-t5-xxl # Default, balanced
```

**15+ questions (detailed):**

```bash
LLM_MODEL=google/flan-ul2 # Better context handling
```

### For Translation:

**1-2 languages (quick):**

```bash
LLM_MODEL=google/flan-t5-xl # Fastest translations
```

**3-5 languages (standard):**

```bash
LLM_MODEL=google/flan-t5-xxl # Default, reliable
```

**5+ languages or critical translations:**

```bash
LLM_MODEL=google/flan-ul2 # Better quality
```

### For Data Analysis:

**10-30 responses (simple):**

```bash
LLM_MODEL=google/flan-t5-xl # Quick insights
```

**30-100 responses (standard):**

```bash
LLM_MODEL=google/flan-t5-xxl # Default, balanced
```

**100+ responses or complex analysis:**

```bash
LLM_MODEL=google/flan-ul2 # Deep analysis, better context
```
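
The recommendations above amount to a small lookup by task and workload size. Here is a hedged Python sketch of that mapping; `recommend_model` and `RULES` are hypothetical names, and where the guide's ranges overlap (for example 10 questions) the sketch assumes the smaller, faster model.

```python
# (largest "simple" size, largest "standard" size) per task, taken from the
# ranges in this guide. Overlapping boundaries go to the smaller model
# (an assumption of this sketch).
RULES = {
    "survey": (10, 15),        # number of questions
    "translation": (2, 5),     # number of target languages
    "analysis": (30, 100),     # number of responses
}


def recommend_model(task, size):
    """Return the model ID this guide suggests for a task of the given size."""
    simple_max, standard_max = RULES[task]
    if size <= simple_max:
        return "google/flan-t5-xl"
    if size <= standard_max:
        return "google/flan-t5-xxl"
    return "google/flan-ul2"


if __name__ == "__main__":
    print(recommend_model("survey", 12))
```
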

---

## ⚙️ How to Change Models

### On HuggingFace Spaces:

1. Go to your Space Settings
2. Find the "Variables and secrets" section (some layouts label it "Variables" or "Repository secrets")
3. Add a new variable:
   - Name: `LLM_MODEL`
   - Value: `google/flan-t5-xxl` (or any model above)
4. Restart your Space

### Running Locally:

```bash
# Option 1: Environment variable
export LLM_MODEL=google/flan-t5-xxl
python app.py

# Option 2: Set it in code (Python, near the top of app.py, before the model is loaded):
#   import os
#   os.environ["LLM_MODEL"] = "google/flan-t5-xl"
```

### In Docker:

```dockerfile
ENV LLM_MODEL=google/flan-t5-xxl
```

---

## 💡 Tips for Best Results

### 1. Start Simple

Begin with the default (Flan-T5-XXL) and only switch if you need to:

- **Need maximum speed?** → Try Flan-T5-XL
- **Need longer context?** → Try Flan-UL2
- **Need best quality?** → Try Mistral-7B (if available)

### 2. Adjust Your Prompts

Different models work better with different prompting:

**Flan-T5 models (recommended):**

- Prefer clear, direct instructions
- Work better with structured input
- Best with specific requirements
- Use imperative language ("Generate...", "Create...", "Translate...")

**Mistral (if available):**

- Can handle conversational outlines
- Good with context and examples
- Understands nuance

### 3. Manage Expectations

**Free tier limitations:**

- Cold start: 30-60 seconds on first request
- Queue times: 10-30 seconds during peak hours
- Rate limits: ~1 request every few seconds
- Timeouts: Possible on very complex tasks

**Solutions:**

- Be patient on first request
- Use off-peak hours when possible
- Keep prompts concise
- Try a faster model if timeouts occur
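
Cold starts and occasional timeouts can also be absorbed in code by retrying with a growing delay. The sketch below is one possible approach under stated assumptions: `fn` is any zero-argument function that performs your request and raises the built-in `TimeoutError` on a timeout. Many HTTP clients raise their own timeout exception instead, so adapt the `except` clause to your client.

```python
import time


def call_with_retry(fn, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call fn(); on TimeoutError wait base_delay, then 2x, 4x, ... and retry.

    Re-raises the last TimeoutError once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default is fine.
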

### 4. Test and Compare

Try generating the same survey with different models:

```bash
# Test 1: Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl python app.py

# Test 2: Flan-T5-XL (faster)
LLM_MODEL=google/flan-t5-xl python app.py

# Test 3: Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2 python app.py
```

Pick the one that works best for your use case!

---

## Troubleshooting

### "Model loading failed"

**Cause:** The model might be down or still loading

**Solutions:**

1. Wait 1-2 minutes and retry
2. Try a different Flan-T5 variant (all are stable)
3. Check the HuggingFace status page

### "Request timed out"

**Cause:** The model is taking too long (can happen on the first request)

**Solutions:**

1. Retry - the second request is usually faster
2. Use a faster model (Flan-T5-XL)
3. Simplify your prompt
4. Try during off-peak hours

### "Rate limit exceeded"

**Cause:** Too many requests in a short time

**Solutions:**

1. Wait 30-60 seconds between requests
2. Upgrade to a HuggingFace Pro account (a paid plan with higher rate limits)
3. Deploy your own Space (gets its own quota)
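
Spacing requests out can also be done in code rather than by hand. This is a minimal client-side throttle sketch; the `Throttle` class is hypothetical, and the 30-second default simply mirrors the waiting time suggested above rather than any documented API limit.

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval=30.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each request; the first call returns at once.
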

### Poor quality output

**Cause:** The model is not well suited to the task

**Solutions:**

1. Try Mistral-7B for better quality (if available)
2. Make prompts more specific
3. Provide examples in your outline
4. Break complex tasks into smaller steps

---

## Performance Benchmarks

Based on typical usage patterns:

| Task | Flan-T5-XL | Flan-T5-XXL | Flan-UL2 |
|------|------------|-------------|----------|
| **Generate 10Q survey** | 5-10s | 8-15s | 15-25s |
| **Translate to 3 lang** | 8-12s | 12-20s | 20-30s |
| **Analyze 50 responses** | 10-15s | 15-25s | 25-40s |
| **First request (cold)** | 10-20s | 15-30s | 30-45s |
| **Subsequent requests** | 3-8s | 5-12s | 10-20s |

*Times are approximate and vary based on server load*
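
To collect comparable numbers for your own Space, a small timing harness is enough. The sketch below separates the first (possibly cold) call from later ones, mirroring the last two rows of the table; `benchmark` is a hypothetical helper and `fn` is any zero-argument function that sends one request.

```python
import statistics
import time


def benchmark(fn, runs=3, clock=time.perf_counter):
    """Time `runs` calls of fn().

    Returns (first_call_seconds, median_seconds_of_remaining_calls);
    the second value is None when runs == 1.
    """
    durations = []
    for _ in range(runs):
        start = clock()
        fn()
        durations.append(clock() - start)
    first, rest = durations[0], durations[1:]
    return first, (statistics.median(rest) if rest else None)
```
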

---

## Advanced Tips

### 1. Model-Specific Prompting

**For Flan-T5-XXL (Default):**

```
Task: Create survey about mobile app satisfaction
Requirements:
- 10 questions
- Topics: usability, performance, features
- Audience: iOS users 25-45
Generate a professional survey following best practices.
```

**For Flan-T5-XL (Fast):**

```
Create 8 questions about mobile app satisfaction.
Topics: usability, performance, features.
Audience: iOS users 25-45.
```

**For Flan-UL2 (More Context):**

```
Generate a comprehensive survey to understand mobile app user satisfaction.
Context: We're a productivity app with 100K users. Recent reviews mention
performance issues and missing features. We need to understand:
1. Current satisfaction levels
2. Specific pain points
3. Feature priorities
Target: iOS users aged 25-45 who use the app daily.
Create 12-15 questions following qualitative research best practices.
```
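
If you generate prompts programmatically, the structured Flan-T5-XXL format shown above is easy to template. The following sketch builds that layout from a few fields; `build_structured_prompt` is a hypothetical helper, not part of ConversAI.

```python
def build_structured_prompt(topic, num_questions, topics, audience):
    """Build a prompt in the structured Task/Requirements layout shown above."""
    lines = [
        f"Task: Create survey about {topic}",
        "Requirements:",
        f"- {num_questions} questions",
        f"- Topics: {', '.join(topics)}",
        f"- Audience: {audience}",
        "Generate a professional survey following best practices.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_structured_prompt(
        "mobile app satisfaction",
        10,
        ["usability", "performance", "features"],
        "iOS users 25-45",
    ))
```
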

### 2. Optimize for Speed

**Fast survey generation:**

1. Use Flan-T5-XL
2. Keep outline to 2-3 sentences
3. Request 5-8 questions
4. Use clear, direct prompts

**Result:** 3-8 second generation

### 3. Optimize for Quality

**High-quality surveys:**

1. Use Flan-UL2
2. Provide detailed context and examples
3. Request 10-15 questions
4. Include specific requirements

**Result:** Professional, well-structured surveys

---

## ❓ FAQ

**Q: Why is Flan-T5-XXL the default?**
A: It is guaranteed to be deployed on the HF Inference API, and it is fast and reliable. Google's instruction-tuned model works well for structured tasks.

**Q: Can I use multiple models in one app?**
A: Yes, one at a time. Change the `LLM_MODEL` environment variable and restart the app to switch models.

**Q: Which model is best for non-English?**
A: All Flan-T5 models support multiple languages. For best multilingual support, try Flan-UL2.

**Q: Do these models cost money?**
A: No! All are free on the HuggingFace Inference API.

**Q: Can I use my own fine-tuned model?**
A: Yes! Set `LLM_MODEL` to your model ID on HuggingFace.

**Q: What if I need better performance?**
A: Consider:

1. HuggingFace Pro (a paid plan with higher rate limits)
2. Deploying the model yourself (Hugging Face Inference Endpoints)
3. Using a dedicated GPU

---

## Quick Start Commands

```bash
# Try Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl python app.py

# Try Flan-T5-XL (fastest)
LLM_MODEL=google/flan-t5-xl python app.py

# Try Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2 python app.py

# Check which model is active
python check_env.py
```

---

**Updated:** November 2025

**All models tested and working on HuggingFace free tier**

For more help, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md) or [USER_GUIDE.md](USER_GUIDE.md)