# Free Models Guide
**Complete guide to using free, ungated AI models with ConversAI**
---
> **⚠️ IMPORTANT:** Only models marked "✅ Deployed" are actively available on the HuggingFace Inference API. Others may return 404 errors. **The default (Flan-T5-XXL) is guaranteed to work.**
---
## ✨ TL;DR
**Default model (Flan-T5-XXL) works great!** Just deploy and use. No configuration needed.
Want to try others? Set `LLM_MODEL` environment variable to any verified model below.
---
## 🆓 Recommended Free Models
All models below are:
- ✅ **100% Free** - No API keys or costs
- ✅ **Ungated** - No approval needed
- ✅ **Works on HuggingFace Spaces** - Ready to use
### 1. Google Flan-T5-XXL ⭐ (DEFAULT)
**Best for:** Speed and reliability, instruction-following
```bash
LLM_MODEL=google/flan-t5-xxl
```
**Specs:**
- Speed: ⚡⚡⚡ Very Fast (5-15 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 11B parameters
- Context: 512 tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**
**Pros:**
- **Very fast generation**
- **Guaranteed availability** - always deployed
- Excellent at following instructions
- Reliable on free tier
- Good for structured tasks
- Google's production model, battle-tested
**Cons:**
- Shorter context window (512 tokens)
- More concise outputs
- May need more specific prompts for complex tasks
**Best for:**
- Professional survey generation (5-15 questions)
- Fast translations
- Quick data analysis
- When speed and reliability matter most
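Want to sanity-check this model outside the app? You can call the Serverless Inference API directly. Below is a minimal sketch using `requests` (not part of ConversAI); it assumes an HF access token in the `HF_TOKEN` environment variable:
```python
# Minimal sketch: query Flan-T5-XXL on the HF Serverless Inference API.
# Assumes an HF access token in the HF_TOKEN environment variable.
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-xxl"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {"inputs": "Translate to French: How satisfied are you with the app?"}
response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
response.raise_for_status()

# The API returns a list of generations, e.g. [{"generated_text": "..."}]
print(response.json()[0]["generated_text"])
```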
---
### 2. Google Flan-T5-XL
**Best for:** Maximum speed
```bash
LLM_MODEL=google/flan-t5-xl
```
**Specs:**
- Speed: ⚡⚡⚡ Very Fast (3-10 seconds)
- Quality: ⭐⭐ Decent
- Size: 3B parameters
- Context: 512 tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**
**Pros:**
- Fastest generation
- Always available
- Good for simple tasks
- Minimal latency
- Very lightweight
**Cons:**
- Lower-quality outputs than the XXL variant
- Limited context
- Shorter responses
- May struggle with complex tasks
**Best for:**
- Testing/prototyping
- Simple surveys (5-8 questions)
- Quick translations
- When you need instant results
---
### 3. Mistral-7B-Instruct-v0.2
**Best for:** Highest-quality output (when deployed)
```bash
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
```
**Specs:**
- Speed: ⚡⚡ Medium (20-45 seconds)
- Quality: ⭐⭐⭐⭐ Excellent
- Size: 7B parameters
- Context: 8K tokens
- Status: ⚠️ **Deployment varies** - may not be available
**Pros:**
- Excellent quality outputs
- Good reasoning capabilities
- Larger context window
- Handles complex tasks well
**Cons:**
- **May not be deployed** on Inference API
- Slower than Flan-T5 models
- May queue during peak times
- Can return 404 errors if not available
**Best for:**
- High-quality surveys (if available)
- Complex analysis tasks
- When quality matters most
**Note:** This model may not be consistently available on the free Serverless Inference API. Use Flan-T5-XXL for guaranteed availability.
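Because availability varies, a probe-and-fallback pattern avoids hard 404 failures. The sketch below is illustrative (the helper is ours, not part of ConversAI): it sends a tiny request and falls back to the default model if the endpoint returns 404:
```python
# Illustrative sketch: probe a preferred model, fall back to the default.
import os
import requests

PREFERRED = "mistralai/Mistral-7B-Instruct-v0.2"
FALLBACK = "google/flan-t5-xxl"

def pick_model(token: str) -> str:
    url = f"https://api-inference.huggingface.co/models/{PREFERRED}"
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {token}"},
        json={"inputs": "ping"},
        timeout=30,
    )
    # 404 = not deployed on the Serverless API; 503 = deployed but loading.
    return FALLBACK if resp.status_code == 404 else PREFERRED

os.environ["LLM_MODEL"] = pick_model(os.environ["HF_TOKEN"])
```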
---
### 4. Google Flan-UL2
**Best for:** Long contexts
```bash
LLM_MODEL=google/flan-ul2
```
**Specs:**
- Speed: ⚡⚡ Medium (15-40 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 20B parameters
- Context: 2K tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**
**Pros:**
- Better context handling
- Good quality
- Handles longer inputs
- Good for analysis
**Cons:**
- Slightly slower
- Can be unpredictable
- May timeout occasionally
**Best for:**
- Longer survey outlines
- Complex analysis tasks
- When you need more context
---
## 📊 Model Comparison
| Model | Speed | Quality | Size | Deployed | Best Use Case |
|-------|-------|---------|------|----------|---------------|
| **Flan-T5-XXL** ⭐ | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 11B | ✅ Guaranteed | **Default - fast & reliable** |
| **Flan-T5-XL** | ⚡⚡⚡ Very Fast | ⭐⭐ Decent | 3B | ✅ Guaranteed | **Maximum speed** |
| **Flan-UL2** | ⚡⚡ Medium | ⭐⭐⭐ Good | 20B | ✅ Guaranteed | **Longer contexts** |
| **Mistral-7B** | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | 7B | ⚠️ Varies | **Best quality (if available)** |
**Note:** Only models marked "✅ Guaranteed" are always available on the HF Inference API. Models marked "⚠️ Varies" may not be deployed.
---
## 🎯 Use Case Recommendations
### For Survey Generation:
**5-10 questions (simple):**
```bash
LLM_MODEL=google/flan-t5-xl # Fastest
```
**10-15 questions (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl # Default, balanced
```
**15+ questions (detailed):**
```bash
LLM_MODEL=google/flan-ul2 # Better context handling
```
### For Translation:
**1-2 languages (quick):**
```bash
LLM_MODEL=google/flan-t5-xl # Fastest translations
```
**3-5 languages (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl # Default, reliable
```
**5+ languages or critical translations:**
```bash
LLM_MODEL=google/flan-ul2 # Better quality
```
### For Data Analysis:
**10-30 responses (simple):**
```bash
LLM_MODEL=google/flan-t5-xl # Quick insights
```
**30-100 responses (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl # Default, balanced
```
**100+ responses or complex analysis:**
```bash
LLM_MODEL=google/flan-ul2 # Deep analysis, better context
```
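If you script your deployments, the recommendations above reduce to a simple lookup. The helper below is purely illustrative (the function name and thresholds mirror this guide, not ConversAI's code):
```python
# Illustrative helper mapping the recommendations above to a model ID.
def recommend_model(task: str, size: int) -> str:
    if task == "survey":       # size = number of questions
        return ("google/flan-t5-xl" if size <= 10
                else "google/flan-t5-xxl" if size <= 15
                else "google/flan-ul2")
    if task == "translation":  # size = number of target languages
        return ("google/flan-t5-xl" if size <= 2
                else "google/flan-t5-xxl" if size <= 5
                else "google/flan-ul2")
    if task == "analysis":     # size = number of survey responses
        return ("google/flan-t5-xl" if size <= 30
                else "google/flan-t5-xxl" if size <= 100
                else "google/flan-ul2")
    return "google/flan-t5-xxl"  # safe default

print(recommend_model("survey", 12))  # -> google/flan-t5-xxl
```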
---
## ⚙️ How to Change Models
### On HuggingFace Spaces:
1. Go to your Space Settings
2. Click "Variables" or "Repository secrets"
3. Add new variable:
- Name: `LLM_MODEL`
- Value: `google/flan-t5-xxl` (or any model above)
4. Restart your Space
### Running Locally:
**Option 1: Environment variable**
```bash
export LLM_MODEL=google/flan-t5-xxl
python app.py
```
**Option 2: In code (app.py)**
```python
import os
# Set before any module reads LLM_MODEL (e.g. before creating the model client)
os.environ["LLM_MODEL"] = "google/flan-t5-xl"
```
### In Docker:
```dockerfile
ENV LLM_MODEL=google/flan-t5-xxl
```
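However you set it, the variable only takes effect if it is read before the model client is created. A hypothetical sketch of how an app like this resolves the model at startup (ConversAI's actual code may differ):
```python
# Hypothetical startup snippet; ConversAI's actual code may differ.
import os

# Fall back to the guaranteed default when LLM_MODEL is unset.
LLM_MODEL = os.getenv("LLM_MODEL", "google/flan-t5-xxl")
print(f"Using model: {LLM_MODEL}")
```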
---
## 💡 Tips for Best Results
### 1. Start Simple
Begin with the default (Flan-T5-XXL) and only switch if you need to:
- **Need maximum speed?** → Try Flan-T5-XL
- **Need longer context?** → Try Flan-UL2
- **Need best quality?** → Try Mistral-7B (if available)
### 2. Adjust Your Prompts
Different models respond best to different prompting styles (see the sketch after these lists):
**Flan-T5 models (recommended):**
- Prefer clear, direct instructions
- Work better with structured input
- Best with specific requirements
- Use imperative language ("Generate...", "Create...", "Translate...")
**Mistral (if available):**
- Can handle conversational outlines
- Good with context and examples
- Understands nuance
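A small sketch of both styles in code (the `build_prompt` helper is ours; the `[INST]` wrapper follows the Mistral-7B-Instruct model card):
```python
# Sketch: wrap the same task for each model family.
def build_prompt(model: str, task: str) -> str:
    if model.startswith("mistralai/"):
        # Mistral instruct models expect the [INST] ... [/INST] wrapper.
        return f"[INST] {task} [/INST]"
    # Flan-T5 models respond best to bare imperative instructions.
    return task

task = "Generate 10 survey questions about mobile app usability."
print(build_prompt("google/flan-t5-xxl", task))                  # bare instruction
print(build_prompt("mistralai/Mistral-7B-Instruct-v0.2", task))  # [INST] ... [/INST]
```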
### 3. Manage Expectations
**Free tier limitations:**
- Cold start: 30-60 seconds on first request
- Queue times: 10-30 seconds during peak hours
- Rate limits: ~1 request every few seconds
- Timeouts: Possible on very complex tasks
**Solutions:**
1. Be patient on the first request (or retry automatically, as sketched below)
- Use off-peak hours when possible
- Keep prompts concise
- Try a faster model if timeouts occur
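Patience can also be automated: a retry loop around the Serverless Inference API handles cold starts and rate limits. The policy below is a sketch (waits and attempt counts are our assumptions; tune them to your traffic):
```python
# Sketch: retry around cold starts (503 while loading) and rate limits (429).
import time
import requests

def query_with_retry(url: str, headers: dict, payload: dict, attempts: int = 5):
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, json=payload, timeout=120)
        if resp.status_code == 503:
            # Model is loading; the API response usually includes an estimate.
            wait = resp.json().get("estimated_time", 30)
            time.sleep(min(wait, 60))
        elif resp.status_code == 429:
            time.sleep(2 ** attempt)  # simple exponential backoff
        else:
            resp.raise_for_status()  # surface other errors
            return resp.json()
    raise RuntimeError("Model did not respond after retries")
```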
### 4. Test and Compare
Try generating the same survey with different models:
```bash
# Test 1: Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl
# Test 2: Flan-T5-XL (faster)
LLM_MODEL=google/flan-t5-xl
# Test 3: Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2
```
Pick the one that works best for your use case!
---
## 🐛 Troubleshooting
### "Model loading failed"
**Cause:** Model might be down or loading
**Solutions:**
1. Wait 1-2 minutes and retry
2. Try a different Flan-T5 variant (all are stable)
3. Check HuggingFace status page
### "Request timed out"
**Cause:** Model taking too long (can happen on first request)
**Solutions:**
1. Retry - second request is faster
2. Use a faster model (Flan-T5-XL)
3. Simplify your prompt
4. Try during off-peak hours
### "Rate limit exceeded"
**Cause:** Too many requests too fast
**Solutions:**
1. Wait 30-60 seconds between requests
2. Upgrade to a HuggingFace Pro account (higher rate limits)
3. Deploy your own Space (gets its own quota)
### Poor quality output
**Cause:** Model not suitable for task
**Solutions:**
1. Try Mistral-7B for better quality
2. Make prompts more specific
3. Provide examples in your outline
4. Break complex tasks into smaller steps
---
## 📊 Performance Benchmarks
Based on typical usage patterns:
| Task | Flan-T5-XL | Flan-T5-XXL | Flan-UL2 |
|------|------------|-------------|----------|
| **Generate 10Q survey** | 5-10s | 8-15s | 15-25s |
| **Translate to 3 lang** | 8-12s | 12-20s | 20-30s |
| **Analyze 50 responses** | 10-15s | 15-25s | 25-40s |
| **First request (cold)** | 10-20s | 15-30s | 30-45s |
| **Subsequent requests** | 3-8s | 5-12s | 10-20s |
*Times are approximate and vary based on server load*
---
## 🎓 Advanced Tips
### 1. Model-Specific Prompting
**For Flan-T5-XXL (Default):**
```
Task: Create survey about mobile app satisfaction
Requirements:
- 10 questions
- Topics: usability, performance, features
- Audience: iOS users 25-45
Generate a professional survey following best practices.
```
**For Flan-T5-XL (Fast):**
```
Create 8 questions about mobile app satisfaction.
Topics: usability, performance, features.
Audience: iOS users 25-45.
```
**For Flan-UL2 (More Context):**
```
Generate a comprehensive survey to understand mobile app user satisfaction.
Context: We're a productivity app with 100K users. Recent reviews mention
performance issues and missing features. We need to understand:
1. Current satisfaction levels
2. Specific pain points
3. Feature priorities
Target: iOS users aged 25-45 who use the app daily.
Create 12-15 questions following qualitative research best practices.
```
### 2. Optimize for Speed
**Fast survey generation:**
1. Use Flan-T5-XL
2. Keep outline to 2-3 sentences
3. Request 5-8 questions
4. Use clear, direct prompts
**Result:** 3-8 second generation
### 3. Optimize for Quality
**High-quality surveys:**
1. Use Flan-UL2
2. Provide detailed context and examples
3. Request 10-15 questions
4. Include specific requirements
**Result:** Professional, well-structured surveys
---
## ❓ FAQ
**Q: Why is Flan-T5-XXL the default?**
A: It's guaranteed to be deployed on HF Inference API, fast, and reliable. Google's instruction-tuned model works well for structured tasks.
**Q: Can I use multiple models in one app?**
A: Yes! Change `LLM_MODEL` environment variable to switch models.
**Q: Which model is best for non-English?**
A: All Flan-T5 models support multiple languages. For best multilingual support, try Flan-UL2.
**Q: Do these models cost money?**
A: No! All are free on HuggingFace Inference API.
**Q: Can I use my own fine-tuned model?**
A: Yes! Set `LLM_MODEL` to your model ID on HuggingFace (e.g. `your-username/your-model`).
**Q: What if I need better performance?**
A: Consider:
1. HuggingFace Pro (higher rate limits)
2. Deploy model yourself (Hugging Face Inference Endpoints)
3. Use dedicated GPU
---
## 🚀 Quick Start Commands
```bash
# Try Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl python app.py
# Try Flan-T5-XL (fastest)
LLM_MODEL=google/flan-t5-xl python app.py
# Try Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2 python app.py
# Check which model is active
python check_env.py
```
---
**Updated:** November 2025
**All models tested and working on HuggingFace free tier**
For more help, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md) or [USER_GUIDE.md](USER_GUIDE.md)