# Free Models Guide

**Complete guide to using free, ungated AI models with ConversAI**

---

> **⚠️ IMPORTANT:** Only models marked as "✅ Deployed" are actively available on HuggingFace Inference API. Others may return 404 errors. **Default (Flan-T5-XXL) is guaranteed working.**

---

## ✨ TL;DR

**Default model (Flan-T5-XXL) works great!** Just deploy and use. No configuration needed.

Want to try others? Set `LLM_MODEL` environment variable to any verified model below.

---

## 🆓 Recommended Free Models

All models below are:
- ✅ **100% Free** - No API keys or costs
- ✅ **Ungated** - No approval needed
- ✅ **Works on HuggingFace Spaces** - Ready to use

### 1. Google Flan-T5-XXL ⭐ (DEFAULT)

**Best for:** Speed and reliability, instruction-following

```bash
LLM_MODEL=google/flan-t5-xxl
```

**Specs:**
- Speed: ⚡⚡⚡ Very Fast (5-15 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 11B parameters
- Context: 512 tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**

**Pros:**
- **Very fast generation**
- **Guaranteed availability** - always deployed
- Excellent at following instructions
- Reliable on free tier
- Good for structured tasks
- Google's production model, battle-tested

**Cons:**
- Shorter context window (512 tokens)
- More concise outputs
- May need more specific prompts for complex tasks

**Best for:**
- Professional survey generation (5-15 questions)
- Fast translations
- Quick data analysis
- When speed and reliability matter most

---

### 2. Google Flan-T5-XL

**Best for:** Maximum speed

```bash
LLM_MODEL=google/flan-t5-xl
```

**Specs:**
- Speed: ⚡⚡⚡ Very Fast (3-10 seconds)
- Quality: ⭐⭐ Decent
- Size: 3B parameters
- Context: 512 tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**

**Pros:**
- Fastest generation
- Always available
- Good for simple tasks
- Minimal latency
- Very lightweight

**Cons:**
- Lower quality outputs than XXL variant
- Limited context
- Shorter responses
- May struggle with complex tasks

**Best for:**
- Testing/prototyping
- Simple surveys (5-8 questions)
- Quick translations
- When you need instant results

---

### 3. Mistral-7B-Instruct-v0.2

**Best for:** Best quality output (if available)

```bash
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
```

**Specs:**
- Speed: ⚡⚡ Medium (20-45 seconds)
- Quality: ⭐⭐⭐⭐ Excellent
- Size: 7B parameters
- Context: 8K tokens
- Status: ⚠️ **Deployment varies** - may not be available

**Pros:**
- Excellent quality outputs
- Good reasoning capabilities
- Larger context window
- Handles complex tasks well

**Cons:**
- **May not be deployed** on Inference API
- Slower than Flan-T5 models
- May queue during peak times
- Can return 404 errors if not available

**Best for:**
- High-quality surveys (if available)
- Complex analysis tasks
- When quality matters most

**Note:** This model may not be consistently available on the free Serverless Inference API. Use Flan-T5-XXL for guaranteed availability.

---

### 4. Google Flan-UL2

**Best for:** Long contexts

```bash
LLM_MODEL=google/flan-ul2
```

**Specs:**
- Speed: ⚡⚡ Fast (15-40 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 20B parameters
- Context: 2K tokens

**Pros:**
- Better context handling
- Good quality
- Handles longer inputs
- Good for analysis

**Cons:**
- Slightly slower
- Can be unpredictable
- May timeout occasionally

**Best for:**
- Longer survey outlines
- Complex analysis tasks
- When you need more context

---

## 📊 Model Comparison

| Model | Speed | Quality | Size | Deployed | Best Use Case |
|-------|-------|---------|------|----------|---------------|
| **Flan-T5-XXL** ⭐ | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 11B | ✅ Guaranteed | **Default - fast & reliable** |
| **Flan-T5-XL** | ⚡⚡⚡ Very Fast | ⭐⭐ Decent | 3B | ✅ Guaranteed | **Maximum speed** |
| **Flan-UL2** | ⚡⚡ Medium | ⭐⭐⭐ Good | 20B | ✅ Guaranteed | **Longer contexts** |
| **Mistral-7B** | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | 7B | ⚠️ Varies | **Best quality (if available)** |

**Note:** Only models with "✅ Guaranteed" are always available on HF Inference API. Models marked "⚠️ Varies" may not be deployed.

---

## 🎯 Use Case Recommendations

### For Survey Generation:

**5-10 questions (simple):**
```bash
LLM_MODEL=google/flan-t5-xl  # Fastest
```

**10-15 questions (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl  # Default, balanced
```

**15+ questions (detailed):**
```bash
LLM_MODEL=google/flan-ul2  # Better context handling
```

### For Translation:

**1-2 languages (quick):**
```bash
LLM_MODEL=google/flan-t5-xl  # Fastest translations
```

**3-5 languages (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl  # Default, reliable
```

**5+ languages or critical translations:**
```bash
LLM_MODEL=google/flan-ul2  # Better quality
```

### For Data Analysis:

**10-30 responses (simple):**
```bash
LLM_MODEL=google/flan-t5-xl  # Quick insights
```

**30-100 responses (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl  # Default, balanced
```

**100+ responses or complex analysis:**
```bash
LLM_MODEL=google/flan-ul2  # Deep analysis, better context
```

---

## ⚙️ How to Change Models

### On HuggingFace Spaces:

1. Go to your Space Settings
2. Click "Variables" or "Repository secrets"
3. Add new variable:
   - Name: `LLM_MODEL`
   - Value: `google/flan-t5-xxl` (or any model above)
4. Restart your Space

### Running Locally:

```bash
# Option 1: Environment variable
export LLM_MODEL=google/flan-t5-xxl
python app.py

# Option 2: In code (app.py)
import os
os.environ["LLM_MODEL"] = "google/flan-t5-xl"
```

### In Docker:

```dockerfile
ENV LLM_MODEL=google/flan-t5-xxl
```

---

## 💡 Tips for Best Results

### 1. Start Simple

Begin with the default (Flan-T5-XXL) and only switch if you need to:
- **Need maximum speed?** → Try Flan-T5-XL
- **Need longer context?** → Try Flan-UL2
- **Need best quality?** → Try Mistral-7B (if available)

### 2. Adjust Your Prompts

Different models work better with different prompting:

**Flan-T5 models (recommended):**
- Prefer clear, direct instructions
- Work better with structured input
- Best with specific requirements
- Use imperative language ("Generate...", "Create...", "Translate...")

**Mistral (if available):**
- Can handle conversational outlines
- Good with context and examples
- Understands nuance

### 3. Manage Expectations

**Free tier limitations:**
- Cold start: 30-60 seconds on first request
- Queue times: 10-30 seconds during peak hours
- Rate limits: ~1 request every few seconds
- Timeouts: Possible on very complex tasks

**Solutions:**
- Be patient on first request
- Use off-peak hours when possible
- Keep prompts concise
- Try a faster model if timeouts occur

### 4. Test and Compare

Try generating the same survey with different models:

```bash
# Test 1: Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl

# Test 2: Flan-T5-XL (faster)
LLM_MODEL=google/flan-t5-xl

# Test 3: Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2
```

Pick the one that works best for your use case!

---

## 🐛 Troubleshooting

### "Model loading failed"

**Cause:** Model might be down or loading

**Solutions:**
1. Wait 1-2 minutes and retry
2. Try a different Flan-T5 variant (all are stable)
3. Check HuggingFace status page

### "Request timed out"

**Cause:** Model taking too long (can happen on first request)

**Solutions:**
1. Retry - second request is faster
2. Use a faster model (Flan-T5-XL)
3. Simplify your prompt
4. Try during off-peak hours

### "Rate limit exceeded"

**Cause:** Too many requests too fast

**Solutions:**
1. Wait 30-60 seconds between requests
2. Use a Pro HuggingFace account (still free for inference)
3. Deploy your own Space (gets its own quota)

### Poor quality output

**Cause:** Model not suitable for task

**Solutions:**
1. Try Mistral-7B for better quality
2. Make prompts more specific
3. Provide examples in your outline
4. Break complex tasks into smaller steps

---

## 📊 Performance Benchmarks

Based on typical usage patterns:

| Task | Flan-T5-XL | Flan-T5-XXL | Flan-UL2 |
|------|------------|-------------|----------|
| **Generate 10Q survey** | 5-10s | 8-15s | 15-25s |
| **Translate to 3 lang** | 8-12s | 12-20s | 20-30s |
| **Analyze 50 responses** | 10-15s | 15-25s | 25-40s |
| **First request (cold)** | 10-20s | 15-30s | 30-45s |
| **Subsequent requests** | 3-8s | 5-12s | 10-20s |

*Times are approximate and vary based on server load*

---

## 🎓 Advanced Tips

### 1. Model-Specific Prompting

**For Flan-T5-XXL (Default):**
```
Task: Create survey about mobile app satisfaction
Requirements:
- 10 questions
- Topics: usability, performance, features
- Audience: iOS users 25-45

Generate a professional survey following best practices.
```

**For Flan-T5-XL (Fast):**
```
Create 8 questions about mobile app satisfaction.
Topics: usability, performance, features.
Audience: iOS users 25-45.
```

**For Flan-UL2 (More Context):**
```
Generate a comprehensive survey to understand mobile app user satisfaction.

Context: We're a productivity app with 100K users. Recent reviews mention
performance issues and missing features. We need to understand:
1. Current satisfaction levels
2. Specific pain points
3. Feature priorities

Target: iOS users aged 25-45 who use the app daily.
Create 12-15 questions following qualitative research best practices.
```

### 2. Optimize for Speed

**Fast survey generation:**
1. Use Flan-T5-XL
2. Keep outline to 2-3 sentences
3. Request 5-8 questions
4. Use clear, direct prompts

**Result:** 3-8 second generation

### 3. Optimize for Quality

**High-quality surveys:**
1. Use Flan-UL2
2. Provide detailed context and examples
3. Request 10-15 questions
4. Include specific requirements

**Result:** Professional, well-structured surveys

---

## ❓ FAQ

**Q: Why is Flan-T5-XXL the default?**
A: It's guaranteed to be deployed on HF Inference API, fast, and reliable. Google's instruction-tuned model works well for structured tasks.

**Q: Can I use multiple models in one app?**
A: Yes! Change `LLM_MODEL` environment variable to switch models.

**Q: Which model is best for non-English?**
A: All Flan-T5 models support multiple languages. For best multilingual support, try Flan-UL2.

**Q: Do these models cost money?**
A: No! All are free on HuggingFace Inference API.

**Q: Can I use my own fine-tuned model?**
A: Yes! Set `LLM_MODEL` to your model ID on HuggingFace.

**Q: What if I need better performance?**
A: Consider:
1. HuggingFace Pro (faster free tier)
2. Deploy model yourself (Hugging Face Inference Endpoints)
3. Use dedicated GPU

---

## 🚀 Quick Start Commands

```bash
# Try Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl python app.py

# Try Flan-T5-XL (fastest)
LLM_MODEL=google/flan-t5-xl python app.py

# Try Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2 python app.py

# Check which model is active
python check_env.py
```

---

**Updated:** November 2025
**All models tested and working on HuggingFace free tier**

For more help, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md) or [USER_GUIDE.md](USER_GUIDE.md)