# Free Models Guide

**Complete guide to using free, ungated AI models with ConversAI**

---

> **⚠️ IMPORTANT:** Only models marked "✅ Guaranteed" are reliably available on the HuggingFace Inference API. Models marked "⚠️ Varies" may return 404 errors. **The default (Flan-T5-XXL) is guaranteed to work.**

---

## ✨ TL;DR

**Default model (Flan-T5-XXL) works great!** Just deploy and use. No configuration needed.

Want to try others? Set the `LLM_MODEL` environment variable to any verified model below.

---

## 🆓 Recommended Free Models

All models below are:
- ✅ **100% Free** - No API keys or costs
- ✅ **Ungated** - No approval needed
- ✅ **Works on HuggingFace Spaces** - Ready to use

### 1. Google Flan-T5-XXL ⭐ (DEFAULT)

**Best for:** Speed and reliability, instruction-following

```bash
LLM_MODEL=google/flan-t5-xxl
```

**Specs:**
- Speed: ⚡⚡⚡ Very Fast (5-15 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 11B parameters
- Context: 512 tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**

**Pros:**
- **Very fast generation**
- **Guaranteed availability** - always deployed
- Excellent at following instructions
- Reliable on free tier
- Good for structured tasks
- Google's production model, battle-tested

**Cons:**
- Shorter context window (512 tokens)
- More concise outputs
- May need more specific prompts for complex tasks
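
The 512-token limit is easy to hit with long outlines. The sketch below is a crude pre-check, not a tokenizer: it assumes each whitespace-separated word costs at least one token, so a prompt with more words than the limit will almost certainly not fit. The function name and constant are hypothetical, and a prompt that passes this check can still be too long once tokenized.

```python
FLAN_T5_CONTEXT_TOKENS = 512  # context size listed in this guide


def likely_exceeds_context(prompt, limit=FLAN_T5_CONTEXT_TOKENS):
    """Return True when the prompt has more words than the token limit.

    Assumption: a word is at least one token, so the word count is a
    lower bound on the real token count. This only catches prompts that
    are clearly too long.
    """
    word_count = len(prompt.split())
    return word_count > limit
```

For an exact count you would need the model's own tokenizer; this check is only meant to catch obviously oversized prompts before sending them.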

**Best for:**
- Professional survey generation (5-15 questions)
- Fast translations
- Quick data analysis
- When speed and reliability matter most

---

### 2. Google Flan-T5-XL

**Best for:** Maximum speed

```bash
LLM_MODEL=google/flan-t5-xl
```

**Specs:**
- Speed: ⚡⚡⚡ Very Fast (3-10 seconds)
- Quality: ⭐⭐ Decent
- Size: 3B parameters
- Context: 512 tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**

**Pros:**
- Fastest generation
- Always available
- Good for simple tasks
- Minimal latency
- Very lightweight

**Cons:**
- Lower quality outputs than XXL variant
- Limited context
- Shorter responses
- May struggle with complex tasks

**Best for:**
- Testing/prototyping
- Simple surveys (5-8 questions)
- Quick translations
- When you need instant results

---

### 3. Mistral-7B-Instruct-v0.2

**Best for:** Best quality output (if available)

```bash
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
```

**Specs:**
- Speed: ⚡⚡ Medium (20-45 seconds)
- Quality: ⭐⭐⭐⭐ Excellent
- Size: 7B parameters
- Context: 32K tokens (v0.1 was 8K)
- Status: ⚠️ **Deployment varies** - may not be available

**Pros:**
- Excellent quality outputs
- Good reasoning capabilities
- Larger context window
- Handles complex tasks well

**Cons:**
- **May not be deployed** on Inference API
- Slower than Flan-T5 models
- May queue during peak times
- Can return 404 errors if not available

**Best for:**
- High-quality surveys (if available)
- Complex analysis tasks
- When quality matters most

**Note:** This model may not be consistently available on the free Serverless Inference API. Use Flan-T5-XXL for guaranteed availability.

---

### 4. Google Flan-UL2

**Best for:** Longer contexts than Flan-T5 (2K vs. 512 tokens)

```bash
LLM_MODEL=google/flan-ul2
```

**Specs:**
- Speed: ⚡⚡ Medium (15-40 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 20B parameters
- Context: 2K tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**

**Pros:**
- Better context handling
- Good quality
- Handles longer inputs
- Good for analysis

**Cons:**
- Slightly slower
- Can be unpredictable
- May timeout occasionally

**Best for:**
- Longer survey outlines
- Complex analysis tasks
- When you need more context

---

## 📊 Model Comparison

| Model | Speed | Quality | Size | Deployed | Best Use Case |
|-------|-------|---------|------|----------|---------------|
| **Flan-T5-XXL** ⭐ | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 11B | ✅ Guaranteed | **Default - fast & reliable** |
| **Flan-T5-XL** | ⚡⚡⚡ Very Fast | ⭐⭐ Decent | 3B | ✅ Guaranteed | **Maximum speed** |
| **Flan-UL2** | ⚡⚡ Medium | ⭐⭐⭐ Good | 20B | ✅ Guaranteed | **Longer contexts** |
| **Mistral-7B** | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | 7B | ⚠️ Varies | **Best quality (if available)** |

**Note:** Only models with "✅ Guaranteed" are always available on HF Inference API. Models marked "⚠️ Varies" may not be deployed.
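
Because a "⚠️ Varies" model can disappear, it may help to fall back to the default automatically. The sketch below shows one way to do that. It is not ConversAI's actual code: `ModelUnavailable` and the `generate(model_id, prompt)` callable are hypothetical stand-ins for whatever client wrapper you use and however it reports a 404.

```python
class ModelUnavailable(Exception):
    """Hypothetical error raised by your client wrapper on a 404."""


DEFAULT_MODEL = "google/flan-t5-xxl"  # the guide's guaranteed default


def generate_with_fallback(prompt, preferred, generate, fallback=DEFAULT_MODEL):
    """Try `preferred`; if it is unavailable, retry once with `fallback`.

    `generate(model_id, prompt)` is supplied by the caller (assumption).
    Returns a (model_used, text) tuple so the caller can see which model ran.
    """
    try:
        return preferred, generate(preferred, prompt)
    except ModelUnavailable:
        if preferred == fallback:
            raise  # nothing left to fall back to
        return fallback, generate(fallback, prompt)
```

Returning the model that actually ran makes it obvious in logs or the UI when the fallback was used.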

---

## 🎯 Use Case Recommendations

### For Survey Generation:

**5-10 questions (simple):**
```bash
LLM_MODEL=google/flan-t5-xl  # Fastest
```

**10-15 questions (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl  # Default, balanced
```

**15+ questions (detailed):**
```bash
LLM_MODEL=google/flan-ul2  # Better context handling
```

### For Translation:

**1-2 languages (quick):**
```bash
LLM_MODEL=google/flan-t5-xl  # Fastest translations
```

**3-5 languages (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl  # Default, reliable
```

**5+ languages or critical translations:**
```bash
LLM_MODEL=google/flan-ul2  # Better quality
```

### For Data Analysis:

**10-30 responses (simple):**
```bash
LLM_MODEL=google/flan-t5-xl  # Quick insights
```

**30-100 responses (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl  # Default, balanced
```

**100+ responses or complex analysis:**
```bash
LLM_MODEL=google/flan-ul2  # Deep analysis, better context
```
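
The recommendations above can be expressed as a small lookup. This is only an illustrative sketch: `pick_model` and the `THRESHOLDS` table are hypothetical names, and since the ranges in this guide overlap at their boundaries (for example, 10 questions), the sketch assumes a boundary value goes to the smaller, faster model.

```python
# Inclusive upper bounds for the "simple" and "standard" tiers of each task.
# Assumption: boundary values (e.g. exactly 10 questions) go to the smaller model.
THRESHOLDS = {
    "survey": (10, 15),       # number of questions
    "translation": (2, 5),    # number of target languages
    "analysis": (30, 100),    # number of responses
}

# Models ordered from fastest to most context, as recommended in this guide.
MODELS = ("google/flan-t5-xl", "google/flan-t5-xxl", "google/flan-ul2")


def pick_model(task, size):
    """Return the recommended model ID for a task of the given size."""
    simple_max, standard_max = THRESHOLDS[task]
    if size <= simple_max:
        return MODELS[0]
    if size <= standard_max:
        return MODELS[1]
    return MODELS[2]
```

For example, `pick_model("survey", 12)` falls in the standard tier and returns the default model ID.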

---

## ⚙️ How to Change Models

### On HuggingFace Spaces:

1. Go to your Space Settings
2. Open the "Variables and secrets" section (a model ID is not sensitive, so a variable is enough)
3. Add new variable:
   - Name: `LLM_MODEL`
   - Value: `google/flan-t5-xxl` (or any model above)
4. Restart your Space

### Running Locally:

```bash
# Option 1: Environment variable
export LLM_MODEL=google/flan-t5-xxl
python app.py
```

```python
# Option 2: In code (app.py), set before the app reads LLM_MODEL
import os

os.environ["LLM_MODEL"] = "google/flan-t5-xl"
```

### In Docker:

```dockerfile
ENV LLM_MODEL=google/flan-t5-xxl
```
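
Whichever way the variable is set, the app has to read it and fall back to the default when it is missing. The following is a minimal sketch of that lookup, not the actual code in `app.py`; `resolve_model` and `DEFAULT_MODEL` are hypothetical names.

```python
import os

DEFAULT_MODEL = "google/flan-t5-xxl"  # the default named in this guide


def resolve_model(env=None):
    """Return the model ID from LLM_MODEL, or the default if unset or blank.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    value = env.get("LLM_MODEL", "").strip()
    return value or DEFAULT_MODEL
```

Treating a blank value the same as a missing one avoids sending an empty model ID when the variable exists but was left empty.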

---

## 💡 Tips for Best Results

### 1. Start Simple

Begin with the default (Flan-T5-XXL) and only switch if you need to:
- **Need maximum speed?** → Try Flan-T5-XL
- **Need longer context?** → Try Flan-UL2
- **Need best quality?** → Try Mistral-7B (if available)

### 2. Adjust Your Prompts

Different models work better with different prompting:

**Flan-T5 models (recommended):**
- Prefer clear, direct instructions
- Work better with structured input
- Best with specific requirements
- Use imperative language ("Generate...", "Create...", "Translate...")

**Mistral (if available):**
- Can handle conversational outlines
- Good with context and examples
- Understands nuance

### 3. Manage Expectations

**Free tier limitations:**
- Cold start: 30-60 seconds on first request
- Queue times: 10-30 seconds during peak hours
- Rate limits: ~1 request every few seconds
- Timeouts: Possible on very complex tasks

**Solutions:**
- Be patient on first request
- Use off-peak hours when possible
- Keep prompts concise
- Try a faster model if timeouts occur
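
"Be patient on first request" can be automated with a retry loop that waits longer after each failure. The sketch below assumes your request function raises `TimeoutError` on a cold start or queue timeout; that choice of exception, the function name, and the default delays are all assumptions to adapt to your client.

```python
import time


def retry_with_backoff(call, attempts=3, first_delay=30.0, sleep=time.sleep):
    """Run `call()` up to `attempts` times, doubling the wait after each timeout.

    Only TimeoutError is retried (assumption); other errors propagate at once.
    `sleep` is injectable so the logic can be tested without real waiting.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts, surface the timeout
            sleep(delay)
            delay *= 2
```

With the defaults, a request that times out twice waits 30 seconds, then 60 seconds, before the third attempt.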

### 4. Test and Compare

Try generating the same survey with different models:

```bash
# Test 1: Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl

# Test 2: Flan-T5-XL (faster)
LLM_MODEL=google/flan-t5-xl

# Test 3: Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2
```

Pick the one that works best for your use case!
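
If you can call the models from a script, a small harness makes the comparison repeatable. This is a sketch under the assumption that you have some `generate(model_id, prompt)` function of your own; `compare_models` is a hypothetical helper and does not call any HuggingFace API itself.

```python
import time


def compare_models(models, prompt, generate, clock=time.perf_counter):
    """Run one prompt through each model and record output and elapsed time.

    `generate(model_id, prompt)` is caller-supplied (assumption). A failure
    for one model is recorded and the loop continues with the next.
    """
    results = []
    for model in models:
        start = clock()
        try:
            text, error = generate(model, prompt), None
        except Exception as exc:  # record the failure instead of stopping
            text, error = None, str(exc)
        results.append(
            {"model": model, "seconds": clock() - start, "text": text, "error": error}
        )
    return results
```

Recording errors alongside timings means an unavailable model shows up in the results instead of aborting the whole comparison.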

---

## 🐛 Troubleshooting

### "Model loading failed"

**Cause:** Model might be down or loading

**Solutions:**
1. Wait 1-2 minutes and retry
2. Try a different Flan-T5 variant (all are stable)
3. Check HuggingFace status page

### "Request timed out"

**Cause:** Model taking too long (can happen on first request)

**Solutions:**
1. Retry - second request is faster
2. Use a faster model (Flan-T5-XL)
3. Simplify your prompt
4. Try during off-peak hours

### "Rate limit exceeded"

**Cause:** Too many requests too fast

**Solutions:**
1. Wait 30-60 seconds between requests
2. Upgrade to a HuggingFace Pro account (a paid subscription with higher rate limits)
3. Deploy your own Space (gets its own quota)
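
Spacing requests out can also be enforced in code. The sketch below is one possible client-side throttle that guarantees a minimum gap between calls. `Throttle` is a hypothetical helper, and the 30-second interval in the usage note simply mirrors the advice above rather than any documented limit.

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous call, if any

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Create one instance, for example `throttle = Throttle(30)`, and call `throttle.wait()` immediately before each request.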

### Poor quality output

**Cause:** Model not suitable for task

**Solutions:**
1. Try Mistral-7B for better quality (if it is available)
2. Make prompts more specific
3. Provide examples in your outline
4. Break complex tasks into smaller steps

---

## 📊 Performance Benchmarks

Based on typical usage patterns:

| Task | Flan-T5-XL | Flan-T5-XXL | Flan-UL2 |
|------|------------|-------------|----------|
| **Generate 10Q survey** | 5-10s | 8-15s | 15-25s |
| **Translate to 3 lang** | 8-12s | 12-20s | 20-30s |
| **Analyze 50 responses** | 10-15s | 15-25s | 25-40s |
| **First request (cold)** | 10-20s | 15-30s | 30-45s |
| **Subsequent requests** | 3-8s | 5-12s | 10-20s |

*Times are approximate and vary based on server load*

---

## 🎓 Advanced Tips

### 1. Model-Specific Prompting

**For Flan-T5-XXL (Default):**
```
Task: Create survey about mobile app satisfaction
Requirements:
- 10 questions
- Topics: usability, performance, features
- Audience: iOS users 25-45

Generate a professional survey following best practices.
```

**For Flan-T5-XL (Fast):**
```
Create 8 questions about mobile app satisfaction.
Topics: usability, performance, features.
Audience: iOS users 25-45.
```

**For Flan-UL2 (More Context):**
```
Generate a comprehensive survey to understand mobile app user satisfaction.

Context: We're a productivity app with 100K users. Recent reviews mention
performance issues and missing features. We need to understand:
1. Current satisfaction levels
2. Specific pain points
3. Feature priorities

Target: iOS users aged 25-45 who use the app daily.
Create 12-15 questions following qualitative research best practices.
```
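
If you generate prompts programmatically, the structured Flan-T5-XXL format shown above is easy to template. The helper below is a hypothetical sketch (`build_flan_prompt` is not part of ConversAI) that reproduces that layout from a few parameters.

```python
def build_flan_prompt(subject, num_questions, topics, audience):
    """Build a structured prompt in the Task / Requirements layout shown above."""
    lines = [
        f"Task: Create survey about {subject}",
        "Requirements:",
        f"- {num_questions} questions",
        f"- Topics: {', '.join(topics)}",
        f"- Audience: {audience}",
        "",  # blank line before the final instruction
        "Generate a professional survey following best practices.",
    ]
    return "\n".join(lines)
```

Calling it with `"mobile app satisfaction"`, `10`, `["usability", "performance", "features"]`, and `"iOS users 25-45"` reproduces the Flan-T5-XXL example prompt.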

### 2. Optimize for Speed

**Fast survey generation:**
1. Use Flan-T5-XL
2. Keep outline to 2-3 sentences
3. Request 5-8 questions
4. Use clear, direct prompts

**Result:** 3-8 second generation

### 3. Optimize for Quality

**High-quality surveys:**
1. Use Flan-UL2
2. Provide detailed context and examples
3. Request 10-15 questions
4. Include specific requirements

**Result:** Professional, well-structured surveys

---

## ❓ FAQ

**Q: Why is Flan-T5-XXL the default?**
A: It's guaranteed to be deployed on HF Inference API, fast, and reliable. Google's instruction-tuned model works well for structured tasks.

**Q: Can I use multiple models in one app?**
A: Yes, one at a time. Change the `LLM_MODEL` environment variable (and restart the app) to switch models.

**Q: Which model is best for non-English?**
A: All Flan-T5 models support multiple languages. For best multilingual support, try Flan-UL2.

**Q: Do these models cost money?**
A: No! All are free on HuggingFace Inference API.

**Q: Can I use my own fine-tuned model?**
A: Yes! Set `LLM_MODEL` to your model ID on HuggingFace.

**Q: What if I need better performance?**
A: Consider:
1. HuggingFace Pro (paid subscription with higher rate limits)
2. Deploy model yourself (Hugging Face Inference Endpoints)
3. Use dedicated GPU

---

## 🚀 Quick Start Commands

```bash
# Try Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl python app.py

# Try Flan-T5-XL (fastest)
LLM_MODEL=google/flan-t5-xl python app.py

# Try Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2 python app.py

# Check which model is active
python check_env.py
```

---

**Updated:** November 2025
**Models marked "✅ Guaranteed" tested and working on HuggingFace free tier**

For more help, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md) or [USER_GUIDE.md](USER_GUIDE.md)