
preview code

raw

history blame contribute delete

11.7 kB

	# Free Models Guide

	Complete guide to using free, ungated AI models with ConversAI

	---

	> ⚠️ IMPORTANT: Only models marked "✅ Guaranteed" are reliably available on the HuggingFace Inference API; others may return 404 errors. The default (Flan-T5-XXL) is guaranteed to work.

	---

	## ✨ TL;DR

	Default model (Flan-T5-XXL) works great! Just deploy and use. No configuration needed.

	Want to try others? Set the `LLM_MODEL` environment variable to one of the models below (those marked ✅ Guaranteed are the safest choice).

	---

	## 🆓 Recommended Free Models

	All models below are:
	- ✅ 100% Free - No API keys or costs
	- ✅ Ungated - No approval needed
	- ✅ Compatible with HuggingFace Spaces - Ready to use

	### 1. Google Flan-T5-XXL ⭐ (DEFAULT)

	Best for: Speed and reliability, instruction-following

	```bash
	LLM_MODEL=google/flan-t5-xxl
	```

	Specs:
	- Speed: ⚡⚡⚡ Very Fast (5-15 seconds)
	- Quality: ⭐⭐⭐ Good
	- Size: 11B parameters
	- Context: 512 tokens
	- Status: ✅ Guaranteed deployed on HF Inference API

	Pros:
	- Very fast generation
	- Guaranteed availability - always deployed
	- Excellent at following instructions
	- Reliable on free tier
	- Good for structured tasks
	- Widely used instruction-tuned model from Google

	Cons:
	- Shorter context window (512 tokens)
	- More concise outputs
	- May need more specific prompts for complex tasks

	Best for:
	- Professional survey generation (5-15 questions)
	- Fast translations
	- Quick data analysis
	- When speed and reliability matter most

	---

	### 2. Google Flan-T5-XL

	Best for: Maximum speed

	```bash
	LLM_MODEL=google/flan-t5-xl
	```

	Specs:
	- Speed: ⚡⚡⚡ Very Fast (3-10 seconds)
	- Quality: ⭐⭐ Decent
	- Size: 3B parameters
	- Context: 512 tokens
	- Status: ✅ Guaranteed deployed on HF Inference API

	Pros:
	- Fastest generation
	- Always available
	- Good for simple tasks
	- Minimal latency
	- Very lightweight

	Cons:
	- Lower quality outputs than XXL variant
	- Limited context
	- Shorter responses
	- May struggle with complex tasks

	Best for:
	- Testing/prototyping
	- Simple surveys (5-8 questions)
	- Quick translations
	- When you need instant results

	---

	### 3. Mistral-7B-Instruct-v0.2

	Best for: Best quality output (if available)

	```bash
	LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
	```

	Specs:
	- Speed: ⚡⚡ Medium (20-45 seconds)
	- Quality: ⭐⭐⭐⭐ Excellent
	- Size: 7B parameters
	- Context: 32K tokens
	- Status: ⚠️ Deployment varies - may not be available

	Pros:
	- Excellent quality outputs
	- Good reasoning capabilities
	- Larger context window
	- Handles complex tasks well

	Cons:
	- May not be deployed on Inference API
	- Slower than Flan-T5 models
	- May queue during peak times
	- Can return 404 errors if not available

	Best for:
	- High-quality surveys (if available)
	- Complex analysis tasks
	- When quality matters most

	Note: This model may not be consistently available on the free Serverless Inference API. Use Flan-T5-XXL for guaranteed availability.

	---

	### 4. Google Flan-UL2

	Best for: Long contexts

	```bash
	LLM_MODEL=google/flan-ul2
	```

	Specs:
	- Speed: ⚡⚡ Medium (15-40 seconds)
	- Quality: ⭐⭐⭐ Good
	- Size: 20B parameters
	- Context: 2K tokens
	- Status: ✅ Guaranteed deployed on HF Inference API

	Pros:
	- Better context handling
	- Good quality
	- Handles longer inputs
	- Good for analysis

	Cons:
	- Slightly slower
	- Can be unpredictable
	- May timeout occasionally

	Best for:
	- Longer survey outlines
	- Complex analysis tasks
	- When you need more context

	---

	## 📊 Model Comparison

	| Model | Speed | Quality | Size | Deployed | Best Use Case |
	|-------|-------|---------|------|----------|---------------|
	| Flan-T5-XXL ⭐ | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 11B | ✅ Guaranteed | Default - fast & reliable |
	| Flan-T5-XL | ⚡⚡⚡ Very Fast | ⭐⭐ Decent | 3B | ✅ Guaranteed | Maximum speed |
	| Flan-UL2 | ⚡⚡ Medium | ⭐⭐⭐ Good | 20B | ✅ Guaranteed | Longer contexts |
	| Mistral-7B | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | 7B | ⚠️ Varies | Best quality (if available) |

	Note: Only models with "✅ Guaranteed" are always available on HF Inference API. Models marked "⚠️ Varies" may not be deployed.
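If you point `LLM_MODEL` at a model marked ⚠️ Varies, one safeguard is to fall back to the guaranteed models when the first call fails. The sketch below is hypothetical: it assumes your app supplies a `generate(model, prompt)` callable and that your API wrapper raises a `ModelUnavailableError` on a 404. Neither name is part of ConversAI or any HuggingFace library.

```python
from __future__ import annotations

from typing import Callable, Sequence


class ModelUnavailableError(Exception):
    """Hypothetical error a client wrapper might raise when a model returns 404."""


# Models marked "✅ Guaranteed" in the comparison table, in order of preference.
GUARANTEED_FALLBACKS = ("google/flan-t5-xxl", "google/flan-t5-xl", "google/flan-ul2")


def generate_with_fallback(
    prompt: str,
    preferred: str,
    generate: Callable[[str, str], str],
    fallbacks: Sequence[str] = GUARANTEED_FALLBACKS,
) -> tuple[str, str]:
    """Try the preferred model, then each fallback; return (model_used, output)."""
    # Build the candidate list without trying the same model twice.
    candidates = [preferred] + [m for m in fallbacks if m != preferred]
    last_error: Exception | None = None
    for model in candidates:
        try:
            return model, generate(model, prompt)
        except ModelUnavailableError as err:
            last_error = err  # remember why this model failed, then try the next one
    raise RuntimeError("No candidate model is available") from last_error
```

With `preferred="mistralai/Mistral-7B-Instruct-v0.2"`, a 404 from Mistral quietly routes the request to Flan-T5-XXL. The returned model ID lets the app log which model actually answered.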

	---

	## 🎯 Use Case Recommendations

	### For Survey Generation:

	5-10 questions (simple):
	```bash
	LLM_MODEL=google/flan-t5-xl # Fastest
	```

	10-15 questions (standard):
	```bash
	LLM_MODEL=google/flan-t5-xxl # Default, balanced
	```

	15+ questions (detailed):
	```bash
	LLM_MODEL=google/flan-ul2 # Better context handling
	```

	### For Translation:

	1-2 languages (quick):
	```bash
	LLM_MODEL=google/flan-t5-xl # Fastest translations
	```

	3-5 languages (standard):
	```bash
	LLM_MODEL=google/flan-t5-xxl # Default, reliable
	```

	5+ languages or critical translations:
	```bash
	LLM_MODEL=google/flan-ul2 # Better quality
	```

	### For Data Analysis:

	10-30 responses (simple):
	```bash
	LLM_MODEL=google/flan-t5-xl # Quick insights
	```

	30-100 responses (standard):
	```bash
	LLM_MODEL=google/flan-t5-xxl # Default, balanced
	```

	100+ responses or complex analysis:
	```bash
	LLM_MODEL=google/flan-ul2 # Deep analysis, better context
	```
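The recommendations above can be encoded as a small lookup. `recommend_model` is a hypothetical helper, not part of ConversAI. Some adjacent ranges in the guide share an endpoint (10 or 15 questions, 5 languages, 30 or 100 responses). This sketch assumes the shared value goes to the lighter model.

```python
XL = "google/flan-t5-xl"
XXL = "google/flan-t5-xxl"
UL2 = "google/flan-ul2"

# Per task: (largest size for Flan-T5-XL, largest size for Flan-T5-XXL).
# Anything above the second bound goes to Flan-UL2.
THRESHOLDS = {
    "survey": (10, 15),       # questions: up to 10 -> XL, 11-15 -> XXL, 16+ -> UL2
    "translation": (2, 5),    # languages: 1-2 -> XL, 3-5 -> XXL, 6+ -> UL2
    "analysis": (30, 100),    # responses: up to 30 -> XL, 31-100 -> XXL, 101+ -> UL2
}


def recommend_model(task: str, size: int) -> str:
    """Return a model ID for a task ('survey', 'translation', 'analysis') and its size."""
    if task not in THRESHOLDS:
        raise ValueError(f"unknown task: {task!r}")
    if size < 1:
        raise ValueError("size must be a positive count")
    xl_max, xxl_max = THRESHOLDS[task]
    if size <= xl_max:
        return XL
    if size <= xxl_max:
        return XXL
    return UL2
```

For example, `recommend_model("translation", 4)` picks Flan-T5-XXL. To lean toward quality instead of speed, give the shared endpoints to the heavier model.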

	---

	## ⚙️ How to Change Models

	### On HuggingFace Spaces:

	1. Go to your Space's Settings tab
	2. Scroll to the "Variables and secrets" section
	3. Click "New variable" and enter:
	   - Name: `LLM_MODEL`
	   - Value: `google/flan-t5-xxl` (or any model above)
	4. Restart your Space

	### Running Locally:

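	```bash
	# Option 1: environment variable (applies to the current shell session)
	export LLM_MODEL=google/flan-t5-xxl
	python app.py

	# Option 2: in code (Python, at the top of app.py, before the model is loaded).
	# This overrides any value set outside the script:
	#   import os
	#   os.environ["LLM_MODEL"] = "google/flan-t5-xl"
	```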

	### In Docker:

	```dockerfile
	ENV LLM_MODEL=google/flan-t5-xxl
	```
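All three methods set the same environment variable. Below is a minimal sketch of how an app might read it at startup, with Flan-T5-XXL as the fallback when the variable is unset or blank. `get_model_name` is a hypothetical helper. ConversAI's actual `app.py` may read the variable differently.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "google/flan-t5-xxl"  # the guide's default


def get_model_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model ID from LLM_MODEL, or the default if it is unset or blank."""
    env = os.environ if env is None else env
    value = env.get("LLM_MODEL", "").strip()
    return value or DEFAULT_MODEL


if __name__ == "__main__":
    # Print the model that would be used, similar to the check_env.py step below.
    print(f"Active model: {get_model_name()}")
```

An app that reads the value once at startup, as this sketch does, only picks up a change after a restart.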

	---

	## 💡 Tips for Best Results

	### 1. Start Simple

	Begin with the default (Flan-T5-XXL) and only switch if you need to:
	- Need maximum speed? → Try Flan-T5-XL
	- Need longer context? → Try Flan-UL2
	- Need best quality? → Try Mistral-7B (if available)

	### 2. Adjust Your Prompts

	Different models work better with different prompting:

	Flan-T5 models (recommended):
	- Prefer clear, direct instructions
	- Work better with structured input
	- Best with specific requirements
	- Use imperative language ("Generate...", "Create...", "Translate...")

	Mistral (if available):
	- Can handle conversational outlines
	- Good with context and examples
	- Understands nuance

	### 3. Manage Expectations

	Free tier limitations:
	- Cold start: 30-60 seconds on first request
	- Queue times: 10-30 seconds during peak hours
	- Rate limits: ~1 request every few seconds
	- Timeouts: Possible on very complex tasks

	Solutions:
	- Be patient on first request
	- Use off-peak hours when possible
	- Keep prompts concise
	- Try a faster model if timeouts occur
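Concise prompts matter most for the Flan-T5 models, whose 512-token context is easy to exceed. Below is a rough pre-flight check. It assumes the common heuristic of about 4 characters per token for English text. That is an approximation, not the models' real tokenizer, so use the model's own tokenizer when you need an exact count.

```python
# Context limits (in tokens) from the spec lists in this guide.
CONTEXT_LIMITS = {
    "google/flan-t5-xl": 512,
    "google/flan-t5-xxl": 512,
    "google/flan-ul2": 2048,
}

CHARS_PER_TOKEN = 4  # rough heuristic for English; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate the token count as ceil(len(text) / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(prompt: str, model: str) -> bool:
    """True if the estimated prompt length fits within the model's context limit."""
    return estimate_tokens(prompt) <= CONTEXT_LIMITS[model]
```

If the check fails for a Flan-T5 model, shorten the outline or move to Flan-UL2's 2K-token context.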

	### 4. Test and Compare

	Try generating the same survey with different models:

	```bash
	# Test 1: Flan-T5-XXL (default, balanced)
	LLM_MODEL=google/flan-t5-xxl

	# Test 2: Flan-T5-XL (faster)
	LLM_MODEL=google/flan-t5-xl

	# Test 3: Flan-UL2 (more context)
	LLM_MODEL=google/flan-ul2
	```

	Pick the one that works best for your use case!
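You can also compare models without editing the environment between runs by timing the same prompt against each model ID. This sketch assumes a hypothetical `generate(model, prompt)` function that makes the actual API call. The code only handles timing and collecting results.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def compare_models(
    prompt: str,
    models: Iterable[str],
    generate: Callable[[str, str], str],
) -> Dict[str, Tuple[float, str]]:
    """Run one prompt through each model; return {model: (seconds, output)}."""
    results: Dict[str, Tuple[float, str]] = {}
    for model in models:
        start = time.perf_counter()
        output = generate(model, prompt)
        results[model] = (time.perf_counter() - start, output)
    return results


if __name__ == "__main__":
    def echo(model: str, prompt: str) -> str:
        # Stand-in for a real API call so the sketch runs offline.
        return f"{model}: {prompt[:20]}"

    models = ["google/flan-t5-xxl", "google/flan-t5-xl", "google/flan-ul2"]
    prompt = "Create 8 questions about mobile app satisfaction."
    for model, (secs, out) in compare_models(prompt, models, echo).items():
        print(f"{model:<22} {secs:6.3f}s  {out}")
```

Run each model at least twice before comparing times, because the first call may include a cold start.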

	---

	## 🐛 Troubleshooting

	### "Model loading failed"

	Cause: Model might be down or loading

	Solutions:
	1. Wait 1-2 minutes and retry
	2. Try a different Flan-T5 variant (all are stable)
	3. Check HuggingFace status page

	### "Request timed out"

	Cause: Model taking too long (can happen on first request)

	Solutions:
	1. Retry - the second request is usually faster once the model has loaded
	2. Use a faster model (Flan-T5-XL)
	3. Simplify your prompt
	4. Try during off-peak hours

	### "Rate limit exceeded"

	Cause: Too many requests too fast

	Solutions:
	1. Wait 30-60 seconds between requests
	2. Upgrade to a HuggingFace PRO account (a paid subscription with higher inference limits)
	3. Deploy your own Space (gets its own quota)
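For timeouts and rate-limit errors, retrying with growing delays usually works better than retrying immediately. Below is a minimal exponential-backoff sketch. `TransientError` is a hypothetical exception your API wrapper would raise on a timeout or an HTTP 429. The defaults are assumptions: start at 5 seconds, double on each attempt, and cap at 60 seconds.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error for timeouts and HTTP 429 rate-limit responses."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 5.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TransientError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 5s, 10s, 20s, ... capped at max_delay, plus up to 1s of random jitter
            delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(delay + random.uniform(0, 1))
    raise ValueError("max_attempts must be at least 1")
```

Wrap your API call in a zero-argument function, for example `call_with_backoff(lambda: generate(model, prompt))`, where `generate` stands for your own hypothetical client call. The jitter keeps several clients from retrying in lockstep.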

	### Poor quality output

	Cause: Model not suitable for task

	Solutions:
	1. Try Mistral-7B for better quality (if available)
	2. Make prompts more specific
	3. Provide examples in your outline
	4. Break complex tasks into smaller steps

	---

	## 📊 Performance Benchmarks

	Based on typical usage patterns:

	| Task | Flan-T5-XL | Flan-T5-XXL | Flan-UL2 |
	|------|------------|-------------|----------|
	| Generate 10-question survey | 5-10s | 8-15s | 15-25s |
	| Translate to 3 languages | 8-12s | 12-20s | 20-30s |
	| Analyze 50 responses | 10-15s | 15-25s | 25-40s |
	| First request (cold) | 10-20s | 15-30s | 30-45s |
	| Subsequent requests | 3-8s | 5-12s | 10-20s |

	Times are approximate and vary with server load.

	---

	## 🎓 Advanced Tips

	### 1. Model-Specific Prompting

	For Flan-T5-XXL (Default):
	```
	Task: Create survey about mobile app satisfaction
	Requirements:
	- 10 questions
	- Topics: usability, performance, features
	- Audience: iOS users 25-45

	Generate a professional survey following best practices.
	```

	For Flan-T5-XL (Fast):
	```
	Create 8 questions about mobile app satisfaction.
	Topics: usability, performance, features.
	Audience: iOS users 25-45.
	```

	For Flan-UL2 (More Context):
	```
	Generate a comprehensive survey to understand mobile app user satisfaction.

	Context: We're a productivity app with 100K users. Recent reviews mention
	performance issues and missing features. We need to understand:
	1. Current satisfaction levels
	2. Specific pain points
	3. Feature priorities

	Target: iOS users aged 25-45 who use the app daily.
	Create 12-15 questions following qualitative research best practices.
	```
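If your app assembles prompts from form fields, a small builder keeps the Flan-T5-XXL structure consistent. This sketch reproduces the first template above. `build_structured_prompt` and its parameters are hypothetical and not part of ConversAI.

```python
from typing import Sequence


def build_structured_prompt(
    subject: str, num_questions: int, topics: Sequence[str], audience: str
) -> str:
    """Build a Flan-T5-XXL-style prompt: task line, bullet requirements, closing instruction."""
    lines = [
        f"Task: Create survey about {subject}",
        "Requirements:",
        f"- {num_questions} questions",
        f"- Topics: {', '.join(topics)}",
        f"- Audience: {audience}",
        "",
        "Generate a professional survey following best practices.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_structured_prompt(
        "mobile app satisfaction", 10,
        ["usability", "performance", "features"], "iOS users 25-45",
    ))
```

Called with the example values, it produces the Flan-T5-XXL prompt above verbatim. Keeping the wording in one function means every request uses the same imperative, structured format that Flan-T5 models respond to best.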

	### 2. Optimize for Speed

	Fast survey generation:
	1. Use Flan-T5-XL
	2. Keep outline to 2-3 sentences
	3. Request 5-8 questions
	4. Use clear, direct prompts

	Result: 3-8 second generation

	### 3. Optimize for Quality

	High-quality surveys:
	1. Use Flan-UL2
	2. Provide detailed context and examples
	3. Request 10-15 questions
	4. Include specific requirements

	Result: Professional, well-structured surveys
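The two recipes above can be stored as named presets so the app applies them the same way every time. This sketch assumes your app accepts a model ID and a requested question count. `Preset` and `PRESETS` are hypothetical names, and the values come from the two lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    model: str
    min_questions: int
    max_questions: int

    def clamp_questions(self, requested: int) -> int:
        """Keep the requested question count inside this preset's range."""
        return max(self.min_questions, min(requested, self.max_questions))


# Values from "Optimize for Speed" and "Optimize for Quality".
PRESETS = {
    "speed": Preset("google/flan-t5-xl", 5, 8),
    "quality": Preset("google/flan-ul2", 10, 15),
}
```

Clamping out-of-range requests is a design choice. Raising an error instead would make the limit visible to the user.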

	---

	## ❓ FAQ

	Q: Why is Flan-T5-XXL the default?
	A: It's guaranteed to be deployed on HF Inference API, fast, and reliable. Google's instruction-tuned model works well for structured tasks.

	Q: Can I use multiple models in one app?
	A: One at a time. The app uses the model named in the `LLM_MODEL` environment variable; change it (and restart) to switch models.

	Q: Which model is best for non-English?
	A: All Flan-T5 models support multiple languages. For best multilingual support, try Flan-UL2.

	Q: Do these models cost money?
	A: No! All are free on the HuggingFace Inference API, within the free tier's rate limits.

	Q: Can I use my own fine-tuned model?
	A: Yes! Set `LLM_MODEL` to your model ID on HuggingFace.

	Q: What if I need better performance?
	A: Consider:
	1. HuggingFace PRO (a paid subscription with higher inference limits)
	2. Deploy the model yourself (Hugging Face Inference Endpoints)
	3. Run the model on your own dedicated GPU

	---

	## 🚀 Quick Start Commands

	```bash
	# Try Flan-T5-XXL (default, balanced)
	LLM_MODEL=google/flan-t5-xxl python app.py

	# Try Flan-T5-XL (fastest)
	LLM_MODEL=google/flan-t5-xl python app.py

	# Try Flan-UL2 (more context)
	LLM_MODEL=google/flan-ul2 python app.py

	# Check which model is active
	python check_env.py
	```

	---

	Updated: November 2025
	All models tested on the HuggingFace free tier; availability of models marked ⚠️ may vary

	For more help, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md) or [USER_GUIDE.md](USER_GUIDE.md)