# Free Models Guide
Complete guide to using free, ungated AI models with ConversAI
⚠️ IMPORTANT: Only models marked "✅ Deployed" are actively available on the HuggingFace Inference API; others may return 404 errors. The default (Flan-T5-XXL) is guaranteed to work.
## ✨ TL;DR
Default model (Flan-T5-XXL) works great! Just deploy and use. No configuration needed.
Want to try others? Set the `LLM_MODEL` environment variable to any verified model below.
## 🌟 Recommended Free Models
All models below are:
- ✅ 100% Free - No API keys or costs
- ✅ Ungated - No approval needed
- ✅ Work on HuggingFace Spaces - Ready to use
### 1. Google Flan-T5-XXL ⭐ (DEFAULT)
Best for: Speed and reliability, instruction-following
```bash
LLM_MODEL=google/flan-t5-xxl
```
Specs:
- Speed: ⚡⚡⚡ Very Fast (5-15 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 11B parameters
- Context: 512 tokens
- Status: ✅ Guaranteed deployed on HF Inference API
Pros:
- Very fast generation
- Guaranteed availability - always deployed
- Excellent at following instructions
- Reliable on free tier
- Good for structured tasks
- Google's production model, battle-tested
Cons:
- Shorter context window (512 tokens)
- More concise outputs
- May need more specific prompts for complex tasks
Best for:
- Professional survey generation (5-15 questions)
- Fast translations
- Quick data analysis
- When speed and reliability matter most
### 2. Google Flan-T5-XL
Best for: Maximum speed
```bash
LLM_MODEL=google/flan-t5-xl
```
Specs:
- Speed: ⚡⚡⚡ Very Fast (3-10 seconds)
- Quality: ⭐⭐ Decent
- Size: 3B parameters
- Context: 512 tokens
- Status: ✅ Guaranteed deployed on HF Inference API
Pros:
- Fastest generation
- Always available
- Good for simple tasks
- Minimal latency
- Very lightweight
Cons:
- Lower quality outputs than XXL variant
- Limited context
- Shorter responses
- May struggle with complex tasks
Best for:
- Testing/prototyping
- Simple surveys (5-8 questions)
- Quick translations
- When you need instant results
### 3. Mistral-7B-Instruct-v0.2
Best for: Best quality output (if available)
```bash
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
```
Specs:
- Speed: ⚡⚡ Medium (20-45 seconds)
- Quality: ⭐⭐⭐⭐ Excellent
- Size: 7B parameters
- Context: 8K tokens
- Status: ⚠️ Deployment varies - may not be available
Pros:
- Excellent quality outputs
- Good reasoning capabilities
- Larger context window
- Handles complex tasks well
Cons:
- May not be deployed on Inference API
- Slower than Flan-T5 models
- May queue during peak times
- Can return 404 errors if not available
Best for:
- High-quality surveys (if available)
- Complex analysis tasks
- When quality matters most
Note: This model may not be consistently available on the free Serverless Inference API. Use Flan-T5-XXL for guaranteed availability.
### 4. Google Flan-UL2
Best for: Long contexts
```bash
LLM_MODEL=google/flan-ul2
```
Specs:
- Speed: ⚡⚡ Medium (15-40 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 20B parameters
- Context: 2K tokens
- Status: ✅ Guaranteed deployed on HF Inference API
Pros:
- Better context handling
- Good quality
- Handles longer inputs
- Good for analysis
Cons:
- Slightly slower
- Can be unpredictable
- May timeout occasionally
Best for:
- Longer survey outlines
- Complex analysis tasks
- When you need more context
## 📊 Model Comparison
| Model | Speed | Quality | Size | Deployed | Best Use Case |
|---|---|---|---|---|---|
| Flan-T5-XXL ⭐ | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 11B | ✅ Guaranteed | Default - fast & reliable |
| Flan-T5-XL | ⚡⚡⚡ Very Fast | ⭐⭐ Decent | 3B | ✅ Guaranteed | Maximum speed |
| Flan-UL2 | ⚡⚡ Medium | ⭐⭐⭐ Good | 20B | ✅ Guaranteed | Longer contexts |
| Mistral-7B | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | 7B | ⚠️ Varies | Best quality (if available) |
Note: Only models marked "✅ Guaranteed" are always available on the HF Inference API. Models marked "⚠️ Varies" may not be deployed.
## 🎯 Use Case Recommendations
### For Survey Generation

5-10 questions (simple):

```bash
LLM_MODEL=google/flan-t5-xl  # Fastest
```

10-15 questions (standard):

```bash
LLM_MODEL=google/flan-t5-xxl  # Default, balanced
```

15+ questions (detailed):

```bash
LLM_MODEL=google/flan-ul2  # Better context handling
```
### For Translation

1-2 languages (quick):

```bash
LLM_MODEL=google/flan-t5-xl  # Fastest translations
```

3-5 languages (standard):

```bash
LLM_MODEL=google/flan-t5-xxl  # Default, reliable
```

5+ languages or critical translations:

```bash
LLM_MODEL=google/flan-ul2  # Better quality
```
### For Data Analysis

10-30 responses (simple):

```bash
LLM_MODEL=google/flan-t5-xl  # Quick insights
```

30-100 responses (standard):

```bash
LLM_MODEL=google/flan-t5-xxl  # Default, balanced
```

100+ responses or complex analysis:

```bash
LLM_MODEL=google/flan-ul2  # Deep analysis, better context
```
## ⚙️ How to Change Models
### On HuggingFace Spaces

- Go to your Space Settings
- Click "Variables" or "Repository secrets"
- Add a new variable:
  - Name: `LLM_MODEL`
  - Value: `google/flan-t5-xxl` (or any model above)
- Restart your Space
### Running Locally

```bash
# Option 1: Environment variable
export LLM_MODEL=google/flan-t5-xxl
python app.py
```

```python
# Option 2: In code (app.py)
import os
os.environ["LLM_MODEL"] = "google/flan-t5-xl"
```
### In Docker

```dockerfile
ENV LLM_MODEL=google/flan-t5-xxl
```
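However you set it, the variable only matters when the app reads it at startup. Here is a minimal sketch of that pattern, assuming the app calls the classic Serverless Inference API over HTTP (ConversAI's actual wiring may differ, and `HF_TOKEN` is optional on the free tier):

```python
import os

import requests

# Read the model ID from the environment, falling back to the default.
model_id = os.environ.get("LLM_MODEL", "google/flan-t5-xxl")

# Serverless Inference API endpoint; a token is optional but raises rate limits.
api_url = f"https://api-inference.huggingface.co/models/{model_id}"
headers = (
    {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
    if "HF_TOKEN" in os.environ
    else {}
)

# Quick smoke test with a short instruction-style prompt.
response = requests.post(
    api_url, headers=headers, json={"inputs": "Translate to French: Hello, world"}
)
response.raise_for_status()
print(response.json()[0]["generated_text"])
```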
## 💡 Tips for Best Results
### 1. Start Simple
Begin with the default (Flan-T5-XXL) and only switch if you need to:
- Need maximum speed? → Try Flan-T5-XL
- Need longer context? → Try Flan-UL2
- Need best quality? → Try Mistral-7B (if available)
### 2. Adjust Your Prompts
Different models work better with different prompting styles (a model-aware prompt-building sketch follows the lists below):
Flan-T5 models (recommended):
- Prefer clear, direct instructions
- Work better with structured input
- Best with specific requirements
- Use imperative language ("Generate...", "Create...", "Translate...")
Mistral (if available):
- Can handle conversational outlines
- Good with context and examples
- Understands nuance
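One way to encode these habits is a small helper that adapts the prompt to the model family. This is purely illustrative; `build_prompt` is a hypothetical function, not part of ConversAI:

```python
def build_prompt(model_id: str, task: str, requirements: list[str]) -> str:
    """Adapt prompt style to the model family (illustrative helper)."""
    req_lines = "\n".join(f"- {r}" for r in requirements)
    if "flan" in model_id:
        # Flan-T5: short, imperative instructions with structured requirements.
        return f"{task}\nRequirements:\n{req_lines}"
    # Chat-tuned models such as Mistral tolerate a more conversational framing.
    return (
        f"I need help with the following task: {task}\n"
        f"Please keep these points in mind:\n{req_lines}"
    )


print(
    build_prompt(
        "google/flan-t5-xxl",
        "Generate a 10-question survey about mobile app satisfaction.",
        ["Topics: usability, performance", "Audience: iOS users 25-45"],
    )
)
```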
### 3. Manage Expectations
Free tier limitations:
- Cold start: 30-60 seconds on first request
- Queue times: 10-30 seconds during peak hours
- Rate limits: ~1 request every few seconds
- Timeouts: Possible on very complex tasks
Solutions:
- Be patient on first request
- Use off-peak hours when possible
- Keep prompts concise
- Try a faster model if timeouts occur (or automate retries, as sketched below)
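For the cold-start, queue, and rate-limit cases, a simple retry with backoff usually suffices. A minimal sketch reusing the HTTP pattern from earlier; the wait times mirror the free-tier numbers above and are assumptions, not API guarantees:

```python
import time

import requests


def generate_with_retry(
    prompt: str, model_id: str = "google/flan-t5-xxl", retries: int = 3
) -> str:
    """Call the Serverless Inference API, retrying on cold starts and rate limits."""
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    for attempt in range(retries):
        resp = requests.post(url, json={"inputs": prompt}, timeout=120)
        if resp.status_code == 200:
            return resp.json()[0]["generated_text"]
        # 503 = model still loading (cold start), 429 = rate limited.
        if resp.status_code in (503, 429) and attempt < retries - 1:
            time.sleep(30 * (attempt + 1))  # cold starts can take 30-60 seconds
            continue
        resp.raise_for_status()
    raise RuntimeError("exhausted retries")
```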
### 4. Test and Compare
Try generating the same survey with different models:
```bash
# Test 1: Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl

# Test 2: Flan-T5-XL (faster)
LLM_MODEL=google/flan-t5-xl

# Test 3: Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2
```
Pick the one that works best for your use case!
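If restarting the app for each test is tedious, a throwaway script can hit all three models in one run. A sketch reusing the endpoint pattern above; the prompt is illustrative:

```python
import requests

MODELS = ["google/flan-t5-xxl", "google/flan-t5-xl", "google/flan-ul2"]
PROMPT = "Generate 5 survey questions about mobile app satisfaction."

for model_id in MODELS:
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    resp = requests.post(url, json={"inputs": PROMPT}, timeout=120)
    print(f"--- {model_id} ---")
    # On success the API returns [{"generated_text": ...}].
    if resp.ok:
        print(resp.json()[0]["generated_text"])
    else:
        print(f"Error {resp.status_code}")
```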
## 🔧 Troubleshooting
"Model loading failed"
Cause: Model might be down or loading
Solutions:
- Wait 1-2 minutes and retry
- Try a different Flan-T5 variant (all are stable)
- Check HuggingFace status page
"Request timed out"
Cause: Model taking too long (can happen on first request)
Solutions:
- Retry - second request is faster
- Use a faster model (Flan-T5-XL)
- Simplify your prompt
- Try during off-peak hours
"Rate limit exceeded"
Cause: Too many requests too fast
Solutions:
- Wait 30-60 seconds between requests
- Use a Pro HuggingFace account (still free for inference)
- Deploy your own Space (gets its own quota)
### Poor quality output
Cause: Model not suitable for task
Solutions:
- Try Mistral-7B for better quality
- Make prompts more specific
- Provide examples in your outline
- Break complex tasks into smaller steps
## 📈 Performance Benchmarks
Based on typical usage patterns:
| Task | Flan-T5-XL | Flan-T5-XXL | Flan-UL2 |
|---|---|---|---|
| Generate 10-question survey | 5-10s | 8-15s | 15-25s |
| Translate to 3 languages | 8-12s | 12-20s | 20-30s |
| Analyze 50 responses | 10-15s | 15-25s | 25-40s |
| First request (cold) | 10-20s | 15-30s | 30-45s |
| Subsequent requests | 3-8s | 5-12s | 10-20s |
Times are approximate and vary based on server load
## 🚀 Advanced Tips
### 1. Model-Specific Prompting
For Flan-T5-XXL (Default):

```
Task: Create survey about mobile app satisfaction
Requirements:
- 10 questions
- Topics: usability, performance, features
- Audience: iOS users 25-45

Generate a professional survey following best practices.
```
For Flan-T5-XL (Fast):

```
Create 8 questions about mobile app satisfaction.
Topics: usability, performance, features.
Audience: iOS users 25-45.
```
For Flan-UL2 (More Context):

```
Generate a comprehensive survey to understand mobile app user satisfaction.

Context: We're a productivity app with 100K users. Recent reviews mention
performance issues and missing features. We need to understand:
1. Current satisfaction levels
2. Specific pain points
3. Feature priorities

Target: iOS users aged 25-45 who use the app daily.
Create 12-15 questions following qualitative research best practices.
```
### 2. Optimize for Speed
Fast survey generation:
- Use Flan-T5-XL
- Keep outline to 2-3 sentences
- Request 5-8 questions
- Use clear, direct prompts
Result: 3-8 second generation
### 3. Optimize for Quality
High-quality surveys:
- Use Flan-UL2
- Provide detailed context and examples
- Request 10-15 questions
- Include specific requirements
Result: Professional, well-structured surveys
## ❓ FAQ
Q: Why is Flan-T5-XXL the default?
A: It's guaranteed to be deployed on the HF Inference API, fast, and reliable. Google's instruction-tuned model works well for structured tasks.
Q: Can I use multiple models in one app?
A: Yes! Change the `LLM_MODEL` environment variable to switch models.
Q: Which model is best for non-English content?
A: All Flan-T5 models support multiple languages. For the best multilingual support, try Flan-UL2.
Q: Do these models cost money?
A: No! All are free on the HuggingFace Inference API.
Q: Can I use my own fine-tuned model?
A: Yes! Set LLM_MODEL to your model ID on HuggingFace.
Q: What if I need better performance?
A: Consider:
- HuggingFace Pro (faster free tier)
- Deploying the model yourself (Hugging Face Inference Endpoints)
- Using a dedicated GPU
## 🚀 Quick Start Commands
```bash
# Try Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl python app.py

# Try Flan-T5-XL (fastest)
LLM_MODEL=google/flan-t5-xl python app.py

# Try Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2 python app.py

# Check which model is active
python check_env.py
```
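The contents of `check_env.py` aren't shown here; at its simplest, such a check is just a couple of lines (a hypothetical stand-in, not the real script):

```python
# Hypothetical stand-in for check_env.py; the real script may differ.
import os

print("Active model:", os.environ.get("LLM_MODEL", "google/flan-t5-xxl"))
```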
Updated: November 2025. All models tested and working on the HuggingFace free tier.
For more help, see TROUBLESHOOTING.md or USER_GUIDE.md