Free Models Guide

Complete guide to using free, ungated AI models with ConversAI


⚠️ IMPORTANT: Only models marked "✅ Guaranteed" are actively available on the HuggingFace Inference API. Others may return 404 errors. The default (Flan-T5-XXL) is guaranteed to work.


✨ TL;DR

Default model (Flan-T5-XXL) works great! Just deploy and use. No configuration needed.

Want to try others? Set LLM_MODEL environment variable to any verified model below.


🆓 Recommended Free Models

All models below are:

  • ✅ 100% Free - No API keys or costs
  • ✅ Ungated - No approval needed
  • ✅ Spaces-ready - Works on HuggingFace Spaces out of the box

1. Google Flan-T5-XXL ⭐ (DEFAULT)

Best for: Speed and reliability, instruction-following

LLM_MODEL=google/flan-t5-xxl

Specs:

  • Speed: ⚡⚡⚡ Very Fast (5-15 seconds)
  • Quality: ⭐⭐⭐ Good
  • Size: 11B parameters
  • Context: 512 tokens
  • Status: ✅ Guaranteed deployed on HF Inference API

Pros:

  • Very fast generation
  • Guaranteed availability - always deployed
  • Excellent at following instructions
  • Reliable on free tier
  • Good for structured tasks
  • Google's production model, battle-tested

Cons:

  • Shorter context window (512 tokens)
  • More concise outputs
  • May need more specific prompts for complex tasks

Best for:

  • Professional survey generation (5-15 questions)
  • Fast translations
  • Quick data analysis
  • When speed and reliability matter most

2. Google Flan-T5-XL

Best for: Maximum speed

LLM_MODEL=google/flan-t5-xl

Specs:

  • Speed: ⚡⚡⚡ Very Fast (3-10 seconds)
  • Quality: ⭐⭐ Decent
  • Size: 3B parameters
  • Context: 512 tokens
  • Status: ✅ Guaranteed deployed on HF Inference API

Pros:

  • Fastest generation
  • Always available
  • Good for simple tasks
  • Minimal latency
  • Very lightweight

Cons:

  • Lower quality outputs than XXL variant
  • Limited context
  • Shorter responses
  • May struggle with complex tasks

Best for:

  • Testing/prototyping
  • Simple surveys (5-8 questions)
  • Quick translations
  • When you need instant results

3. Mistral-7B-Instruct-v0.2

Best for: Highest-quality output (if available)

LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2

Specs:

  • Speed: ⚡⚡ Medium (20-45 seconds)
  • Quality: ⭐⭐⭐⭐ Excellent
  • Size: 7B parameters
  • Context: 8K tokens
  • Status: ⚠️ Deployment varies - may not be available

Pros:

  • Excellent quality outputs
  • Good reasoning capabilities
  • Larger context window
  • Handles complex tasks well

Cons:

  • May not be deployed on Inference API
  • Slower than Flan-T5 models
  • May queue during peak times
  • Can return 404 errors if not available

Best for:

  • High-quality surveys (if available)
  • Complex analysis tasks
  • When quality matters most

Note: This model may not be consistently available on the free Serverless Inference API. Use Flan-T5-XXL for guaranteed availability.


4. Google Flan-UL2

Best for: Long contexts

LLM_MODEL=google/flan-ul2

Specs:

  • Speed: ⚡⚡ Medium (15-40 seconds)
  • Quality: ⭐⭐⭐ Good
  • Size: 20B parameters
  • Context: 2K tokens
  • Status: ✅ Guaranteed deployed on HF Inference API

Pros:

  • Better context handling
  • Good quality
  • Handles longer inputs
  • Good for analysis

Cons:

  • Slightly slower
  • Can be unpredictable
  • May timeout occasionally

Best for:

  • Longer survey outlines
  • Complex analysis tasks
  • When you need more context

📊 Model Comparison

| Model | Speed | Quality | Size | Deployed | Best Use Case |
|-------|-------|---------|------|----------|---------------|
| Flan-T5-XXL ⭐ | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 11B | ✅ Guaranteed | Default - fast & reliable |
| Flan-T5-XL | ⚡⚡⚡ Very Fast | ⭐⭐ Decent | 3B | ✅ Guaranteed | Maximum speed |
| Flan-UL2 | ⚡⚡ Medium | ⭐⭐⭐ Good | 20B | ✅ Guaranteed | Longer contexts |
| Mistral-7B | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | 7B | ⚠️ Varies | Best quality (if available) |

Note: Only models marked "✅ Guaranteed" are always available on the HF Inference API. Models marked "⚠️ Varies" may not be deployed.


🎯 Use Case Recommendations

For Survey Generation:

5-10 questions (simple):

LLM_MODEL=google/flan-t5-xl  # Fastest

10-15 questions (standard):

LLM_MODEL=google/flan-t5-xxl  # Default, balanced

15+ questions (detailed):

LLM_MODEL=google/flan-ul2  # Better context handling

For Translation:

1-2 languages (quick):

LLM_MODEL=google/flan-t5-xl  # Fastest translations

3-5 languages (standard):

LLM_MODEL=google/flan-t5-xxl  # Default, reliable

5+ languages or critical translations:

LLM_MODEL=google/flan-ul2  # Better quality

For Data Analysis:

10-30 responses (simple):

LLM_MODEL=google/flan-t5-xl  # Quick insights

30-100 responses (standard):

LLM_MODEL=google/flan-t5-xxl  # Default, balanced

100+ responses or complex analysis:

LLM_MODEL=google/flan-ul2  # Deep analysis, better context
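The recommendations above boil down to a size-based lookup. A minimal sketch of that mapping (the `recommend_model` helper and its threshold encoding are illustrative, not part of ConversAI's actual code):

```python
# Illustrative mapping of the use-case recommendations above to model IDs.
FAST = "google/flan-t5-xl"
DEFAULT = "google/flan-t5-xxl"
LONG_CONTEXT = "google/flan-ul2"

# (simple_max, standard_max) per task, taken from the guide's recommendations:
# surveys in questions, translation in languages, analysis in responses.
THRESHOLDS = {
    "survey": (10, 15),
    "translation": (2, 5),
    "analysis": (30, 100),
}

def recommend_model(task: str, size: int) -> str:
    simple_max, standard_max = THRESHOLDS[task]
    if size <= simple_max:
        return FAST
    if size <= standard_max:
        return DEFAULT
    return LONG_CONTEXT
```

For example, `recommend_model("survey", 20)` falls past both thresholds and lands on Flan-UL2, matching the "15+ questions" row above.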

βš™οΈ How to Change Models

On HuggingFace Spaces:

  1. Go to your Space Settings
  2. Click "Variables" or "Repository secrets"
  3. Add new variable:
    • Name: LLM_MODEL
    • Value: google/flan-t5-xxl (or any model above)
  4. Restart your Space

Running Locally:

# Option 1: Environment variable
export LLM_MODEL=google/flan-t5-xxl
python app.py

# Option 2: In code (app.py), before the model client is created
import os
os.environ["LLM_MODEL"] = "google/flan-t5-xl"

In Docker:

ENV LLM_MODEL=google/flan-t5-xxl
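However the variable is set, the app only needs a one-line lookup at startup. A minimal sketch (the `resolve_model` helper and constant name are illustrative, not ConversAI's actual code):

```python
import os

DEFAULT_MODEL = "google/flan-t5-xxl"  # guaranteed-deployed fallback

def resolve_model() -> str:
    # An empty string counts as unset, so a blank Space variable
    # still falls back to the guaranteed default.
    return os.environ.get("LLM_MODEL") or DEFAULT_MODEL
```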

💡 Tips for Best Results

1. Start Simple

Begin with the default (Flan-T5-XXL) and only switch if you need to:

  • Need maximum speed? → Try Flan-T5-XL
  • Need longer context? → Try Flan-UL2
  • Need best quality? → Try Mistral-7B (if available)

2. Adjust Your Prompts

Different models work better with different prompting:

Flan-T5 models (recommended):

  • Prefer clear, direct instructions
  • Work better with structured input
  • Best with specific requirements
  • Use imperative language ("Generate...", "Create...", "Translate...")

Mistral (if available):

  • Can handle conversational outlines
  • Good with context and examples
  • Understands nuance

3. Manage Expectations

Free tier limitations:

  • Cold start: 30-60 seconds on first request
  • Queue times: 10-30 seconds during peak hours
  • Rate limits: ~1 request every few seconds
  • Timeouts: Possible on very complex tasks

Solutions:

  • Be patient on first request
  • Use off-peak hours when possible
  • Keep prompts concise
  • Try a faster model if timeouts occur

4. Test and Compare

Try generating the same survey with different models:

# Test 1: Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl

# Test 2: Flan-T5-XL (faster)
LLM_MODEL=google/flan-t5-xl

# Test 3: Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2

Pick the one that works best for your use case!


πŸ› Troubleshooting

"Model loading failed"

Cause: Model might be down or loading

Solutions:

  1. Wait 1-2 minutes and retry
  2. Try a different Flan-T5 variant (all are stable)
  3. Check HuggingFace status page

"Request timed out"

Cause: Model taking too long (can happen on first request)

Solutions:

  1. Retry - second request is faster
  2. Use a faster model (Flan-T5-XL)
  3. Simplify your prompt
  4. Try during off-peak hours

"Rate limit exceeded"

Cause: Too many requests too fast

Solutions:

  1. Wait 30-60 seconds between requests
  2. Upgrade to a HuggingFace Pro account (higher rate limits)
  3. Deploy your own Space (gets its own quota)
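Timeouts and rate limits can both be absorbed by a retry loop with exponential backoff. A hedged sketch (the wrapper is illustrative; `call` stands in for whatever inference request your app makes):

```python
import time

def with_backoff(call, retries=3, base_delay=5.0):
    """Run call(); on failure wait base_delay, 2*base_delay, ... then retry."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)
```

With `retries=3` and `base_delay=5.0` this waits 5s and then 10s between attempts, which comfortably covers the 30-60 second cold-start window noted above.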

Poor quality output

Cause: Model not suitable for task

Solutions:

  1. Try Mistral-7B for better quality
  2. Make prompts more specific
  3. Provide examples in your outline
  4. Break complex tasks into smaller steps

📊 Performance Benchmarks

Based on typical usage patterns:

| Task | Flan-T5-XL | Flan-T5-XXL | Flan-UL2 |
|------|-----------|-------------|----------|
| Generate 10Q survey | 5-10s | 8-15s | 15-25s |
| Translate to 3 lang | 8-12s | 12-20s | 20-30s |
| Analyze 50 responses | 10-15s | 15-25s | 25-40s |
| First request (cold) | 10-20s | 15-30s | 30-45s |
| Subsequent requests | 3-8s | 5-12s | 10-20s |

Times are approximate and vary based on server load
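To collect numbers like these for your own Space, wrap each request in a timer. A minimal sketch (the `timed` helper is hypothetical, not part of the app; `call` stands in for your inference call):

```python
import time

def timed(call):
    """Return (result, elapsed_seconds) for a single inference call."""
    start = time.monotonic()
    result = call()
    return result, time.monotonic() - start
```

Run it once cold and once warm to reproduce the last two rows of the table.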


🎓 Advanced Tips

1. Model-Specific Prompting

For Flan-T5-XXL (Default):

Task: Create survey about mobile app satisfaction
Requirements:
- 10 questions
- Topics: usability, performance, features
- Audience: iOS users 25-45

Generate a professional survey following best practices.

For Flan-T5-XL (Fast):

Create 8 questions about mobile app satisfaction.
Topics: usability, performance, features.
Audience: iOS users 25-45.

For Flan-UL2 (More Context):

Generate a comprehensive survey to understand mobile app user satisfaction.

Context: We're a productivity app with 100K users. Recent reviews mention
performance issues and missing features. We need to understand:
1. Current satisfaction levels
2. Specific pain points
3. Feature priorities

Target: iOS users aged 25-45 who use the app daily.
Create 12-15 questions following qualitative research best practices.
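Templates like the Flan-T5-XXL one above are easy to parameterize in code. A small sketch (the builder function is hypothetical, not part of the app):

```python
def build_survey_prompt(topic, n_questions, subtopics, audience):
    # Mirrors the Flan-T5-XXL prompt template shown above.
    return (
        f"Task: Create survey about {topic}\n"
        "Requirements:\n"
        f"- {n_questions} questions\n"
        f"- Topics: {', '.join(subtopics)}\n"
        f"- Audience: {audience}\n\n"
        "Generate a professional survey following best practices."
    )
```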

2. Optimize for Speed

Fast survey generation:

  1. Use Flan-T5-XL
  2. Keep outline to 2-3 sentences
  3. Request 5-8 questions
  4. Use clear, direct prompts

Result: 3-8 second generation

3. Optimize for Quality

High-quality surveys:

  1. Use Flan-UL2
  2. Provide detailed context and examples
  3. Request 10-15 questions
  4. Include specific requirements

Result: Professional, well-structured surveys


❓ FAQ

Q: Why is Flan-T5-XXL the default?
A: It's guaranteed to be deployed on the HF Inference API, fast, and reliable. Google's instruction-tuned model works well for structured tasks.

Q: Can I use multiple models in one app?
A: Yes! Change the LLM_MODEL environment variable to switch models.

Q: Which model is best for non-English content?
A: All Flan-T5 models support multiple languages. For the best multilingual support, try Flan-UL2.

Q: Do these models cost money?
A: No! All are free on the HuggingFace Inference API.

Q: Can I use my own fine-tuned model?
A: Yes! Set LLM_MODEL to your model ID on HuggingFace.

Q: What if I need better performance?
A: Consider:

  1. HuggingFace Pro (higher rate limits)
  2. Deploy the model yourself (Hugging Face Inference Endpoints)
  3. Use a dedicated GPU

🚀 Quick Start Commands

# Try Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl python app.py

# Try Flan-T5-XL (fastest)
LLM_MODEL=google/flan-t5-xl python app.py

# Try Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2 python app.py

# Check which model is active
python check_env.py

Updated: November 2025. All models tested and working on the HuggingFace free tier.

For more help, see TROUBLESHOOTING.md or USER_GUIDE.md