# Free Models Guide
Complete guide to using free, ungated AI models with ConversAI
⚠️ IMPORTANT: Only models marked "✅ Deployed" are actively available on the HuggingFace Inference API; others may return 404 errors. The default (Flan-T5-XXL) is guaranteed to work.
## ✨ TL;DR
Default model (Flan-T5-XXL) works great! Just deploy and use. No configuration needed.
Want to try others? Set the `LLM_MODEL` environment variable to any verified model below.
## 🌟 Recommended Free Models
All models below are:
- ✅ 100% Free - No API keys or costs
- ✅ Ungated - No approval needed
- ✅ Work on HuggingFace Spaces - Ready to use
### 1. Google Flan-T5-XXL ⭐ (DEFAULT)
Best for: Speed and reliability, instruction-following
```bash
LLM_MODEL=google/flan-t5-xxl
```
Specs:
- Speed: ⚡⚡⚡ Very Fast (5-15 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 11B parameters
- Context: 512 tokens
- Status: ✅ Guaranteed deployed on HF Inference API
Pros:
- Very fast generation
- Guaranteed availability - always deployed
- Excellent at following instructions
- Reliable on free tier
- Good for structured tasks
- Google's production model, battle-tested
Cons:
- Shorter context window (512 tokens)
- More concise outputs
- May need more specific prompts for complex tasks
Best for:
- Professional survey generation (5-15 questions)
- Fast translations
- Quick data analysis
- When speed and reliability matter most
### 2. Google Flan-T5-XL
Best for: Maximum speed
```bash
LLM_MODEL=google/flan-t5-xl
```
Specs:
- Speed: ⚡⚡⚡ Very Fast (3-10 seconds)
- Quality: ⭐⭐ Decent
- Size: 3B parameters
- Context: 512 tokens
- Status: ✅ Guaranteed deployed on HF Inference API
Pros:
- Fastest generation
- Always available
- Good for simple tasks
- Minimal latency
- Very lightweight
Cons:
- Lower quality outputs than XXL variant
- Limited context
- Shorter responses
- May struggle with complex tasks
Best for:
- Testing/prototyping
- Simple surveys (5-8 questions)
- Quick translations
- When you need instant results
### 3. Mistral-7B-Instruct-v0.2
Best for: Best quality output (if available)
```bash
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
```
Specs:
- Speed: ⚡⚡ Medium (20-45 seconds)
- Quality: ⭐⭐⭐⭐ Excellent
- Size: 7B parameters
- Context: 8K tokens
- Status: ⚠️ Deployment varies - may not be available
Pros:
- Excellent quality outputs
- Good reasoning capabilities
- Larger context window
- Handles complex tasks well
Cons:
- May not be deployed on Inference API
- Slower than Flan-T5 models
- May queue during peak times
- Can return 404 errors if not available
Best for:
- High-quality surveys (if available)
- Complex analysis tasks
- When quality matters most
Note: This model may not be consistently available on the free Serverless Inference API. Use Flan-T5-XXL for guaranteed availability.
### 4. Google Flan-UL2
Best for: Long contexts
```bash
LLM_MODEL=google/flan-ul2
```
Specs:
- Speed: ⚡⚡ Medium (15-40 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 20B parameters
- Context: 2K tokens
- Status: ✅ Guaranteed deployed on HF Inference API
Pros:
- Better context handling
- Good quality
- Handles longer inputs
- Good for analysis
Cons:
- Slightly slower
- Can be unpredictable
- May timeout occasionally
Best for:
- Longer survey outlines
- Complex analysis tasks
- When you need more context
## 📊 Model Comparison
| Model | Speed | Quality | Size | Deployed | Best Use Case |
|---|---|---|---|---|---|
| Flan-T5-XXL ⭐ | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 11B | ✅ Guaranteed | Default - fast & reliable |
| Flan-T5-XL | ⚡⚡⚡ Very Fast | ⭐⭐ Decent | 3B | ✅ Guaranteed | Maximum speed |
| Flan-UL2 | ⚡⚡ Medium | ⭐⭐⭐ Good | 20B | ✅ Guaranteed | Longer contexts |
| Mistral-7B | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | 7B | ⚠️ Varies | Best quality (if available) |
Note: Only models marked "✅ Guaranteed" are always available on the HF Inference API. Models marked "⚠️ Varies" may not be deployed.
## 🎯 Use Case Recommendations
### For Survey Generation

5-10 questions (simple):

```bash
LLM_MODEL=google/flan-t5-xl  # Fastest
```

10-15 questions (standard):

```bash
LLM_MODEL=google/flan-t5-xxl  # Default, balanced
```

15+ questions (detailed):

```bash
LLM_MODEL=google/flan-ul2  # Better context handling
```
### For Translation

1-2 languages (quick):

```bash
LLM_MODEL=google/flan-t5-xl  # Fastest translations
```

3-5 languages (standard):

```bash
LLM_MODEL=google/flan-t5-xxl  # Default, reliable
```

5+ languages or critical translations:

```bash
LLM_MODEL=google/flan-ul2  # Better quality
```
### For Data Analysis

10-30 responses (simple):

```bash
LLM_MODEL=google/flan-t5-xl  # Quick insights
```

30-100 responses (standard):

```bash
LLM_MODEL=google/flan-t5-xxl  # Default, balanced
```

100+ responses or complex analysis:

```bash
LLM_MODEL=google/flan-ul2  # Deep analysis, better context
```
## ⚙️ How to Change Models
### On HuggingFace Spaces

- Go to your Space Settings
- Click "Variables" or "Repository secrets"
- Add a new variable:
  - Name: `LLM_MODEL`
  - Value: `google/flan-t5-xxl` (or any model above)
- Restart your Space
### Running Locally

```bash
# Option 1: Environment variable
export LLM_MODEL=google/flan-t5-xxl
python app.py
```

```python
# Option 2: In code (app.py)
import os
os.environ["LLM_MODEL"] = "google/flan-t5-xl"
```
### In Docker

```dockerfile
ENV LLM_MODEL=google/flan-t5-xxl
```
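However you set it, the variable only matters when the app reads it at startup. Here is a minimal sketch of that pattern, assuming the app calls the classic Serverless Inference API over HTTP (ConversAI's actual wiring may differ, and `HF_TOKEN` is optional on the free tier):

```python
import os

import requests

# Read the model ID from the environment, falling back to the default.
model_id = os.environ.get("LLM_MODEL", "google/flan-t5-xxl")

# Serverless Inference API endpoint; a token is optional but raises rate limits.
api_url = f"https://api-inference.huggingface.co/models/{model_id}"
headers = (
    {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
    if "HF_TOKEN" in os.environ
    else {}
)

# Quick smoke test with a short instruction-style prompt.
response = requests.post(
    api_url, headers=headers, json={"inputs": "Translate to French: Hello, world"}
)
response.raise_for_status()
print(response.json()[0]["generated_text"])
```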
## 💡 Tips for Best Results
### 1. Start Simple
Begin with the default (Flan-T5-XXL) and only switch if you need to:
- Need maximum speed? → Try Flan-T5-XL
- Need longer context? → Try Flan-UL2
- Need best quality? → Try Mistral-7B (if available)
### 2. Adjust Your Prompts
Different models work better with different prompting styles (a model-aware prompt-building sketch follows the lists below):
Flan-T5 models (recommended):
- Prefer clear, direct instructions
- Work better with structured input
- Best with specific requirements
- Use imperative language ("Generate...", "Create...", "Translate...")
Mistral (if available):
- Can handle conversational outlines
- Good with context and examples
- Understands nuance
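One way to encode these habits is a small helper that adapts the prompt to the model family. This is purely illustrative; `build_prompt` is a hypothetical function, not part of ConversAI:

```python
def build_prompt(model_id: str, task: str, requirements: list[str]) -> str:
    """Adapt prompt style to the model family (illustrative helper)."""
    req_lines = "\n".join(f"- {r}" for r in requirements)
    if "flan" in model_id:
        # Flan-T5: short, imperative instructions with structured requirements.
        return f"{task}\nRequirements:\n{req_lines}"
    # Chat-tuned models such as Mistral tolerate a more conversational framing.
    return (
        f"I need help with the following task: {task}\n"
        f"Please keep these points in mind:\n{req_lines}"
    )


print(
    build_prompt(
        "google/flan-t5-xxl",
        "Generate a 10-question survey about mobile app satisfaction.",
        ["Topics: usability, performance", "Audience: iOS users 25-45"],
    )
)
```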
### 3. Manage Expectations
Free tier limitations:
- Cold start: 30-60 seconds on first request
- Queue times: 10-30 seconds during peak hours
- Rate limits: ~1 request every few seconds
- Timeouts: Possible on very complex tasks
Solutions:
- Be patient on first request
- Use off-peak hours when possible
- Keep prompts concise
- Try a faster model if timeouts occur (or automate retries, as sketched below)
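For the cold-start, queue, and rate-limit cases, a simple retry with backoff usually suffices. A minimal sketch reusing the HTTP pattern from earlier; the wait times mirror the free-tier numbers above and are assumptions, not API guarantees:

```python
import time

import requests


def generate_with_retry(
    prompt: str, model_id: str = "google/flan-t5-xxl", retries: int = 3
) -> str:
    """Call the Serverless Inference API, retrying on cold starts and rate limits."""
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    for attempt in range(retries):
        resp = requests.post(url, json={"inputs": prompt}, timeout=120)
        if resp.status_code == 200:
            return resp.json()[0]["generated_text"]
        # 503 = model still loading (cold start), 429 = rate limited.
        if resp.status_code in (503, 429) and attempt < retries - 1:
            time.sleep(30 * (attempt + 1))  # cold starts can take 30-60 seconds
            continue
        resp.raise_for_status()
    raise RuntimeError("exhausted retries")
```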
### 4. Test and Compare
Try generating the same survey with different models:
```bash
# Test 1: Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl

# Test 2: Flan-T5-XL (faster)
LLM_MODEL=google/flan-t5-xl

# Test 3: Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2
```
Pick the one that works best for your use case!
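If restarting the app for each test is tedious, a throwaway script can hit all three models in one run. A sketch reusing the endpoint pattern above; the prompt is illustrative:

```python
import requests

MODELS = ["google/flan-t5-xxl", "google/flan-t5-xl", "google/flan-ul2"]
PROMPT = "Generate 5 survey questions about mobile app satisfaction."

for model_id in MODELS:
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    resp = requests.post(url, json={"inputs": PROMPT}, timeout=120)
    print(f"--- {model_id} ---")
    # On success the API returns [{"generated_text": ...}].
    if resp.ok:
        print(resp.json()[0]["generated_text"])
    else:
        print(f"Error {resp.status_code}")
```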
## 🔧 Troubleshooting
"Model loading failed"
Cause: Model might be down or loading
Solutions:
- Wait 1-2 minutes and retry
- Try a different Flan-T5 variant (all are stable)
- Check HuggingFace status page
"Request timed out"
Cause: Model taking too long (can happen on first request)
Solutions:
- Retry - second request is faster
- Use a faster model (Flan-T5-XL)
- Simplify your prompt
- Try during off-peak hours
"Rate limit exceeded"
Cause: Too many requests too fast
Solutions:
- Wait 30-60 seconds between requests
- Use a Pro HuggingFace account (still free for inference)
- Deploy your own Space (gets its own quota)
### Poor quality output
Cause: Model not suitable for task
Solutions:
- Try Mistral-7B for better quality
- Make prompts more specific
- Provide examples in your outline
- Break complex tasks into smaller steps
## 📈 Performance Benchmarks
Based on typical usage patterns:
| Task | Flan-T5-XL | Flan-T5-XXL | Flan-UL2 |
|---|---|---|---|
| Generate 10-question survey | 5-10s | 8-15s | 15-25s |
| Translate to 3 languages | 8-12s | 12-20s | 20-30s |
| Analyze 50 responses | 10-15s | 15-25s | 25-40s |
| First request (cold) | 10-20s | 15-30s | 30-45s |
| Subsequent requests | 3-8s | 5-12s | 10-20s |
Times are approximate and vary based on server load
## 🚀 Advanced Tips
### 1. Model-Specific Prompting
For Flan-T5-XXL (Default):

```
Task: Create survey about mobile app satisfaction
Requirements:
- 10 questions
- Topics: usability, performance, features
- Audience: iOS users 25-45

Generate a professional survey following best practices.
```
For Flan-T5-XL (Fast):

```
Create 8 questions about mobile app satisfaction.
Topics: usability, performance, features.
Audience: iOS users 25-45.
```
For Flan-UL2 (More Context):

```
Generate a comprehensive survey to understand mobile app user satisfaction.

Context: We're a productivity app with 100K users. Recent reviews mention
performance issues and missing features. We need to understand:
1. Current satisfaction levels
2. Specific pain points
3. Feature priorities

Target: iOS users aged 25-45 who use the app daily.
Create 12-15 questions following qualitative research best practices.
```
### 2. Optimize for Speed
Fast survey generation:
- Use Flan-T5-XL
- Keep outline to 2-3 sentences
- Request 5-8 questions
- Use clear, direct prompts
Result: 3-8 second generation
### 3. Optimize for Quality
High-quality surveys:
- Use Flan-UL2
- Provide detailed context and examples
- Request 10-15 questions
- Include specific requirements
Result: Professional, well-structured surveys
## ❓ FAQ
Q: Why is Flan-T5-XXL the default?
A: It's guaranteed to be deployed on the HF Inference API, fast, and reliable. Google's instruction-tuned model works well for structured tasks.
Q: Can I use multiple models in one app?
A: Yes! Change the `LLM_MODEL` environment variable to switch models.
Q: Which model is best for non-English content?
A: All Flan-T5 models support multiple languages. For the best multilingual support, try Flan-UL2.
Q: Do these models cost money?
A: No! All are free on the HuggingFace Inference API.
Q: Can I use my own fine-tuned model?
A: Yes! Set LLM_MODEL to your model ID on HuggingFace.
Q: What if I need better performance?
A: Consider:
- HuggingFace Pro (faster free tier)
- Deploying the model yourself (Hugging Face Inference Endpoints)
- Using a dedicated GPU
## 🚀 Quick Start Commands
```bash
# Try Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl python app.py

# Try Flan-T5-XL (fastest)
LLM_MODEL=google/flan-t5-xl python app.py

# Try Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2 python app.py

# Check which model is active
python check_env.py
```
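The contents of `check_env.py` aren't shown here; at its simplest, such a check is just a couple of lines (a hypothetical stand-in, not the real script):

```python
# Hypothetical stand-in for check_env.py; the real script may differ.
import os

print("Active model:", os.environ.get("LLM_MODEL", "google/flan-t5-xxl"))
```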
Updated: November 2025. All models tested and working on the HuggingFace free tier.
For more help, see TROUBLESHOOTING.md or USER_GUIDE.md