# Free Models Guide

**Complete guide to using free, ungated AI models with ConversAI**

---

> **⚠️ IMPORTANT:** Only models marked "✅ Guaranteed" are actively available on the HuggingFace Inference API. Others may return 404 errors. **The default (Flan-T5-XXL) is guaranteed to work.**

---

## ✨ TL;DR

**Default model (Flan-T5-XXL) works great!** Just deploy and use. No configuration needed.

Want to try others? Set the `LLM_MODEL` environment variable to any verified model below.

---

## 🆓 Recommended Free Models

All models below are:
- ✅ **100% Free** - No API keys or costs
- ✅ **Ungated** - No approval needed
- ✅ **Works on HuggingFace Spaces** - Ready to use

### 1. Google Flan-T5-XXL ⭐ (DEFAULT)

**Best for:** Speed and reliability, instruction-following

```bash
LLM_MODEL=google/flan-t5-xxl
```

**Specs:**
- Speed: ⚡⚡⚡ Very Fast (5-15 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 11B parameters
- Context: 512 tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**

**Pros:**
- **Very fast generation**
- **Guaranteed availability** - always deployed
- Excellent at following instructions
- Reliable on free tier
- Good for structured tasks
- Google's production model, battle-tested

**Cons:**
- Shorter context window (512 tokens)
- More concise outputs
- May need more specific prompts for complex tasks

**Best for:**
- Professional survey generation (5-15 questions)
- Fast translations
- Quick data analysis
- When speed and reliability matter most

---

### 2. Google Flan-T5-XL

**Best for:** Maximum speed

```bash
LLM_MODEL=google/flan-t5-xl
```

**Specs:**
- Speed: ⚡⚡⚡ Very Fast (3-10 seconds)
- Quality: ⭐⭐ Decent
- Size: 3B parameters
- Context: 512 tokens
- Status: ✅ **Guaranteed deployed on HF Inference API**

**Pros:**
- Fastest generation
- Always available
- Good for simple tasks
- Minimal latency
- Very lightweight

**Cons:**
- Lower quality outputs than XXL variant
- Limited context
- Shorter responses
- May struggle with complex tasks

**Best for:**
- Testing/prototyping
- Simple surveys (5-8 questions)
- Quick translations
- When you need instant results

---

### 3. Mistral-7B-Instruct-v0.2

**Best for:** Best quality output (if available)

```bash
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
```

**Specs:**
- Speed: ⚡⚡ Medium (20-45 seconds)
- Quality: ⭐⭐⭐⭐ Excellent
- Size: 7B parameters
- Context: 8K tokens
- Status: ⚠️ **Deployment varies** - may not be available

**Pros:**
- Excellent quality outputs
- Good reasoning capabilities
- Larger context window
- Handles complex tasks well

**Cons:**
- **May not be deployed** on Inference API
- Slower than Flan-T5 models
- May queue during peak times
- Can return 404 errors if not available

**Best for:**
- High-quality surveys (if available)
- Complex analysis tasks
- When quality matters most

**Note:** This model may not be consistently available on the free Serverless Inference API. Use Flan-T5-XXL for guaranteed availability.

---

### 4. Google Flan-UL2

**Best for:** Long contexts

```bash
LLM_MODEL=google/flan-ul2
```

**Specs:**
- Speed: ⚡⚡ Medium (15-40 seconds)
- Quality: ⭐⭐⭐ Good
- Size: 20B parameters
- Context: 2K tokens

**Pros:**
- Better context handling
- Good quality
- Handles longer inputs
- Good for analysis

**Cons:**
- Slightly slower
- Can be unpredictable
- May timeout occasionally

**Best for:**
- Longer survey outlines
- Complex analysis tasks
- When you need more context

---

## 📊 Model Comparison

| Model | Speed | Quality | Size | Deployed | Best Use Case |
|-------|-------|---------|------|----------|---------------|
| **Flan-T5-XXL** ⭐ | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 11B | ✅ Guaranteed | **Default - fast & reliable** |
| **Flan-T5-XL** | ⚡⚡⚡ Very Fast | ⭐⭐ Decent | 3B | ✅ Guaranteed | **Maximum speed** |
| **Flan-UL2** | ⚡⚡ Medium | ⭐⭐⭐ Good | 20B | ✅ Guaranteed | **Longer contexts** |
| **Mistral-7B** | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | 7B | ⚠️ Varies | **Best quality (if available)** |

**Note:** Only models with "✅ Guaranteed" are always available on HF Inference API. Models marked "⚠️ Varies" may not be deployed.
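
Because a "⚠️ Varies" model can return a 404, it may help to fall back to the default automatically. The sketch below is one possible approach, not ConversAI's actual code: `pick_available_model` and `probe` are hypothetical names, and `probe` stands for any function you write that sends a small test request and returns the HTTP status code.

```python
from typing import Callable, Iterable

# Preference order is an illustrative assumption: best quality first,
# then the default this guide describes as always deployed.
PREFERRED_MODELS = (
    "mistralai/Mistral-7B-Instruct-v0.2",
    "google/flan-t5-xxl",
)


def pick_available_model(
    candidates: Iterable[str],
    probe: Callable[[str], int],
    default: str = "google/flan-t5-xxl",
) -> str:
    """Return the first candidate whose probe does not report HTTP 404.

    `probe` is a hypothetical callable: it takes a model ID and returns
    the HTTP status code of a small test request.
    """
    for model_id in candidates:
        if probe(model_id) != 404:
            return model_id
    return default


# Fake probe standing in for a real HTTP request, so the sketch runs offline.
def fake_probe(model_id: str) -> int:
    return 404 if model_id.startswith("mistralai/") else 200


print(pick_available_model(PREFERRED_MODELS, fake_probe))  # google/flan-t5-xxl
```

Only 404 is treated as "not deployed" here, since that is the error this guide names; other status codes would need their own handling.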

---

## 🎯 Use Case Recommendations

### For Survey Generation:

**5-10 questions (simple):**
```bash
LLM_MODEL=google/flan-t5-xl  # Fastest
```

**10-15 questions (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl  # Default, balanced
```

**15+ questions (detailed):**
```bash
LLM_MODEL=google/flan-ul2  # Better context handling
```

### For Translation:

**1-2 languages (quick):**
```bash
LLM_MODEL=google/flan-t5-xl  # Fastest translations
```

**3-5 languages (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl  # Default, reliable
```

**5+ languages or critical translations:**
```bash
LLM_MODEL=google/flan-ul2  # Better quality
```

### For Data Analysis:

**10-30 responses (simple):**
```bash
LLM_MODEL=google/flan-t5-xl  # Quick insights
```

**30-100 responses (standard):**
```bash
LLM_MODEL=google/flan-t5-xxl  # Default, balanced
```

**100+ responses or complex analysis:**
```bash
LLM_MODEL=google/flan-ul2  # Deep analysis, better context
```
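
The recommendations above amount to a small lookup by task and size. Here is a rough sketch of that mapping; `recommend_model` and `THRESHOLDS` are hypothetical names, and since the ranges overlap at their edges (for example, exactly 10 questions), assigning boundary values to the smaller model is an assumption.

```python
# Inclusive upper bounds taken from the ranges in this section.
# Boundary values go to the smaller model (an assumption, see lead-in).
THRESHOLDS = {
    "survey": (10, 15),      # number of questions
    "translation": (2, 5),   # number of target languages
    "analysis": (30, 100),   # number of responses
}


def recommend_model(task: str, size: int) -> str:
    """Map a task and its size to one of the model IDs suggested above."""
    if task not in THRESHOLDS:
        raise ValueError(f"unknown task: {task!r}")
    small_max, medium_max = THRESHOLDS[task]
    if size <= small_max:
        return "google/flan-t5-xl"
    if size <= medium_max:
        return "google/flan-t5-xxl"
    return "google/flan-ul2"


print(recommend_model("survey", 12))      # google/flan-t5-xxl
print(recommend_model("translation", 2))  # google/flan-t5-xl
print(recommend_model("analysis", 250))   # google/flan-ul2
```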

---

## βš™οΈ How to Change Models

### On HuggingFace Spaces:

1. Go to your Space Settings
2. Click "Variables" or "Repository secrets"
3. Add new variable:
   - Name: `LLM_MODEL`
   - Value: `google/flan-t5-xxl` (or any model above)
4. Restart your Space

### Running Locally:

```bash
# Option 1: Environment variable
export LLM_MODEL=google/flan-t5-xxl
python app.py

# Option 2: In code. Add these Python lines at the top of app.py,
# before the model is loaded:
#   import os
#   os.environ["LLM_MODEL"] = "google/flan-t5-xl"
```

### In Docker:

```dockerfile
ENV LLM_MODEL=google/flan-t5-xxl
```
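
On the application side, reading the variable with a fallback to the default might look like the sketch below. ConversAI's actual loading code is not shown in this guide, so `resolve_model` and `DEFAULT_MODEL` are hypothetical names used only for illustration.

```python
import os

DEFAULT_MODEL = "google/flan-t5-xxl"  # the default named in this guide


def resolve_model(env=None) -> str:
    """Return the model ID from LLM_MODEL, or the default if unset or blank."""
    env = os.environ if env is None else env
    value = env.get("LLM_MODEL", "").strip()
    return value or DEFAULT_MODEL


print(resolve_model({}))                                # google/flan-t5-xxl
print(resolve_model({"LLM_MODEL": "google/flan-ul2"}))  # google/flan-ul2
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching the real environment.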

---

## 💡 Tips for Best Results

### 1. Start Simple

Begin with the default (Flan-T5-XXL) and only switch if you need to:
- **Need maximum speed?** → Try Flan-T5-XL
- **Need longer context?** → Try Flan-UL2
- **Need best quality?** → Try Mistral-7B (if available)

### 2. Adjust Your Prompts

Different models work better with different prompting:

**Flan-T5 models (recommended):**
- Prefer clear, direct instructions
- Work better with structured input
- Best with specific requirements
- Use imperative language ("Generate...", "Create...", "Translate...")

**Mistral (if available):**
- Can handle conversational outlines
- Good with context and examples
- Understands nuance

### 3. Manage Expectations

**Free tier limitations:**
- Cold start: 30-60 seconds on first request
- Queue times: 10-30 seconds during peak hours
- Rate limits: ~1 request every few seconds
- Timeouts: Possible on very complex tasks

**Solutions:**
- Be patient on first request
- Use off-peak hours when possible
- Keep prompts concise
- Try a faster model if timeouts occur
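
"Be patient on first request" can be automated with a retry loop that waits longer after each timeout. This is a minimal sketch under stated assumptions: `call_with_retry` is a hypothetical helper, `request` is whatever function sends your prompt, and it is assumed to raise Python's built-in `TimeoutError` when a request times out (your client library may raise something else).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    attempts: int = 4,
    first_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on TimeoutError with doubling delays."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the last timeout
            sleep(delay)
            delay *= 2
    raise ValueError("attempts must be at least 1")


# Demo: a fake request that times out twice, then succeeds.
calls = {"n": 0}


def flaky_request() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("model still loading")
    return "survey text"


waits = []
result = call_with_retry(flaky_request, sleep=waits.append)
print(result, waits)  # survey text [5.0, 10.0]
```

The demo records the delays in a list instead of actually sleeping, so it finishes instantly.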

### 4. Test and Compare

Try generating the same survey with different models:

```bash
# Test 1: Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl

# Test 2: Flan-T5-XL (faster)
LLM_MODEL=google/flan-t5-xl

# Test 3: Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2
```

Pick the one that works best for your use case!
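
If you would rather compare models in one run than restart with a different `LLM_MODEL` each time, a small timing harness can help. In this sketch, `compare_models` is a hypothetical helper and `generate` is a placeholder for your own function that sends a prompt to a given model; the fake generator only makes the example runnable offline.

```python
import time
from typing import Callable, Dict, Sequence

MODELS = ("google/flan-t5-xxl", "google/flan-t5-xl", "google/flan-ul2")


def compare_models(
    prompt: str,
    generate: Callable[[str, str], str],
    models: Sequence[str] = MODELS,
) -> Dict[str, dict]:
    """Run the same prompt on each model; record elapsed time and output."""
    results = {}
    for model_id in models:
        start = time.perf_counter()
        output = generate(model_id, prompt)
        results[model_id] = {
            "seconds": time.perf_counter() - start,
            "output": output,
        }
    return results


# Stand-in generator so the sketch runs offline; swap in a real call.
def fake_generate(model_id: str, prompt: str) -> str:
    return f"[{model_id}] {prompt}"


for model_id, info in compare_models("Create 5 questions.", fake_generate).items():
    print(f"{model_id}: {info['seconds']:.3f}s, {len(info['output'])} chars")
```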

---

## πŸ› Troubleshooting

### "Model loading failed"

**Cause:** Model might be down or loading

**Solutions:**
1. Wait 1-2 minutes and retry
2. Try a different Flan-T5 variant (all are stable)
3. Check HuggingFace status page

### "Request timed out"

**Cause:** Model taking too long (can happen on first request)

**Solutions:**
1. Retry - second request is faster
2. Use a faster model (Flan-T5-XL)
3. Simplify your prompt
4. Try during off-peak hours

### "Rate limit exceeded"

**Cause:** Too many requests too fast

**Solutions:**
1. Wait 30-60 seconds between requests
2. Upgrade to a HuggingFace Pro account (a paid plan with higher rate limits)
3. Deploy your own Space (gets its own quota)
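
Spacing requests out can also be enforced in code. The class below is a rough sketch, not part of ConversAI: `Throttle` is a hypothetical name, and the 30-second interval is simply the lower end of the wait suggested above.

```python
import time


class Throttle:
    """Block until at least `min_interval` seconds separate two calls."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self) -> float:
        """Sleep if needed; return how many seconds were waited."""
        now = self._clock()
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited


throttle = Throttle(min_interval=30.0)
throttle.wait()  # first call returns immediately; later calls may sleep
# ... send the request here, and call throttle.wait() before each new one ...
```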

### Poor quality output

**Cause:** Model not suitable for task

**Solutions:**
1. Try Mistral-7B for better quality (if it is available)
2. Make prompts more specific
3. Provide examples in your outline
4. Break complex tasks into smaller steps

---

## 📊 Performance Benchmarks

Based on typical usage patterns:

| Task | Flan-T5-XL | Flan-T5-XXL | Flan-UL2 |
|------|------------|-------------|----------|
| **Generate 10Q survey** | 5-10s | 8-15s | 15-25s |
| **Translate to 3 lang** | 8-12s | 12-20s | 20-30s |
| **Analyze 50 responses** | 10-15s | 15-25s | 25-40s |
| **First request (cold)** | 10-20s | 15-30s | 30-45s |
| **Subsequent requests** | 3-8s | 5-12s | 10-20s |

*Times are approximate and vary based on server load*

---

## 🎓 Advanced Tips

### 1. Model-Specific Prompting

**For Flan-T5-XXL (Default):**
```
Task: Create survey about mobile app satisfaction
Requirements:
- 10 questions
- Topics: usability, performance, features
- Audience: iOS users 25-45

Generate a professional survey following best practices.
```

**For Flan-T5-XL (Fast):**
```
Create 8 questions about mobile app satisfaction.
Topics: usability, performance, features.
Audience: iOS users 25-45.
```

**For Flan-UL2 (More Context):**
```
Generate a comprehensive survey to understand mobile app user satisfaction.

Context: We're a productivity app with 100K users. Recent reviews mention
performance issues and missing features. We need to understand:
1. Current satisfaction levels
2. Specific pain points
3. Feature priorities

Target: iOS users aged 25-45 who use the app daily.
Create 12-15 questions following qualitative research best practices.
```
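
If you build these prompts programmatically, a small helper can keep the structure consistent. The sketch below reproduces the Task/Requirements layout of the Flan-T5-XXL example; `build_structured_prompt` and its parameters are hypothetical names, not part of ConversAI.

```python
from typing import Sequence


def build_structured_prompt(
    subject: str,
    num_questions: int,
    topics: Sequence[str],
    audience: str,
) -> str:
    """Assemble a prompt in the Task/Requirements layout for Flan-T5-XXL."""
    lines = [
        f"Task: Create survey about {subject}",
        "Requirements:",
        f"- {num_questions} questions",
        f"- Topics: {', '.join(topics)}",
        f"- Audience: {audience}",
        "",
        "Generate a professional survey following best practices.",
    ]
    return "\n".join(lines)


print(build_structured_prompt(
    subject="mobile app satisfaction",
    num_questions=10,
    topics=["usability", "performance", "features"],
    audience="iOS users 25-45",
))
```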

### 2. Optimize for Speed

**Fast survey generation:**
1. Use Flan-T5-XL
2. Keep outline to 2-3 sentences
3. Request 5-8 questions
4. Use clear, direct prompts

**Result:** 3-8 second generation

### 3. Optimize for Quality

**High-quality surveys:**
1. Use Flan-UL2
2. Provide detailed context and examples
3. Request 10-15 questions
4. Include specific requirements

**Result:** Professional, well-structured surveys

---

## ❓ FAQ

**Q: Why is Flan-T5-XXL the default?**
A: It's guaranteed to be deployed on HF Inference API, fast, and reliable. Google's instruction-tuned model works well for structured tasks.

**Q: Can I use multiple models in one app?**
A: Yes, one at a time. Change the `LLM_MODEL` environment variable and restart the app to switch models.

**Q: Which model is best for non-English?**
A: All Flan-T5 models support multiple languages. For best multilingual support, try Flan-UL2.

**Q: Do these models cost money?**
A: No! All are free on HuggingFace Inference API.

**Q: Can I use my own fine-tuned model?**
A: Yes! Set `LLM_MODEL` to your model ID on HuggingFace.

**Q: What if I need better performance?**
A: Consider:
1. HuggingFace Pro (a paid plan with higher rate limits)
2. Deploy model yourself (Hugging Face Inference Endpoints)
3. Use dedicated GPU

---

## 🚀 Quick Start Commands

```bash
# Try Flan-T5-XXL (default, balanced)
LLM_MODEL=google/flan-t5-xxl python app.py

# Try Flan-T5-XL (fastest)
LLM_MODEL=google/flan-t5-xl python app.py

# Try Flan-UL2 (more context)
LLM_MODEL=google/flan-ul2 python app.py

# Check which model is active
python check_env.py
```

---

**Updated:** November 2025
**All models tested and working on HuggingFace free tier**

For more help, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md) or [USER_GUIDE.md](USER_GUIDE.md)