jmisak committed
Commit d6f9f11 · verified · 1 Parent(s): 56fed0f

Upload 4 files

Files changed (4)
  1. CHANGELOG.md +18 -14
  2. README.md +17 -16
  3. llm_backend.py +5 -4
  4. survey_generator.py +5 -9
CHANGELOG.md CHANGED
@@ -10,9 +10,9 @@ All notable changes to ConversAI will be documented in this file.
  - **No API endpoint issues** - everything runs on your Space
  - **Faster after first load** - models cached in memory
  - **100% private** - all processing happens locally
- - Default model: **google/flan-t5-xl** (3GB, excellent quality)
- - Supports all Flan-T5 variants (base, large, xl, xxl)
- - **Important**: XL or larger required for quality results; smaller models produce poor output
+ - Default model: **microsoft/phi-2** (2.7GB, causal language model)
+ - Supports causal LMs: Phi-2, Gemma-2B-IT, TinyLlama, Mistral-7B
+ - **Important**: Use causal LMs only; Flan-T5 models copy examples instead of generating
 
  ### Added
  - **New dependencies**: transformers, torch, accelerate, sentencepiece
@@ -24,15 +24,18 @@ All notable changes to ConversAI will be documented in this file.
  - First request: ~1-2 minutes (download + load)
  - Subsequent requests: ~2-5 seconds
 
- - **Support for multiple Flan-T5 sizes**: Choose based on your needs
-   - flan-t5-base: 250MB (fast, good quality)
-   - flan-t5-large: 1.2GB (better quality)
-   - flan-t5-xl: 3GB (excellent quality)
-   - flan-t5-xxl: 11GB (best quality)
+ - **Support for multiple causal LM sizes**: Choose based on your needs
+   - TinyLlama-1.1B-Chat: 1.1GB (fast, fair quality)
+   - Gemma-2B-IT: 2GB (good quality)
+   - Phi-2: 2.7GB (excellent quality, default)
+   - Mistral-7B-Instruct: 7GB (best quality)
 
  ### Fixed
  - **No more 404 API errors** - eliminated all API endpoint issues
  - **No API token required** - works without any credentials on HF Spaces
+ - **Fixed generic/example-copying questions** - switched from Flan-T5 to Phi-2
+   - Flan-T5 models would generate "How would you rate X/Y/Z?" by copying examples
+   - Phi-2 (causal LM) generates actual contextual questions about the topic
  - Faster generation after initial model load
  - More reliable - no network dependencies
  - Better privacy - all processing local
@@ -44,12 +47,13 @@ All notable changes to ConversAI will be documented in this file.
  - Added model caching to keep models in memory
  - Auto-detects CUDA/CPU and optimizes accordingly
 
- - **Default model**: `google/flan-t5-xl` (line 84)
-   - Changed from API-based to local transformers
-   - 3GB model required for acceptable quality
-   - Testing showed base/large models produce generic, irrelevant questions
-   - XL provides good balance of quality and resource usage
-   - User can upgrade to xxl or downgrade to large/base via LLM_MODEL env var (not recommended)
+ - **Default model**: `microsoft/phi-2` (line 85)
+   - Changed from API-based to local causal language model
+   - **Critical fix**: Switched from Flan-T5 to Phi-2 (causal LM)
+   - **Problem identified**: Flan-T5 (encoder-decoder) copies examples instead of generating contextual questions
+   - **Solution**: Phi-2 (decoder-only) is designed for creative text generation
+   - 2.7GB model provides excellent contextual question generation
+   - User can try Gemma-2B-IT, TinyLlama, or Mistral-7B via LLM_MODEL env var
 
  - **Complete rewrite of survey generation** in `survey_generator.py`:
    - **Changed approach**: No longer asks model to generate JSON (T5 models struggle with structured output)
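The caching and CUDA auto-detection called out in this changelog reduce to a small amount of transformers code. A minimal sketch of the pattern, for reference (the `_MODEL_CACHE` dict and `load_model` helper are illustrative names, not the repo's actual identifiers):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical module-level cache: the first request pays the
# download/load cost, later requests reuse the in-memory model.
_MODEL_CACHE = {}

def load_model(model_name: str = "microsoft/phi-2"):
    """Load (or reuse) a causal LM, auto-detecting CUDA vs. CPU."""
    if model_name not in _MODEL_CACHE:
        device = "cuda" if torch.cuda.is_available() else "cpu"
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            # half precision saves GPU memory; full precision is the safe CPU default
            torch_dtype=torch.float16 if device == "cuda" else torch.float32,
        ).to(device)
        _MODEL_CACHE[model_name] = (tokenizer, model, device)
    return _MODEL_CACHE[model_name]
```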
README.md CHANGED
@@ -16,7 +16,7 @@ Battle the blank page, reach global audiences, and uncover insights with AI assi
 
 ---
 
-> **✨ UPDATED (Nov 2025):** Now uses **local transformers** with **Google Flan-T5** models - Fast, reliable, and **completely FREE**! No API dependencies, runs directly on HuggingFace Spaces.
+> **✨ UPDATED (Nov 2025):** Now uses **local transformers** with **Microsoft Phi-2** - Fast, contextual, and **completely FREE**! No API dependencies, runs directly on HuggingFace Spaces. Generates actual topic-specific questions (not generic templates).
 
 ---
 
@@ -57,13 +57,13 @@ Battle the blank page, reach global audiences, and uncover insights with AI assi
 
 **✨ Zero configuration needed!** ConversAI works out-of-the-box on HuggingFace Spaces using local model loading.
 
-**Default Model:** google/flan-t5-xl
+**Default Model:** microsoft/phi-2
 - ✅ **100% Free** - No API keys, no costs, ever
-- ✅ **High quality** - 3GB model, excellent at following complex instructions
+- ✅ **Excellent quality** - 2.7GB causal language model, great at creative text generation
 - ✅ **Good speed** - Typically 5-10 seconds per request after initial load
 - ✅ **No API dependencies** - Runs entirely on your Space's compute
 - ✅ **Private** - All processing happens locally, nothing sent to external APIs
-- ✅ **Reliable** - Google's instruction-tuned model, battle-tested
+- ✅ **Contextual** - Generates relevant, topic-specific questions (not generic)
 
 **Setup for HuggingFace Spaces:**
 - Just deploy - models download automatically on first run
@@ -78,20 +78,20 @@ You can try different free models by setting the `LLM_MODEL` environment variabl
 
 | Model | Best For | Speed | Quality | Model Size |
 |-------|----------|-------|---------|------------|
-| **google/flan-t5-base** | Quick testing only | ⚡⚡⚡ Very Fast | ⭐ Poor | 250MB |
-| **google/flan-t5-large** | Faster loading | ⚡⚡ Fast | ⭐⭐ Fair | 1.2GB |
-| **google/flan-t5-xl** (default) | **Recommended** - best balance | ⚡ Good | ⭐⭐⭐⭐ Excellent | 3GB |
-| **google/flan-t5-xxl** | Maximum quality | ⚡ Slower | ⭐⭐⭐⭐⭐ Best | 11GB |
+| **TinyLlama/TinyLlama-1.1B-Chat-v1.0** | Quick testing | ⚡⚡⚡ Very Fast | ⭐⭐ Fair | 1.1GB |
+| **google/gemma-2b-it** | Faster alternative | ⚡⚡ Fast | ⭐⭐⭐ Good | 2GB |
+| **microsoft/phi-2** (default) | **Recommended** - best balance | ⚡ Good | ⭐⭐⭐⭐ Excellent | 2.7GB |
+| **mistralai/Mistral-7B-Instruct-v0.2** | Maximum quality | ⚡ Slower | ⭐⭐⭐⭐⭐ Best | 7GB |
 
-**Note:** Flan-T5 models are Google's instruction-tuned models, specifically designed for following instructions. They run locally with transformers library.
+**Note:** These are causal language models (decoder-only) designed for text generation. **Do NOT use Flan-T5 models** - they copy examples instead of generating contextual questions.
 
 **To change model:**
 ```bash
 # In Space Settings → Variables
-LLM_MODEL=google/flan-t5-large # Better quality
+LLM_MODEL=google/gemma-2b-it # Faster alternative
 
 # Or for maximum quality (requires more memory)
-LLM_MODEL=google/flan-t5-xl
+LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
 ```
 
 **Why Local Transformers?**
@@ -103,12 +103,13 @@ LLM_MODEL=google/flan-t5-xl
 
 ### Tips for Best Performance with Local Models
 
-1. **Use flan-t5-xl (default)** - XL provides good quality, smaller models produce poor results
-2. **First load takes time** - Model downloads and loads (~3-5 minutes for XL)
+1. **Use Phi-2 (default)** - Best balance of quality and resource usage
+2. **First load takes time** - Model downloads and loads (~2-3 minutes for Phi-2)
 3. **Subsequent requests are fast** - Model stays in memory (5-10 seconds)
-4. **For maximum quality** - Use flan-t5-xxl (requires 16GB+ RAM)
-5. **Avoid smaller models** - Base and Large often produce generic or irrelevant questions
-6. **Be specific in outlines** - More detail helps model generate better questions
+4. **For maximum quality** - Use Mistral-7B-Instruct (requires 8GB+ RAM)
+5. **For faster loading** - Use Gemma-2B-IT or TinyLlama (good quality, smaller)
+6. **Avoid Flan-T5 models** - They copy examples instead of generating contextual questions
+7. **Be specific in outlines** - More detail helps model generate better questions
 
 ## 📦 Installation
 
llm_backend.py CHANGED
@@ -78,10 +78,11 @@ class LLMBackend:
         defaults = {
             LLMProvider.OPENAI: "gpt-4o-mini",
             LLMProvider.ANTHROPIC: "claude-3-5-sonnet-20241022",
-            # Using Flan-T5-XL - best balance for quality survey generation (3GB)
-            # For faster loading: google/flan-t5-large (1.2GB) - may have lower quality
-            # For maximum quality: google/flan-t5-xxl (11GB) - requires more memory
-            LLMProvider.HUGGINGFACE: "google/flan-t5-xl",
+            # Using Microsoft Phi-2 - causal LM, better at creative text generation
+            # Phi-2 is 2.7GB and instruction-tuned for following prompts
+            # Alternative: google/gemma-2b-it (2GB) or TinyLlama/TinyLlama-1.1B-Chat-v1.0 (1.1GB)
+            # NOTE: Flan-T5 models don't work well - they copy examples instead of generating
+            LLMProvider.HUGGINGFACE: "microsoft/phi-2",
             LLMProvider.LM_STUDIO: "google/gemma-3-27b"
         }
         return os.getenv("LLM_MODEL", defaults[self.provider])
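Because the method ends in `os.getenv("LLM_MODEL", defaults[self.provider])`, swapping models requires no code change. A hedged usage sketch (the `LLMBackend` constructor call is assumed from the class name in the hunk header and is not shown in this diff):

```python
import os

# On HF Spaces this would be set in Space Settings -> Variables;
# it is set in code here only for illustration, and must be set
# before the backend resolves its model name.
os.environ["LLM_MODEL"] = "google/gemma-2b-it"

# Hypothetical instantiation - the constructor signature is not in this diff:
# backend = LLMBackend(provider=LLMProvider.HUGGINGFACE)
# The model would now resolve to "google/gemma-2b-it" instead of "microsoft/phi-2".
```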
survey_generator.py CHANGED
@@ -83,20 +83,16 @@ class SurveyGenerator:
 
     def _build_generation_prompt(self, outline, survey_type, num_questions, target_audience) -> str:
         """Build the user prompt for survey generation"""
-        # For T5 models, be very specific and direct
-        return f"""Create {num_questions} professional survey questions.
+        # For causal LMs (Phi, Gemma, etc.) - more conversational
+        return f"""You are designing a {survey_type} survey for: {target_audience}
 
 Topic: {outline}
-Audience: {target_audience}
 
-Write {num_questions} questions numbered 1-{num_questions}. Each question must be specific to the topic above.
+Generate {num_questions} specific, relevant survey questions about this topic. Make each question clear and professional.
 
-Examples:
-1. What is your experience with X?
-2. How would you rate Y?
-3. What challenges do you face with Z?
+Write your questions in a numbered list format (1., 2., 3., etc.). Focus on the specific topic and audience mentioned above.
 
-Your {num_questions} questions:
+Questions:
 1."""
 
     def _parse_survey_response(self, response: str) -> Dict:
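The prompt ends with a primed `1.` so the model simply continues the numbered list, and `_parse_survey_response` then recovers questions from free text instead of JSON. A minimal sketch of that numbered-list parsing (assumed logic; the actual `_parse_survey_response` body is not shown in this diff, and `parse_numbered_questions` is an illustrative name):

```python
import re
from typing import Dict, List

def parse_numbered_questions(response: str) -> Dict[str, List[str]]:
    """Extract '1. ...' style lines from raw model output (illustrative)."""
    # Re-attach the primed "1." from the prompt, since the model's
    # completion starts mid-list.
    text = "1." + response
    questions = []
    for line in text.splitlines():
        # Match lines like "3. How satisfied are you with ...?"
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            questions.append(match.group(1))
    return {"questions": questions}
```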