jmisak committed
Commit d6f9f11 · verified · 1 Parent(s): 56fed0f

Upload 4 files

Files changed (4)
  1. CHANGELOG.md +18 -14
  2. README.md +17 -16
  3. llm_backend.py +5 -4
  4. survey_generator.py +5 -9
CHANGELOG.md CHANGED
@@ -10,9 +10,9 @@ All notable changes to ConversAI will be documented in this file.
  - **No API endpoint issues** - everything runs on your Space
  - **Faster after first load** - models cached in memory
  - **100% private** - all processing happens locally
- - Default model: **google/flan-t5-xl** (3GB, excellent quality)
- - Supports all Flan-T5 variants (base, large, xl, xxl)
- - **Important**: XL or larger required for quality results; smaller models produce poor output
+ - Default model: **microsoft/phi-2** (2.7GB, causal language model)
+ - Supports causal LMs: Phi-2, Gemma-2B-IT, TinyLlama, Mistral-7B
+ - **Important**: Use causal LMs only; Flan-T5 models copy examples instead of generating
 
  ### Added
  - **New dependencies**: transformers, torch, accelerate, sentencepiece
@@ -24,15 +24,18 @@ All notable changes to ConversAI will be documented in this file.
  - First request: ~1-2 minutes (download + load)
  - Subsequent requests: ~2-5 seconds
 
- - **Support for multiple Flan-T5 sizes**: Choose based on your needs
-   - flan-t5-base: 250MB (fast, good quality)
-   - flan-t5-large: 1.2GB (better quality)
-   - flan-t5-xl: 3GB (excellent quality)
-   - flan-t5-xxl: 11GB (best quality)
+ - **Support for multiple causal LM sizes**: Choose based on your needs
+   - TinyLlama-1.1B-Chat: 1.1GB (fast, fair quality)
+   - Gemma-2B-IT: 2GB (good quality)
+   - Phi-2: 2.7GB (excellent quality, default)
+   - Mistral-7B-Instruct: 7GB (best quality)
 
  ### Fixed
  - **No more 404 API errors** - eliminated all API endpoint issues
  - **No API token required** - works without any credentials on HF Spaces
+ - **Fixed generic/example-copying questions** - switched from Flan-T5 to Phi-2
+   - Flan-T5 models would generate "How would you rate X/Y/Z?" by copying examples
+   - Phi-2 (causal LM) generates actual contextual questions about the topic
  - Faster generation after initial model load
  - More reliable - no network dependencies
  - Better privacy - all processing local
@@ -44,12 +47,13 @@ All notable changes to ConversAI will be documented in this file.
  - Added model caching to keep models in memory
  - Auto-detects CUDA/CPU and optimizes accordingly
 
- - **Default model**: `google/flan-t5-xl` (line 84)
-   - Changed from API-based to local transformers
-   - 3GB model required for acceptable quality
-   - Testing showed base/large models produce generic, irrelevant questions
-   - XL provides good balance of quality and resource usage
-   - User can upgrade to xxl or downgrade to large/base via LLM_MODEL env var (not recommended)
+ - **Default model**: `microsoft/phi-2` (line 85)
+   - Changed from API-based to local causal language model
+   - **Critical fix**: Switched from Flan-T5 to Phi-2 (causal LM)
+   - **Problem identified**: Flan-T5 (encoder-decoder) copies examples instead of generating contextual questions
+   - **Solution**: Phi-2 (decoder-only) is designed for creative text generation
+   - 2.7GB model provides excellent contextual question generation
+   - User can try Gemma-2B-IT, TinyLlama, or Mistral-7B via LLM_MODEL env var
 
  - **Complete rewrite of survey generation** in `survey_generator.py`:
    - **Changed approach**: No longer asks model to generate JSON (T5 models struggle with structured output)
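The caching and CUDA auto-detection called out in this changelog reduce to a small amount of transformers code. A minimal sketch of the pattern, for reference (the `_MODEL_CACHE` dict and `load_model` helper are illustrative names, not the repo's actual identifiers):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical module-level cache: the first request pays the
# download/load cost, later requests reuse the in-memory model.
_MODEL_CACHE = {}

def load_model(model_name: str = "microsoft/phi-2"):
    """Load (or reuse) a causal LM, auto-detecting CUDA vs. CPU."""
    if model_name not in _MODEL_CACHE:
        device = "cuda" if torch.cuda.is_available() else "cpu"
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            # half precision saves GPU memory; full precision is the safe CPU default
            torch_dtype=torch.float16 if device == "cuda" else torch.float32,
        ).to(device)
        _MODEL_CACHE[model_name] = (tokenizer, model, device)
    return _MODEL_CACHE[model_name]
```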
README.md CHANGED
@@ -16,7 +16,7 @@ Battle the blank page, reach global audiences, and uncover insights with AI assi
 
 ---
 
-> **✨ UPDATED (Nov 2025):** Now uses **local transformers** with **Google Flan-T5** models - Fast, reliable, and **completely FREE**! No API dependencies, runs directly on HuggingFace Spaces.
+> **✨ UPDATED (Nov 2025):** Now uses **local transformers** with **Microsoft Phi-2** - Fast, contextual, and **completely FREE**! No API dependencies, runs directly on HuggingFace Spaces. Generates actual topic-specific questions (not generic templates).
 
 ---
 
@@ -57,13 +57,13 @@ Battle the blank page, reach global audiences, and uncover insights with AI assi
 
 **✨ Zero configuration needed!** ConversAI works out-of-the-box on HuggingFace Spaces using local model loading.
 
-**Default Model:** google/flan-t5-xl
+**Default Model:** microsoft/phi-2
 - ✅ **100% Free** - No API keys, no costs, ever
-- ✅ **High quality** - 3GB model, excellent at following complex instructions
+- ✅ **Excellent quality** - 2.7GB causal language model, great at creative text generation
 - ✅ **Good speed** - Typically 5-10 seconds per request after initial load
 - ✅ **No API dependencies** - Runs entirely on your Space's compute
 - ✅ **Private** - All processing happens locally, nothing sent to external APIs
-- ✅ **Reliable** - Google's instruction-tuned model, battle-tested
+- ✅ **Contextual** - Generates relevant, topic-specific questions (not generic)
 
 **Setup for HuggingFace Spaces:**
 - Just deploy - models download automatically on first run
@@ -78,20 +78,20 @@ You can try different free models by setting the `LLM_MODEL` environment variabl
 
 | Model | Best For | Speed | Quality | Model Size |
 |-------|----------|-------|---------|------------|
-| **google/flan-t5-base** | Quick testing only | ⚡⚡⚡ Very Fast | ⭐ Poor | 250MB |
-| **google/flan-t5-large** | Faster loading | ⚡⚡ Fast | ⭐⭐ Fair | 1.2GB |
-| **google/flan-t5-xl** (default) | **Recommended** - best balance | ⚡ Good | ⭐⭐⭐⭐ Excellent | 3GB |
-| **google/flan-t5-xxl** | Maximum quality | ⚡ Slower | ⭐⭐⭐⭐⭐ Best | 11GB |
+| **TinyLlama/TinyLlama-1.1B-Chat-v1.0** | Quick testing | ⚡⚡⚡ Very Fast | ⭐⭐ Fair | 1.1GB |
+| **google/gemma-2b-it** | Faster alternative | ⚡⚡ Fast | ⭐⭐⭐ Good | 2GB |
+| **microsoft/phi-2** (default) | **Recommended** - best balance | ⚡ Good | ⭐⭐⭐⭐ Excellent | 2.7GB |
+| **mistralai/Mistral-7B-Instruct-v0.2** | Maximum quality | ⚡ Slower | ⭐⭐⭐⭐⭐ Best | 7GB |
 
-**Note:** Flan-T5 models are Google's instruction-tuned models, specifically designed for following instructions. They run locally with transformers library.
+**Note:** These are causal language models (decoder-only) designed for text generation. **Do NOT use Flan-T5 models** - they copy examples instead of generating contextual questions.
 
 **To change model:**
 ```bash
 # In Space Settings → Variables
-LLM_MODEL=google/flan-t5-large # Better quality
+LLM_MODEL=google/gemma-2b-it # Faster alternative
 
 # Or for maximum quality (requires more memory)
-LLM_MODEL=google/flan-t5-xl
+LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
 ```
 
 **Why Local Transformers?**
@@ -103,12 +103,13 @@ LLM_MODEL=google/flan-t5-xl
 
 ### Tips for Best Performance with Local Models
 
-1. **Use flan-t5-xl (default)** - XL provides good quality, smaller models produce poor results
-2. **First load takes time** - Model downloads and loads (~3-5 minutes for XL)
+1. **Use Phi-2 (default)** - Best balance of quality and resource usage
+2. **First load takes time** - Model downloads and loads (~2-3 minutes for Phi-2)
 3. **Subsequent requests are fast** - Model stays in memory (5-10 seconds)
-4. **For maximum quality** - Use flan-t5-xxl (requires 16GB+ RAM)
-5. **Avoid smaller models** - Base and Large often produce generic or irrelevant questions
-6. **Be specific in outlines** - More detail helps model generate better questions
+4. **For maximum quality** - Use Mistral-7B-Instruct (requires 8GB+ RAM)
+5. **For faster loading** - Use Gemma-2B-IT or TinyLlama (good quality, smaller)
+6. **Avoid Flan-T5 models** - They copy examples instead of generating contextual questions
+7. **Be specific in outlines** - More detail helps model generate better questions
 
 ## 📦 Installation
 
llm_backend.py CHANGED
@@ -78,10 +78,11 @@ class LLMBackend:
         defaults = {
             LLMProvider.OPENAI: "gpt-4o-mini",
             LLMProvider.ANTHROPIC: "claude-3-5-sonnet-20241022",
-            # Using Flan-T5-XL - best balance for quality survey generation (3GB)
-            # For faster loading: google/flan-t5-large (1.2GB) - may have lower quality
-            # For maximum quality: google/flan-t5-xxl (11GB) - requires more memory
-            LLMProvider.HUGGINGFACE: "google/flan-t5-xl",
+            # Using Microsoft Phi-2 - causal LM, better at creative text generation
+            # Phi-2 is 2.7GB and instruction-tuned for following prompts
+            # Alternative: google/gemma-2b-it (2GB) or TinyLlama/TinyLlama-1.1B-Chat-v1.0 (1.1GB)
+            # NOTE: Flan-T5 models don't work well - they copy examples instead of generating
+            LLMProvider.HUGGINGFACE: "microsoft/phi-2",
             LLMProvider.LM_STUDIO: "google/gemma-3-27b"
         }
         return os.getenv("LLM_MODEL", defaults[self.provider])
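Because the method ends in `os.getenv("LLM_MODEL", defaults[self.provider])`, swapping models requires no code change. A hedged usage sketch (the `LLMBackend` constructor call is assumed from the class name in the hunk header and is not shown in this diff):

```python
import os

# On HF Spaces this would be set in Space Settings -> Variables;
# it is set in code here only for illustration, and must be set
# before the backend resolves its model name.
os.environ["LLM_MODEL"] = "google/gemma-2b-it"

# Hypothetical instantiation - the constructor signature is not in this diff:
# backend = LLMBackend(provider=LLMProvider.HUGGINGFACE)
# The model would now resolve to "google/gemma-2b-it" instead of "microsoft/phi-2".
```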
survey_generator.py CHANGED
@@ -83,20 +83,16 @@ class SurveyGenerator:
 
     def _build_generation_prompt(self, outline, survey_type, num_questions, target_audience) -> str:
         """Build the user prompt for survey generation"""
-        # For T5 models, be very specific and direct
-        return f"""Create {num_questions} professional survey questions.
+        # For causal LMs (Phi, Gemma, etc.) - more conversational
+        return f"""You are designing a {survey_type} survey for: {target_audience}
 
 Topic: {outline}
-Audience: {target_audience}
 
-Write {num_questions} questions numbered 1-{num_questions}. Each question must be specific to the topic above.
+Generate {num_questions} specific, relevant survey questions about this topic. Make each question clear and professional.
 
-Examples:
-1. What is your experience with X?
-2. How would you rate Y?
-3. What challenges do you face with Z?
+Write your questions in a numbered list format (1., 2., 3., etc.). Focus on the specific topic and audience mentioned above.
 
-Your {num_questions} questions:
+Questions:
 1."""
 
     def _parse_survey_response(self, response: str) -> Dict:
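The prompt ends with a primed `1.` so the model simply continues the numbered list, and `_parse_survey_response` then recovers questions from free text instead of JSON. A minimal sketch of that numbered-list parsing (assumed logic; the actual `_parse_survey_response` body is not shown in this diff, and `parse_numbered_questions` is an illustrative name):

```python
import re
from typing import Dict, List

def parse_numbered_questions(response: str) -> Dict[str, List[str]]:
    """Extract '1. ...' style lines from raw model output (illustrative)."""
    # Re-attach the primed "1." from the prompt, since the model's
    # completion starts mid-list.
    text = "1." + response
    questions = []
    for line in text.splitlines():
        # Match lines like "3. How satisfied are you with ...?"
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            questions.append(match.group(1))
    return {"questions": questions}
```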