jmisak/ProjectEcho

jmisak committed on Oct 25, 2025

Commit 4858e1f (verified) · 1 Parent(s): c6b3c86

Upload 5 files

Files changed (3)

CHANGELOG.md +7 -4
README.md +170 -170
survey_generator.py +92 -75

CHANGELOG.md CHANGED

@@ -49,10 +49,13 @@ All notable changes to ConversAI will be documented in this file.
   - Better at JSON generation than smaller models
   - User can upgrade to xl/xxl or downgrade to base via LLM_MODEL env var
-- **Improved prompts** in `survey_generator.py`:
-  - Simplified prompts for better T5 model compatibility
-  - Added fallback survey generation if JSON parsing fails
-  - More direct instructions with concrete examples
+- **Complete rewrite of survey generation** in `survey_generator.py`:
+  - **Changed approach**: No longer asks model to generate JSON (T5 models struggle with structured output)
+  - **New workflow**: Model generates simple numbered list → we parse into JSON
+  - **Intelligent type detection**: Automatically detects question types (rating, yes/no, Likert, open-ended) based on keywords
+  - **Better reliability**: Plays to T5 strengths (text generation) instead of weaknesses (JSON)
+  - **Automatic title generation**: Creates survey title from user's outline
+  - Result: Much more reliable survey generation with T5 models
 - **New dependencies added** to requirements.txt:
   - transformers>=4.36.0
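
The changelog entry above describes a two-step flow: the model writes a plain numbered list, and the application turns that list into a JSON-ready structure. A minimal sketch of that idea follows. It assumes a hypothetical helper `numbered_list_to_survey` and made-up sample output; it is not the commit's implementation, which lives in `survey_generator.py`.

```python
import json
import re

# Hypothetical sample of raw model output; real Flan-T5 output will vary.
raw_output = """1. What is your overall experience with the app?
2. How would you rate the onboarding process?
3. What improvements would you suggest?"""


def numbered_list_to_survey(text: str) -> dict:
    """Turn a numbered list of questions into a JSON-serializable dict."""
    questions = []
    for line in text.splitlines():
        # Strip one leading "1.", "2)" or bullet marker plus whitespace.
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            questions.append({
                "id": len(questions) + 1,
                "question_text": cleaned,
                "question_type": "open_ended",
                "required": True,
            })
    return {"questions": questions}


survey = numbered_list_to_survey(raw_output)
print(json.dumps(survey, indent=2))
```

Because the structure is assembled in Python rather than emitted by the model, the resulting JSON is always syntactically valid, whatever the model writes.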

README.md CHANGED

@@ -1,170 +1,170 @@
----
-title: ProjectEcho - Qualitative Research Assistant
-emoji: 🔬
-colorFrom: blue
-colorTo: purple
-sdk: gradio
-sdk_version: 5.45.0
-app_file: app.py
-pinned: false
-license: mit
----
-# ConversAI - AI-Powered Qualitative Research Assistant
-Battle the blank page, reach global audiences, and uncover insights with AI assistance.
----
-> **✨ UPDATED (Nov 2025):** Now uses **local transformers** with **Google Flan-T5** models - Fast, reliable, and **completely FREE**! No API dependencies, runs directly on HuggingFace Spaces.
----
-## 🌟 Features
-### 📝 Survey Generation
-- Generate professional surveys from simple outlines
-- Follow industry best practices automatically
-- Choose from qualitative, quantitative, or mixed methods
-- Customize number of questions and target audience
-### 🌍 Survey Translation
-- Translate surveys to 18+ languages
-- Maintain cultural appropriateness and meaning
-- Reach global audiences effortlessly
-- Batch translation support
-### 📊 Data Analysis
-- AI-assisted thematic analysis
-- Sentiment analysis and emotional insights
-- Automatic pattern and trend detection
-- Generate actionable insights and recommendations
-- Export detailed analysis reports
-## 🚀 Quick Start
-**On HuggingFace Spaces:** Works immediately with zero configuration! Uses the free HF Inference API.
-**Workflow:**
-1. **Generate a Survey**: Start with an outline or topic description
-2. **Translate**: Select target languages to reach global audiences
-3. **Collect Responses**: Use the generated survey with your participants
-4. **Analyze**: Upload responses to uncover key findings and trends
-## 🔧 Configuration
-### Default: Local Transformers (Completely FREE!)
-**✨ Zero configuration needed!** ConversAI works out-of-the-box on HuggingFace Spaces using local model loading.
-**Default Model:** google/flan-t5-large
-- ✅ **100% Free** - No API keys, no costs, ever
-- ✅ **Good quality** - 1.2GB model, excellent at following instructions
-- ✅ **Fast after loading** - Typically 3-8 seconds per request after initial load
-- ✅ **No API dependencies** - Runs entirely on your Space's compute
-- ✅ **Private** - All processing happens locally, nothing sent to external APIs
-- ✅ **Reliable** - Google's instruction-tuned model, battle-tested
-**Setup for HuggingFace Spaces:**
-- Just deploy - models download automatically on first run
-- **No API keys or tokens required!**
-- Models are cached after first download for faster subsequent loads
-### Alternative Free Models
-You can try different free models by setting the `LLM_MODEL` environment variable:
-**Recommended Free Models (Local Transformers):**
-| Model | Best For | Speed | Quality | Model Size |
-|-------|----------|-------|---------|------------|
-| **google/flan-t5-base** | Testing - fastest | ⚡⚡⚡ Very Fast | ⭐⭐ Basic | 250MB |
-| **google/flan-t5-large** (default) | **Recommended** - balanced | ⚡⚡ Fast | ⭐⭐⭐ Good | 1.2GB |
-| **google/flan-t5-xl** | Better quality | ⚡ Medium | ⭐⭐⭐⭐ Excellent | 3GB |
-| **google/flan-t5-xxl** | Maximum quality | ⚡ Slower | ⭐⭐⭐⭐⭐ Best | 11GB |
-**Note:** Flan-T5 models are Google's instruction-tuned models, specifically designed for following instructions. They run locally with transformers library.
-**To change model:**
-```bash
-# In Space Settings → Variables
-LLM_MODEL=google/flan-t5-large  # Better quality
-# Or for maximum quality (requires more memory)
-LLM_MODEL=google/flan-t5-xl
-```
-**Why Local Transformers?**
-- ✅ **No API dependencies** - runs entirely on your Space
-- ✅ **No 404 errors** - no network issues
-- ✅ **Fast after loading** - models cached in memory
-- ✅ **Instruction-tuned** - designed for following prompts
-- ✅ **Privacy** - all processing happens locally
-### Tips for Best Performance with Local Models
-1. **Default model (flan-t5-large) is recommended** - Good balance of quality and speed
-2. **First load takes time** - Model downloads and loads (~2-3 minutes for large)
-3. **Subsequent requests are fast** - Model stays in memory (3-8 seconds)
-4. **For simple testing** - Use flan-t5-base (faster loading)
-5. **For best quality** - Use flan-t5-xl or xxl (requires more memory)
-6. **Keep prompts clear** - Simpler outlines work better with smaller models
-## 📦 Installation
-```bash
-# Install dependencies
-pip install -r requirements.txt
-# Check environment setup (optional but recommended)
-python check_env.py
-# Run the app
-python app.py
-```
-## 🏗️ Architecture
-ConversAI is built with a modular architecture:
-- **llm_backend.py** - Unified LLM interface supporting multiple providers
-- **survey_generator.py** - AI-powered survey generation
-- **survey_translator.py** - Multi-language translation engine
-- **data_analyzer.py** - Qualitative data analysis and insights
-- **app.py** - Gradio-based web interface
-- **export_utils.py** - Export to JSON, CSV, Markdown
-## 📄 Data Privacy
-- All processing is done through your configured LLM provider
-- No data is stored permanently by this application
-- Survey data and responses remain in your control
-- Suitable for sensitive research projects
-## 🤝 Contributing
-Contributions are welcome! This is a production-grade application designed for real-world qualitative research.
-## 📝 License
-MIT License - Feel free to use for research and commercial purposes.
----
-## 📚 Documentation
-**New to ConversAI?** Start with **[USER_GUIDE.md](USER_GUIDE.md)** for a complete walkthrough.
-**Quick Links:**
-- 📖 [Complete User Guide](USER_GUIDE.md) - How to use ConversAI (START HERE)
-- ⚡ [Quick Start for HF Spaces](QUICK_START_HF_SPACES.md) - 5-minute deployment
-- 🔧 [Troubleshooting](TROUBLESHOOTING.md) - Common issues and solutions
-- 🆓 [Free Models Guide](FREE_MODELS.md) - Best free models to use
-**Diagnostic Tools:**
-- Run `python check_env.py` - Check your environment setup
-- Run `python test_hf_backend.py` - Test HuggingFace connection
----
-Built with ❤️ using Gradio and state-of-the-art open-source LLMs

+---
+title: ConversAI - Qualitative Research Assistant
+emoji: 🔬
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: 5.45.0
+app_file: app.py
+pinned: false
+license: mit
+---
+# ConversAI - AI-Powered Qualitative Research Assistant
+Battle the blank page, reach global audiences, and uncover insights with AI assistance.
+---
+> **✨ UPDATED (Nov 2025):** Now uses **local transformers** with **Google Flan-T5** models - Fast, reliable, and **completely FREE**! No API dependencies, runs directly on HuggingFace Spaces.
+---
+## 🌟 Features
+### 📝 Survey Generation
+- Generate professional surveys from simple outlines
+- Follow industry best practices automatically
+- Choose from qualitative, quantitative, or mixed methods
+- Customize number of questions and target audience
+### 🌍 Survey Translation
+- Translate surveys to 18+ languages
+- Maintain cultural appropriateness and meaning
+- Reach global audiences effortlessly
+- Batch translation support
+### 📊 Data Analysis
+- AI-assisted thematic analysis
+- Sentiment analysis and emotional insights
+- Automatic pattern and trend detection
+- Generate actionable insights and recommendations
+- Export detailed analysis reports
+## 🚀 Quick Start
+**On HuggingFace Spaces:** Works immediately with zero configuration! Uses the free HF Inference API.
+**Workflow:**
+1. **Generate a Survey**: Start with an outline or topic description
+2. **Translate**: Select target languages to reach global audiences
+3. **Collect Responses**: Use the generated survey with your participants
+4. **Analyze**: Upload responses to uncover key findings and trends
+## 🔧 Configuration
+### Default: Local Transformers (Completely FREE!)
+**✨ Zero configuration needed!** ConversAI works out-of-the-box on HuggingFace Spaces using local model loading.
+**Default Model:** google/flan-t5-large
+- ✅ **100% Free** - No API keys, no costs, ever
+- ✅ **Good quality** - 1.2GB model, excellent at following instructions
+- ✅ **Fast after loading** - Typically 3-8 seconds per request after initial load
+- ✅ **No API dependencies** - Runs entirely on your Space's compute
+- ✅ **Private** - All processing happens locally, nothing sent to external APIs
+- ✅ **Reliable** - Google's instruction-tuned model, battle-tested
+**Setup for HuggingFace Spaces:**
+- Just deploy - models download automatically on first run
+- **No API keys or tokens required!**
+- Models are cached after first download for faster subsequent loads
+### Alternative Free Models
+You can try different free models by setting the `LLM_MODEL` environment variable:
+**Recommended Free Models (Local Transformers):**
+| Model | Best For | Speed | Quality | Model Size |
+|-------|----------|-------|---------|------------|
+| **google/flan-t5-base** | Testing - fastest | ⚡⚡⚡ Very Fast | ⭐⭐ Basic | 250MB |
+| **google/flan-t5-large** (default) | **Recommended** - balanced | ⚡⚡ Fast | ⭐⭐⭐ Good | 1.2GB |
+| **google/flan-t5-xl** | Better quality | ⚡ Medium | ⭐⭐⭐⭐ Excellent | 3GB |
+| **google/flan-t5-xxl** | Maximum quality | ⚡ Slower | ⭐⭐⭐⭐⭐ Best | 11GB |
+**Note:** Flan-T5 models are Google's instruction-tuned models, specifically designed for following instructions. They run locally with transformers library.
+**To change model:**
+```bash
+# In Space Settings → Variables
+LLM_MODEL=google/flan-t5-large  # Better quality
+# Or for maximum quality (requires more memory)
+LLM_MODEL=google/flan-t5-xl
+```
+**Why Local Transformers?**
+- ✅ **No API dependencies** - runs entirely on your Space
+- ✅ **No 404 errors** - no network issues
+- ✅ **Fast after loading** - models cached in memory
+- ✅ **Instruction-tuned** - designed for following prompts
+- ✅ **Privacy** - all processing happens locally
+### Tips for Best Performance with Local Models
+1. **Default model (flan-t5-large) is recommended** - Good balance of quality and speed
+2. **First load takes time** - Model downloads and loads (~2-3 minutes for large)
+3. **Subsequent requests are fast** - Model stays in memory (3-8 seconds)
+4. **For simple testing** - Use flan-t5-base (faster loading)
+5. **For best quality** - Use flan-t5-xl or xxl (requires more memory)
+6. **Keep prompts clear** - Simpler outlines work better with smaller models
+## 📦 Installation
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# Check environment setup (optional but recommended)
+python check_env.py
+# Run the app
+python app.py
+```
+## 🏗️ Architecture
+ConversAI is built with a modular architecture:
+- **llm_backend.py** - Unified LLM interface supporting multiple providers
+- **survey_generator.py** - AI-powered survey generation
+- **survey_translator.py** - Multi-language translation engine
+- **data_analyzer.py** - Qualitative data analysis and insights
+- **app.py** - Gradio-based web interface
+- **export_utils.py** - Export to JSON, CSV, Markdown
+## 📄 Data Privacy
+- All processing is done through your configured LLM provider
+- No data is stored permanently by this application
+- Survey data and responses remain in your control
+- Suitable for sensitive research projects
+## 🤝 Contributing
+Contributions are welcome! This is a production-grade application designed for real-world qualitative research.
+## 📝 License
+MIT License - Feel free to use for research and commercial purposes.
+---
+## 📚 Documentation
+**New to ConversAI?** Start with **[USER_GUIDE.md](USER_GUIDE.md)** for a complete walkthrough.
+**Quick Links:**
+- 📖 [Complete User Guide](USER_GUIDE.md) - How to use ConversAI (START HERE)
+- ⚡ [Quick Start for HF Spaces](QUICK_START_HF_SPACES.md) - 5-minute deployment
+- 🔧 [Troubleshooting](TROUBLESHOOTING.md) - Common issues and solutions
+- 🆓 [Free Models Guide](FREE_MODELS.md) - Best free models to use
+**Diagnostic Tools:**
+- Run `python check_env.py` - Check your environment setup
+- Run `python test_hf_backend.py` - Test HuggingFace connection
+---
+Built with ❤️ using Gradio and state-of-the-art open-source LLMs
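
The README says the model is selected through the `LLM_MODEL` environment variable, with `google/flan-t5-large` as the default. The backend code is not part of this diff, so the following is only a sketch of how such a lookup could work. `resolve_model_name` is a hypothetical helper, not a function from `llm_backend.py`.

```python
import os

# Default model name as stated in the README.
DEFAULT_MODEL = "google/flan-t5-large"


def resolve_model_name(env=None) -> str:
    """Return the model named by LLM_MODEL, or the default if unset or blank."""
    env = os.environ if env is None else env
    name = env.get("LLM_MODEL", "").strip()
    return name or DEFAULT_MODEL


# Passing a plain dict keeps the example independent of the real environment.
print(resolve_model_name({}))
print(resolve_model_name({"LLM_MODEL": "google/flan-t5-xl"}))
```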

survey_generator.py CHANGED

@@ -43,6 +43,9 @@ class SurveyGenerator:
             response = self.llm.generate(messages, max_tokens=2000, temperature=0.7)
             survey_data = self._parse_survey_response(response)
+            # Generate better title based on outline
+            survey_data["title"] = self._generate_title(outline, survey_type)
             # Add metadata
             survey_data["metadata"] = {
                 "outline": outline,
@@ -56,103 +59,117 @@ class SurveyGenerator:
         except Exception as e:
             raise Exception(f"Survey generation failed: {str(e)}")
+    def _generate_title(self, outline: str, survey_type: str) -> str:
+        """Generate a survey title from the outline"""
+        # Extract key topic from outline (first sentence or first 50 chars)
+        first_sentence = outline.split('.')[0].strip()
+        if len(first_sentence) > 60:
+            first_sentence = first_sentence[:60] + "..."
+        # Capitalize first letter
+        topic = first_sentence[0].upper() + first_sentence[1:] if first_sentence else "Research"
+        # Create title based on survey type
+        if survey_type.lower() == "qualitative":
+            return f"{topic} - Qualitative Survey"
+        elif survey_type.lower() == "quantitative":
+            return f"{topic} - Quantitative Survey"
+        else:
+            return f"{topic} Survey"
     def _get_system_prompt(self) -> str:
         """System prompt for survey generation"""
-        return """You are a professional survey designer. Create surveys in valid JSON format only."""
+        return """You are a professional survey designer. Generate clear, professional survey questions."""
     def _build_generation_prompt(self, outline, survey_type, num_questions, target_audience) -> str:
         """Build the user prompt for survey generation"""
-        # For T5 models, we need a very simple, direct instruction
-        return f"""Task: Generate a JSON survey.
-Topic: {outline}
-Questions needed: {num_questions}
-Audience: {target_audience}
-Type: {survey_type}
-Required JSON format:
-{{"title": "Survey Title Here", "introduction": "Welcome message here", "questions": [{{"id": 1, "question_text": "Your first question?", "question_type": "open_ended", "required": true}}, {{"id": 2, "question_text": "Your second question?", "question_type": "open_ended", "required": true}}], "closing": "Thank you message here"}}
-Generate the complete survey JSON now:"""
+        # For T5 models, ask for simple numbered list instead of JSON
+        return f"""Generate {num_questions} survey questions about: {outline}
+Target audience: {target_audience}
+Survey type: {survey_type}
+Create {num_questions} clear, professional questions. Write each question on a new line starting with a number.
+Example format:
+1. What is your overall experience with [topic]?
+2. How would you rate [specific aspect]?
+3. What improvements would you suggest?
+Now generate {num_questions} questions:"""
     def _parse_survey_response(self, response: str) -> Dict:
         """Parse LLM response into survey structure"""
-        # Try to extract JSON from response
-        response = response.strip()
-        # Handle code blocks
-        if "```json" in response:
-            start = response.find("```json") + 7
-            end = response.find("```", start)
-            response = response[start:end].strip()
-        elif "```" in response:
-            start = response.find("```") + 3
-            end = response.find("```", start)
-            response = response[start:end].strip()
-        # Try to find JSON object in response
-        if "{" in response and "}" in response:
-            start = response.find("{")
-            end = response.rfind("}") + 1
-            response = response[start:end]
-        try:
-            survey_data = json.loads(response)
-            # Validate required fields
-            required_fields = ["title", "introduction", "questions", "closing"]
-            for field in required_fields:
-                if field not in survey_data:
-                    raise ValueError(f"Missing required field: {field}")
-            # Validate questions
-            if not isinstance(survey_data["questions"], list) or len(survey_data["questions"]) == 0:
-                raise ValueError("Survey must contain at least one question")
-            return survey_data
-        except (json.JSONDecodeError, ValueError) as e:
-            # Fallback: Try to create a simple survey from the response
-            print(f"Warning: JSON parsing failed, attempting fallback. Error: {e}")
-            return self._create_fallback_survey(response)
-    def _create_fallback_survey(self, response: str) -> Dict:
-        """Create a basic survey structure from non-JSON response"""
-        # Extract potential questions from numbered list
+        # Parse numbered list format (not JSON)
+        return self._parse_numbered_list(response)
+    def _parse_numbered_list(self, response: str) -> Dict:
+        """Parse numbered list of questions into survey structure"""
         lines = [line.strip() for line in response.split('\n') if line.strip()]
-        # Look for numbered items or lines with question marks
         questions = []
         question_id = 1
         for line in lines:
-            # Remove leading numbers, bullets, etc.
-            clean_line = line.lstrip('0123456789.-) ')
-            # Check if it looks like a question
-            if len(clean_line) > 10 and (clean_line.endswith('?') or
-                                          any(word in clean_line.lower() for word in ['what', 'how', 'why', 'when', 'where', 'which', 'would', 'could', 'should', 'do you'])):
-                questions.append({
-                    "id": question_id,
-                    "question_text": clean_line,
-                    "question_type": "open_ended",
-                    "required": True
-                })
-                question_id += 1
-        # If we didn't find enough questions, create generic ones
-        if len(questions) < 3:
+            # Skip empty lines or lines that are too short
+            if len(line) < 5:
+                continue
+            # Remove leading numbers, bullets, dashes, etc.
+            clean_line = line.lstrip('0123456789.-) \t')
+            # Skip lines that don't look like questions
+            if len(clean_line) < 10:
+                continue
+            # Determine question type based on content
+            question_type = "open_ended"
+            options = None
+            lower_line = clean_line.lower()
+            # Check for rating/scale questions
+            if any(word in lower_line for word in ['rate', 'scale', 'rating', 'score']):
+                question_type = "rating"
+                options = ["1 - Poor", "2 - Fair", "3 - Good", "4 - Very Good", "5 - Excellent"]
+            # Check for yes/no questions
+            elif clean_line.endswith('?') and any(word in lower_line for word in ['do you', 'have you', 'would you', 'can you', 'should', 'is it', 'are you']):
+                if 'how much' not in lower_line and 'how many' not in lower_line:
+                    question_type = "yes_no"
+                    options = ["Yes", "No"]
+            # Check for satisfaction questions
+            elif any(word in lower_line for word in ['satisfy', 'satisfaction', 'satisfied']):
+                question_type = "likert_scale"
+                options = ["Very Satisfied", "Satisfied", "Neutral", "Dissatisfied", "Very Dissatisfied"]
+            question = {
+                "id": question_id,
+                "question_text": clean_line,
+                "question_type": question_type,
+                "required": True
+            }
+            if options:
+                question["options"] = options
+            questions.append(question)
+            question_id += 1
+        # If we didn't find any questions, create generic ones
+        if len(questions) == 0:
             questions = [
-                {"id": 1, "question_text": "What are your thoughts on this topic?", "question_type": "open_ended", "required": True},
-                {"id": 2, "question_text": "Can you describe your experience?", "question_type": "open_ended", "required": True},
-                {"id": 3, "question_text": "What suggestions do you have for improvement?", "question_type": "open_ended", "required": True}
+                {"id": 1, "question_text": "What are your overall thoughts on this topic?", "question_type": "open_ended", "required": True},
+                {"id": 2, "question_text": "Can you describe your experience in detail?", "question_type": "open_ended", "required": True},
+                {"id": 3, "question_text": "What specific suggestions do you have for improvement?", "question_type": "open_ended", "required": True}
             ]
         return {
-            "title": "Survey",
-            "introduction": "Thank you for participating in this survey. Please answer the following questions.",
-            "questions": questions[:10],  # Limit to 10 questions
-            "closing": "Thank you for your time and feedback!"
+            "title": "Research Survey",
+            "introduction": "Thank you for taking the time to participate in this survey. Your responses will help us better understand your experiences and perspectives. Please answer all questions honestly and thoroughly.",
+            "questions": questions[:20],  # Limit to 20 questions
+            "closing": "Thank you for your valuable time and feedback! Your responses are greatly appreciated and will be used to improve our understanding of this topic."
         }
     def refine_question(self, question: str, improvement_type: str = "clarity") -> str:
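
Two edge cases in `_parse_numbered_list` may be worth a follow-up. The keyword checks use substring matching, so `'rate'` also matches words such as "strategy", and `lstrip('0123456789.-) \t')` removes every leading digit, including digits that belong to the question text. The sketch below reproduces both behaviors and shows one possible alternative based on regular expressions. The helper names are hypothetical and this is not a drop-in patch for the commit.

```python
import re


def substring_is_rating(question: str) -> bool:
    """Substring test, mirroring the rating check added in the diff."""
    lower = question.lower()
    return any(word in lower for word in ['rate', 'scale', 'rating', 'score'])


# Hypothetical alternative: match the keywords as whole words only.
RATING_WORDS = re.compile(r"\b(?:rate|rating|scale|score)\b")


def word_is_rating(question: str) -> bool:
    return RATING_WORDS.search(question.lower()) is not None


# Hypothetical alternative to lstrip: remove a single leading list marker.
LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")


def strip_marker(line: str) -> str:
    return LIST_MARKER.sub("", line, count=1)


q = "What is your strategy for onboarding new users?"
print(substring_is_rating(q))  # True, because "strategy" contains "rate"
print(word_is_rating(q))       # False

line = "3. 24-hour support: how useful is it?"
print(line.lstrip('0123456789.-) \t'))  # hour support: how useful is it?
print(strip_marker(line))               # 24-hour support: how useful is it?
```

Whole-word matching trades recall for precision: it no longer catches inflected forms such as "rated", so the keyword list would need to grow if those matter.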