Upload 4 files

Files changed:
- CHANGELOG.md +18 -14
- README.md +17 -16
- llm_backend.py +5 -4
- survey_generator.py +5 -9
CHANGELOG.md

```diff
@@ -10,9 +10,9 @@ All notable changes to ConversAI will be documented in this file.
 - **No API endpoint issues** - everything runs on your Space
 - **Faster after first load** - models cached in memory
 - **100% private** - all processing happens locally
-- Default model: **
-- Supports
-- **Important**:
+- Default model: **microsoft/phi-2** (2.7GB, causal language model)
+- Supports causal LMs: Phi-2, Gemma-2B-IT, TinyLlama, Mistral-7B
+- **Important**: Use causal LMs only; Flan-T5 models copy examples instead of generating
 
 ### Added
 - **New dependencies**: transformers, torch, accelerate, sentencepiece
@@ -24,15 +24,18 @@ All notable changes to ConversAI will be documented in this file.
 - First request: ~1-2 minutes (download + load)
 - Subsequent requests: ~2-5 seconds
 
-- **Support for multiple
-  -
-  -
-  -
-  -
+- **Support for multiple causal LM sizes**: Choose based on your needs
+  - TinyLlama-1.1B-Chat: 1.1GB (fast, fair quality)
+  - Gemma-2B-IT: 2GB (good quality)
+  - Phi-2: 2.7GB (excellent quality, default)
+  - Mistral-7B-Instruct: 7GB (best quality)
 
 ### Fixed
 - **No more 404 API errors** - eliminated all API endpoint issues
 - **No API token required** - works without any credentials on HF Spaces
+- **Fixed generic/example-copying questions** - switched from Flan-T5 to Phi-2
+  - Flan-T5 models would generate "How would you rate X/Y/Z?" by copying examples
+  - Phi-2 (causal LM) generates actual contextual questions about the topic
 - Faster generation after initial model load
 - More reliable - no network dependencies
 - Better privacy - all processing local
@@ -44,12 +44,13 @@ All notable changes to ConversAI will be documented in this file.
 - Added model caching to keep models in memory
 - Auto-detects CUDA/CPU and optimizes accordingly
 
-- **Default model**: `
-- Changed from API-based to local
-  -
-  -
-  -
-  -
+- **Default model**: `microsoft/phi-2` (line 85)
+- Changed from API-based to local causal language model
+- **Critical fix**: Switched from Flan-T5 to Phi-2 (causal LM)
+  - **Problem identified**: Flan-T5 (encoder-decoder) copies examples instead of generating contextual questions
+  - **Solution**: Phi-2 (decoder-only) is designed for creative text generation
+  - 2.7GB model provides excellent contextual question generation
+  - User can try Gemma-2B-IT, TinyLlama, or Mistral-7B via LLM_MODEL env var
 
 - **Complete rewrite of survey generation** in `survey_generator.py`:
   - **Changed approach**: No longer asks model to generate JSON (T5 models struggle with structured output)
```
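
The changelog's "model caching" and "auto-detects CUDA/CPU" entries describe a standard transformers loading pattern. Below is a minimal sketch of that pattern, not the repo's actual code; the names `_MODEL_CACHE` and `load_causal_lm` are illustrative only.

```python
import os
from typing import Optional

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

_MODEL_CACHE = {}  # hypothetical module-level cache; keeps loaded models in memory


def load_causal_lm(model_id: Optional[str] = None):
    """Load (or reuse) a causal LM such as microsoft/phi-2."""
    model_id = model_id or os.getenv("LLM_MODEL", "microsoft/phi-2")
    if model_id in _MODEL_CACHE:
        return _MODEL_CACHE[model_id]  # subsequent requests skip the slow first load

    # Auto-detect CUDA vs. CPU, as the changelog describes
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        # half precision saves GPU memory; CPU inference needs full precision
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    ).to(device)

    _MODEL_CACHE[model_id] = (tokenizer, model, device)
    return _MODEL_CACHE[model_id]
```

This is why the changelog quotes ~1-2 minutes for the first request but only seconds afterwards: the download and `from_pretrained` cost is paid once per process.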
README.md

````diff
@@ -16,7 +16,7 @@ Battle the blank page, reach global audiences, and uncover insights with AI assi
 
 ---
 
-> **✨ UPDATED (Nov 2025):** Now uses **local transformers** with **
+> **✨ UPDATED (Nov 2025):** Now uses **local transformers** with **Microsoft Phi-2** - Fast, contextual, and **completely FREE**! No API dependencies, runs directly on HuggingFace Spaces. Generates actual topic-specific questions (not generic templates).
 
 ---
 
@@ -57,13 +57,13 @@ Battle the blank page, reach global audiences, and uncover insights with AI assi
 
 **✨ Zero configuration needed!** ConversAI works out-of-the-box on HuggingFace Spaces using local model loading.
 
-**Default Model:**
+**Default Model:** microsoft/phi-2
 - ✅ **100% Free** - No API keys, no costs, ever
-- ✅ **
+- ✅ **Excellent quality** - 2.7GB causal language model, great at creative text generation
 - ✅ **Good speed** - Typically 5-10 seconds per request after initial load
 - ✅ **No API dependencies** - Runs entirely on your Space's compute
 - ✅ **Private** - All processing happens locally, nothing sent to external APIs
-- ✅ **
+- ✅ **Contextual** - Generates relevant, topic-specific questions (not generic)
 
 **Setup for HuggingFace Spaces:**
 - Just deploy - models download automatically on first run
@@ -78,20 +78,20 @@ You can try different free models by setting the `LLM_MODEL` environment variabl
 
 | Model | Best For | Speed | Quality | Model Size |
 |-------|----------|-------|---------|------------|
-| **
-| **google/
-| **
-| **
+| **TinyLlama/TinyLlama-1.1B-Chat-v1.0** | Quick testing | ⚡⚡⚡ Very Fast | ⭐⭐ Fair | 1.1GB |
+| **google/gemma-2b-it** | Faster alternative | ⚡⚡ Fast | ⭐⭐⭐ Good | 2GB |
+| **microsoft/phi-2** (default) | **Recommended** - best balance | ⚡ Good | ⭐⭐⭐⭐ Excellent | 2.7GB |
+| **mistralai/Mistral-7B-Instruct-v0.2** | Maximum quality | ⚡ Slower | ⭐⭐⭐⭐⭐ Best | 7GB |
 
-**Note:**
+**Note:** These are causal language models (decoder-only) designed for text generation. **Do NOT use Flan-T5 models** - they copy examples instead of generating contextual questions.
 
 **To change model:**
 ```bash
 # In Space Settings → Variables
-LLM_MODEL=google/
+LLM_MODEL=google/gemma-2b-it  # Faster alternative
 
 # Or for maximum quality (requires more memory)
-LLM_MODEL=
+LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
 ```
 
 **Why Local Transformers?**
@@ -103,12 +103,13 @@ LLM_MODEL=google/flan-t5-xl
 
 ### Tips for Best Performance with Local Models
 
-1. **Use
-2. **First load takes time** - Model downloads and loads (~3
+1. **Use Phi-2 (default)** - Best balance of quality and resource usage
+2. **First load takes time** - Model downloads and loads (~2-3 minutes for Phi-2)
 3. **Subsequent requests are fast** - Model stays in memory (5-10 seconds)
-4. **For maximum quality** - Use
-5. **
-6. **
+4. **For maximum quality** - Use Mistral-7B-Instruct (requires 8GB+ RAM)
+5. **For faster loading** - Use Gemma-2B-IT or TinyLlama (good quality, smaller)
+6. **Avoid Flan-T5 models** - They copy examples instead of generating contextual questions
+7. **Be specific in outlines** - More detail helps model generate better questions
 
 ## 📦 Installation
 
````
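
The README's claim that Phi-2 continues a numbered-list prompt with topic-specific questions can be sanity-checked outside the app. A minimal sketch, assuming only the transformers library; the prompt text here is a hypothetical example in the same shape as the app's prompt, not its exact wording.

```python
from transformers import pipeline

# Downloads ~2.7GB on first run, then loads from the local HF cache
generator = pipeline("text-generation", model="microsoft/phi-2")

# Hypothetical prompt mirroring the structure used by survey_generator.py
prompt = (
    "You are designing a customer feedback survey for: coffee shop regulars\n"
    "\n"
    "Topic: mobile ordering experience\n"
    "\n"
    "Generate 3 specific, relevant survey questions about this topic.\n"
    "\n"
    "Questions:\n"
    "1."
)

out = generator(prompt, max_new_tokens=120, do_sample=True, temperature=0.7)
print(out[0]["generated_text"][len(prompt):])  # print only the newly generated text
```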
llm_backend.py

```diff
@@ -78,10 +78,11 @@ class LLMBackend:
         defaults = {
             LLMProvider.OPENAI: "gpt-4o-mini",
             LLMProvider.ANTHROPIC: "claude-3-5-sonnet-20241022",
-            # Using
-            #
-            #
-
+            # Using Microsoft Phi-2 - causal LM, better at creative text generation
+            # Phi-2 is 2.7GB and instruction-tuned for following prompts
+            # Alternative: google/gemma-2b-it (2GB) or TinyLlama/TinyLlama-1.1B-Chat-v1.0 (1.1GB)
+            # NOTE: Flan-T5 models don't work well - they copy examples instead of generating
+            LLMProvider.HUGGINGFACE: "microsoft/phi-2",
             LLMProvider.LM_STUDIO: "google/gemma-3-27b"
         }
         return os.getenv("LLM_MODEL", defaults[self.provider])
```
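
The `os.getenv` fallback on the last line is what makes the README's `LLM_MODEL` override work: a variable set in Space Settings → Variables always wins over the hard-coded default. A small standalone illustration (the plain-string dict is a simplified stand-in for the enum-keyed one above):

```python
import os

# Simplified stand-in for the enum-keyed defaults dict in llm_backend.py
defaults = {"huggingface": "microsoft/phi-2"}

print(os.getenv("LLM_MODEL", defaults["huggingface"]))  # -> microsoft/phi-2

# Setting the variable (e.g. in Space Settings → Variables) overrides the default
os.environ["LLM_MODEL"] = "google/gemma-2b-it"
print(os.getenv("LLM_MODEL", defaults["huggingface"]))  # -> google/gemma-2b-it
```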
survey_generator.py

```diff
@@ -83,20 +83,16 @@ class SurveyGenerator:
 
     def _build_generation_prompt(self, outline, survey_type, num_questions, target_audience) -> str:
         """Build the user prompt for survey generation"""
-        # For
-        return f"""
+        # For causal LMs (Phi, Gemma, etc.) - more conversational
+        return f"""You are designing a {survey_type} survey for: {target_audience}
 
 Topic: {outline}
-Audience: {target_audience}
 
-
+Generate {num_questions} specific, relevant survey questions about this topic. Make each question clear and professional.
 
-
-1. What is your experience with X?
-2. How would you rate Y?
-3. What challenges do you face with Z?
+Write your questions in a numbered list format (1., 2., 3., etc.). Focus on the specific topic and audience mentioned above.
 
-
+Questions:
 1."""
 
     def _parse_survey_response(self, response: str) -> Dict:
```
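
Since the new prompt ends with `1.` and asks for a plain numbered list (the JSON approach was dropped, per the changelog), `_parse_survey_response` presumably recovers questions from lines like `2. ...` in the completion. The diff does not show the repo's parser, so the sketch below is one plausible assumption; `parse_numbered_questions` is a hypothetical name.

```python
import re
from typing import Dict, List


def parse_numbered_questions(response: str) -> Dict[str, List[str]]:
    """Extract '1. ...'-style questions from raw model output."""
    # The prompt already ends with "1.", so the model's reply starts
    # mid-question; prepend the "1." before matching.
    text = "1." + response
    questions = []
    for line in text.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)  # accepts "1." or "1)"
        if match:
            questions.append(match.group(1).strip())
    return {"questions": questions}


# Example:
#   parse_numbered_questions(" What is your experience with mobile ordering?\n"
#                            "2. How often do you order ahead?")
#   -> {'questions': ['What is your experience with mobile ordering?',
#                     'How often do you order ahead?']}
```

Parsing a numbered list is far more forgiving than parsing JSON from a small local model, which is exactly the trade-off the changelog's "Changed approach" entry describes.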