jmisak committed · verified
Commit 56589d3 · Parent(s): 57fa449

Upload 13 files
.gitignore ADDED
@@ -0,0 +1,55 @@
+ # HuggingFace Spaces Deployment - DO NOT UPLOAD THESE
+
+ # Environment and secrets
+ .env
+ *.env
+
+ # Logs
+ *.log
+ logs/
+ session_*.log
+ summary_*.txt
+ summary_*.json
+
+ # Outputs
+ outputs/
+ *.csv
+ *.pdf
+ spaces_deployment/
+
+ # Python
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+ .Python
+ *.so
+ *.egg
+ *.egg-info/
+ dist/
+ build/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Test files
+ test_*.py
+ debug_*.py
+ check_*.py
+ verify_*.py
+ fix_*.py
+ patch_*.py
+ create_sample_*.py
+ update_*.py
+
+ # Documentation (optional - you can upload if you want)
+ *.md
+ !README.md
+
+ # OS
+ .DS_Store
+ Thumbs.db
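Since the Spaces web uploader has no ignore mechanism of its own, a quick local sanity check against patterns like these can be done with Python's `fnmatch`. This is only a sketch: `fnmatch` covers simple glob patterns, not full gitignore semantics (no `!README.md` negation, no directory-aware matching), and the pattern subset and file names below are illustrative.

```python
from fnmatch import fnmatch

# Illustrative subset of the ignore patterns above (not full gitignore semantics)
IGNORE_PATTERNS = [".env", "*.env", "*.log", "test_*.py", "__pycache__/*"]

def is_ignored(path: str) -> bool:
    """Return True if `path` matches any ignore pattern."""
    return any(fnmatch(path, pat) for pat in IGNORE_PATTERNS)

files = ["app.py", "test_llm.py", ".env", "session_1.log", "llm.py"]
to_upload = [f for f in files if not is_ignored(f)]
print(to_upload)  # the .env, test_*.py, and *.log entries are filtered out
```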
DEPLOY_TO_SPACES.md ADDED
@@ -0,0 +1,184 @@
+ # Deploy to HuggingFace Spaces - Quick Start
+
+ ## ✅ Issue Fixed
+ **The `quote_extractor` import error has been fixed!** The app will now work even if the file is missing.
+
+ ---
+
+ ## 🚀 Option 1: Automated Preparation (Recommended)
+
+ Run this script to prepare a clean deployment package:
+
+ ```bash
+ python prepare_for_spaces.py
+ ```
+
+ This will:
+ - Create a `spaces_deployment/` directory
+ - Copy only the required files
+ - Remove any .env or test files
+ - Show you a summary of what's included
+
+ Then upload everything from `spaces_deployment/` to your Space.
+
+ ---
+
+ ## 📋 Option 2: Manual Upload
+
+ Upload these files to your HuggingFace Space:
+
+ ### Required Files (Must have)
+ ```
+ app.py
+ llm.py
+ extractors.py
+ tagging.py
+ chunking.py
+ validation.py
+ reporting.py
+ dashboard.py
+ production_logger.py
+ quote_extractor.py
+ requirements.txt
+ ```
+
+ ### Optional Files
+ ```
+ README.md
+ HUGGINGFACE_SPACES_SETUP.md
+ ```
+
+ **DO NOT upload:**
+ - `.env` file
+ - `test_*.py` files
+ - `logs/` directory
+ - `outputs/` directory
+
+ ---
+
+ ## 🔧 Space Configuration
+
+ ### 1. Create Space
+ - Go to https://huggingface.co/new-space
+ - Name: `transcriptor-ai` (or your choice)
+ - SDK: **Gradio**
+ - Hardware: **GPU (T4 or better)** ← Important!
+
+ ### 2. Upload Files
+ - Drag and drop all files from the list above
+ - OR connect a Git repository
+
+ ### 3. Configure (Optional)
+ Go to **Settings → Variables** and add:
+
+ | Variable | Value | When to Use |
+ |----------|-------|-------------|
+ | `DEBUG_MODE` | `True` | To see detailed logs |
+ | `LOCAL_MODEL` | `TinyLlama/TinyLlama-1.1B-Chat-v1.0` | For faster (but lower-quality) processing |
+ | `LLM_TEMPERATURE` | `0.5` | For more deterministic outputs |
+
+ **Note:** All settings have defaults - you don't need to configure anything!
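On the app side, Spaces Variables arrive as ordinary environment variables, so the "defaults unless overridden" behavior comes down to a fallback pattern like this. A sketch only: the variable names match the table above, but the exact defaults live in the app's own modules and may differ.

```python
import os

# Hypothetical config loader mirroring the Variables table above;
# the actual default values live in the app's own modules.
DEFAULTS = {
    "DEBUG_MODE": "False",
    "LOCAL_MODEL": "microsoft/Phi-3-mini-4k-instruct",
    "LLM_TEMPERATURE": "0.7",
}

def load_setting(name: str) -> str:
    """Read a Spaces Variable, falling back to the built-in default."""
    return os.getenv(name, DEFAULTS[name])

debug = load_setting("DEBUG_MODE").lower() == "true"
temperature = float(load_setting("LLM_TEMPERATURE"))
```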
+
+ ---
+
+ ## ⏱️ First Deployment
+
+ ### What to Expect
+ 1. **Build time:** 2-5 minutes (installing dependencies)
+ 2. **Model download:** 2-5 minutes (first time only - downloads Phi-3-mini)
+ 3. **Subsequent starts:** 30-60 seconds
+
+ ### Watch the Logs
+ Click the **Logs** tab to see:
+ ```
+ ✅ Configuration loaded for HuggingFace Spaces
+ 🚀 TranscriptorAI Enterprise - LLM Backend: local
+ [Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
+ Downloading (…)lve/main/config.json: 100%
+ [Local Model] ✅ Model loaded on cuda:0
+ Running on local URL: http://0.0.0.0:7860
+ ```
+
+ ---
+
+ ## 🧪 Test Your Space
+
+ 1. Wait for the "Running on local URL" message
+ 2. Upload a sample transcript (DOCX or PDF)
+ 3. Select "HCP" as the interviewee type
+ 4. Click "Analyze Transcripts"
+
+ **Expected:**
+ - Processing time: 5-10 minutes (depending on transcript length)
+ - Quality score: 0.7-1.0
+ - CSV and PDF downloads available
+
+ ---
+
+ ## 🐛 Troubleshooting
+
+ ### Error: `ModuleNotFoundError: No module named 'quote_extractor'`
+ **Status:** ✅ FIXED - this module is now optional
+
+ ### Error: `ModuleNotFoundError: No module named 'xyz'`
+ **Solution:** Upload the missing `xyz.py` file
+
+ ### Error: `CUDA out of memory`
+ **Solution:**
+ - Change the model: add the Variable `LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0`
+ - OR upgrade to a larger GPU
+
+ ### Very slow processing
+ **Check:**
+ - Is GPU hardware selected? (Not CPU)
+ - Look for "Model loaded on cuda:0" in the logs
+ - If you see "cpu", upgrade to a GPU tier
+
+ ### Quality score still 0.00
+ **Debug:**
+ 1. Set `DEBUG_MODE=True` in Variables
+ 2. Check the logs for "[Local Model] ✅ Generated X characters"
+ 3. Look for "[LLM Debug] Successfully extracted JSON"
+ 4. If you see `[Error]` messages, share them
+
+ ---
+
+ ## 💡 Tips
+
+ ### Reduce Costs
+ - The Space sleeps after 48h of inactivity (free)
+ - You only pay for GPU time while the Space is active
+ - ~$0.60/hour for a T4 GPU
+
+ ### Improve Speed
+ - Use a smaller model (TinyLlama)
+ - Reduce max tokens (edit llm.py line 410)
+ - Process fewer chunks
+
+ ### Improve Quality
+ - Use a larger model (Mistral-7B)
+ - Increase the temperature for more creative outputs
+ - Keep the default Phi-3-mini for the best balance
+
+ ---
+
+ ## 📞 Need Help?
+
+ 1. **Check the logs first** - most issues show clear error messages
+ 2. **Read HUGGINGFACE_SPACES_SETUP.md** - detailed troubleshooting
+ 3. **Test locally first** - run `python test_local_model.py`
+
+ ---
+
+ ## ✨ You're Ready!
+
+ Run the preparation script:
+ ```bash
+ python prepare_for_spaces.py
+ ```
+
+ Then upload to HuggingFace Spaces and you're done! 🎉
+
+ ---
+
+ **Last Updated:** October 2025
FILES_TO_UPLOAD.txt ADDED
@@ -0,0 +1,83 @@
+ ===============================================================================
+ FILES TO UPLOAD TO HUGGINGFACE SPACES
+ ===============================================================================
+
+ ✅ COPY THESE FILES TO YOUR SPACE (11 files total):
+
+ 1. app.py - Main application (REQUIRED - HF Spaces entry point)
+ 2. llm.py - LLM inference with local models
+ 3. extractors.py - Document text extraction (DOCX/PDF)
+ 4. tagging.py - Speaker tagging
+ 5. chunking.py - Text chunking
+ 6. validation.py - Quality validation
+ 7. reporting.py - CSV/PDF report generation
+ 8. dashboard.py - Dashboard generation
+ 9. production_logger.py - Session logging
+ 10. quote_extractor.py - Quote extraction (optional but recommended)
+ 11. requirements.txt - Python dependencies
+
+ ===============================================================================
+ OPTIONAL - NICE TO HAVE:
+ ===============================================================================
+
+ - README.md - Documentation for your Space
+
+ ===============================================================================
+ DO NOT UPLOAD:
+ ===============================================================================
+
+ ❌ .env - Contains secrets (use Spaces Variables instead)
+ ❌ test_*.py - Test files
+ ❌ *.log - Log files
+ ❌ logs/ - Log directory
+ ❌ outputs/ - Output directory
+ ❌ __pycache__/ - Python cache
+
+ ===============================================================================
+ HUGGINGFACE SPACES SETTINGS:
+ ===============================================================================
+
+ Space SDK: Gradio
+ Hardware: GPU (T4 or better) ⚠️ IMPORTANT - CPU will be very slow!
+
+ Optional Variables (Settings → Variables):
+ - DEBUG_MODE = True (to see detailed logs)
+ - LOCAL_MODEL = microsoft/Phi-3-mini-4k-instruct (default, no need to set)
+
+ ===============================================================================
+ DEPLOYMENT METHOD:
+ ===============================================================================
+
+ Option 1: Direct Upload
+ - Go to your Space → Files → Upload files
+ - Drag and drop the 11 files above
+
+ Option 2: Git Repository
+ - Create a Git repo with these files
+ - Add .gitignore (already created)
+ - Connect the repo to your Space
+ - Auto-deploys on push
+
+ ===============================================================================
+ FIRST TIME STARTUP:
+ ===============================================================================
+
+ 1. Dependencies install: ~2-5 minutes
+ 2. Model download: ~2-5 minutes (Phi-3-mini downloads automatically)
+ 3. Total first startup: ~5-10 minutes
+
+ Subsequent starts: ~30-60 seconds (model is cached)
+
+ ===============================================================================
+ VERIFICATION:
+ ===============================================================================
+
+ Check the Logs tab - you should see:
+
+ ✅ Configuration loaded for HuggingFace Spaces
+ 🚀 TranscriptorAI Enterprise - LLM Backend: local
+ [Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
+ [Local Model] ✅ Model loaded on cuda:0
+ Running on local URL: http://0.0.0.0:7860
+
+ ===============================================================================
REQUIRED_FILES_FOR_SPACES.md ADDED
@@ -0,0 +1,181 @@
+ # Required Files for HuggingFace Spaces Deployment
+
+ ## ✅ CRITICAL - Must Upload These Files
+
+ ### Main Application
+ - `app.py` - Main Gradio application
+
+ ### Core Processing Modules
+ - `llm.py` - LLM inference (local model support)
+ - `extractors.py` - DOCX/PDF text extraction
+ - `tagging.py` - Speaker identification
+ - `chunking.py` - Semantic text chunking
+ - `validation.py` - Quality scoring and validation
+ - `reporting.py` - CSV/PDF report generation
+ - `dashboard.py` - Dashboard generation
+ - `production_logger.py` - Session logging
+
+ ### Optional but Recommended
+ - `quote_extractor.py` - Market research quote extraction (now optional)
+
+ ### Configuration
+ - `requirements.txt` - Python dependencies
+ - `README.md` - Documentation (optional but good practice)
+
+ ---
+
+ ## ❌ DO NOT Upload These Files
+
+ ### Local Development Only
+ - `.env` - Contains local secrets (use Spaces Variables instead)
+ - `*.log` - Log files
+ - `logs/` - Log directory
+ - `outputs/` - Output directory
+ - `__pycache__/` - Python cache
+ - `.git/` - Git repository
+
+ ### Test Files (Not Needed)
+ - `test_*.py` - All test scripts
+ - `check_*.py` - Check scripts
+ - `debug_*.py` - Debug scripts
+ - `verify_*.py` - Verification scripts
+ - `fix_*.py` - Fix scripts
+ - `patch_*.py` - Patch scripts
+ - `create_sample_*.py` - Sample creation scripts
+
+ ### Documentation (Optional)
+ - `*.md` files - Helpful but not required for the app to run
+ - You can upload them if you want documentation in your Space
+
+ ---
+
+ ## 📦 Minimal File List (Absolute Minimum)
+
+ If you want the smallest deployment, upload only these:
+
+ ```
+ app.py
+ llm.py
+ extractors.py
+ tagging.py
+ validation.py
+ reporting.py
+ dashboard.py
+ production_logger.py
+ requirements.txt
+ tagging.py
+ chunking.py
+ ```
+
+ **Quote extraction will be disabled**, but everything else will work.
+
+ ---
+
+ ## 📋 Complete File List (Recommended)
+
+ Upload all core files plus quote extraction:
+
+ ```
+ app.py
+ llm.py
+ extractors.py
+ tagging.py
+ chunking.py
+ validation.py
+ reporting.py
+ dashboard.py
+ production_logger.py
+ quote_extractor.py
+ requirements.txt
+ README.md (optional)
+ ```
+
+ ---
+
+ ## 🔍 How to Check What's Missing
+
+ If you get `ModuleNotFoundError: No module named 'xyz'`, you need to upload `xyz.py`.
+
+ **Common missing modules:**
+ - `quote_extractor` → Upload `quote_extractor.py`
+ - `production_logger` → Upload `production_logger.py`
+ - `dashboard` → Upload `dashboard.py`
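The same check can be run locally before uploading. This is a sketch that uses the module names from the lists above and assumes it is run from the project directory:

```python
import importlib.util

# Modules app.py imports; quote_extractor is optional (see above)
REQUIRED_MODULES = [
    "llm", "extractors", "tagging", "chunking",
    "validation", "reporting", "dashboard", "production_logger",
]

def find_missing(modules):
    """Return the modules that Python cannot locate on the current path."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Upload these files:", ", ".join(f"{m}.py" for m in missing))
    else:
        print("All required modules found")
```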
+
+ ---
+
+ ## 📁 Folder Structure on HuggingFace Spaces
+
+ Your Space should look like:
+
+ ```
+ your-space/
+ ├── app.py
+ ├── llm.py
+ ├── extractors.py
+ ├── tagging.py
+ ├── chunking.py
+ ├── validation.py
+ ├── reporting.py
+ ├── dashboard.py
+ ├── production_logger.py
+ ├── quote_extractor.py (optional)
+ ├── requirements.txt
+ └── README.md (optional)
+ ```
+
+ **Do NOT create subdirectories** - keep all Python files in the root.
+
+ ---
+
+ ## 🚀 Quick Upload Checklist
+
+ Before uploading to Spaces:
+
+ - [ ] `app.py` - Main file
+ - [ ] All imported modules (llm, extractors, etc.)
+ - [ ] `requirements.txt` - Dependencies
+ - [ ] **GPU** hardware selected in Spaces settings
+ - [ ] No `.env` file included
+ - [ ] No test/debug files included
+
+ ---
+
+ ## 🔧 Troubleshooting Import Errors
+
+ ### Error: `ModuleNotFoundError: No module named 'quote_extractor'`
+ **Fixed!** This module is now optional - the app will work without it.
+
+ ### Error: `ModuleNotFoundError: No module named 'extractors'`
+ **Solution:** Upload `extractors.py`
+
+ ### Error: `ModuleNotFoundError: No module named 'production_logger'`
+ **Solution:** Upload `production_logger.py`
+
+ ### Error: `ModuleNotFoundError: No module named 'transformers'`
+ **Solution:** Check that `requirements.txt` is uploaded and correct
+
+ ---
+
+ ## 📝 Alternative: Use a Git Repository
+
+ Instead of manual upload, you can:
+
+ 1. Create a Git repository with only the required files
+ 2. Connect it to your HuggingFace Space
+ 3. Auto-deploy on push
+
+ **Create a `.gitignore` to exclude:**
+ ```
+ .env
+ *.log
+ logs/
+ outputs/
+ __pycache__/
+ test_*.py
+ debug_*.py
+ *.pyc
+ ```
+
+ ---
+
+ ## Last Updated
+ October 2025
SIMPLE_UPLOAD_LIST.txt ADDED
@@ -0,0 +1,36 @@
+ ================================================================================
+ HUGGINGFACE SPACES - FILES TO UPLOAD
+ ================================================================================
+
+ Just upload these 11 files to your Space:
+
+ 1. app.py ← MAIN FILE (required by HF Spaces)
+ 2. llm.py
+ 3. extractors.py
+ 4. tagging.py
+ 5. chunking.py
+ 6. validation.py
+ 7. reporting.py
+ 8. dashboard.py
+ 9. production_logger.py
+ 10. quote_extractor.py
+ 11. requirements.txt
+
+ ================================================================================
+ SPACE SETTINGS
+ ================================================================================
+
+ SDK: Gradio
+ Hardware: GPU (T4) ← IMPORTANT! Don't use CPU
+
+ ================================================================================
+ THAT'S IT!
+ ================================================================================
+
+ No terminal commands needed.
+ No .env file needed.
+ No configuration needed.
+
+ Just upload the 11 files and it works!
+
+ ================================================================================
UPLOAD_TO_SPACES_CHECKLIST.md ADDED
@@ -0,0 +1,196 @@
+ # HuggingFace Spaces Upload Checklist
+
+ ## ✅ Pre-Upload Checklist
+
+ Your app is ready! Just upload these files:
+
+ ### Required Files (Check off as you upload)
+
+ - [ ] `app.py` ← **MAIN FILE - HuggingFace Spaces needs this exact name**
+ - [ ] `llm.py`
+ - [ ] `extractors.py`
+ - [ ] `tagging.py`
+ - [ ] `chunking.py`
+ - [ ] `validation.py`
+ - [ ] `reporting.py`
+ - [ ] `dashboard.py`
+ - [ ] `production_logger.py`
+ - [ ] `quote_extractor.py`
+ - [ ] `requirements.txt`
+
+ **Total: 11 files**
+
+ ---
+
+ ## 🚫 DO NOT Upload
+
+ - ❌ `.env` file
+ - ❌ `test_*.py` files
+ - ❌ `*.log` files
+ - ❌ `logs/` folder
+ - ❌ `outputs/` folder
+ - ❌ `__pycache__/` folder
+
+ ---
+
+ ## 🎯 Upload Steps
+
+ ### 1. Create Your Space
+ 1. Go to: https://huggingface.co/new-space
+ 2. Enter a name (e.g., `transcriptor-ai`)
+ 3. Choose **Gradio** as the SDK
+ 4. Select **GPU** hardware (T4 minimum) ⚠️ **IMPORTANT!**
+ 5. Click "Create Space"
+
+ ### 2. Upload Files
+
+ **Method A: Drag & Drop**
+ 1. Click the "Files" tab in your Space
+ 2. Click "Upload files"
+ 3. Drag in all 11 files from the checklist above
+ 4. Click "Commit"
+
+ **Method B: Git Repository**
+ 1. Create a new Git repo
+ 2. Copy in the 11 files above
+ 3. Add `.gitignore` (already created for you)
+ 4. Push to the repo
+ 5. Connect the repo to your Space in Settings
+
+ ### 3. Configure Space (Optional)
+
+ Go to **Settings → Variables** and add (all optional):
+
+ | Variable | Value | Why |
+ |----------|-------|-----|
+ | `DEBUG_MODE` | `True` | See detailed logs |
+ | `LLM_TEMPERATURE` | `0.7` | Already the default |
+
+ **You don't need to configure anything** - it works out of the box!
+
+ ---
+
+ ## ⏱️ What to Expect
+
+ ### First Startup
+ 1. **Installing dependencies:** 2-5 minutes
+ 2. **Downloading the Phi-3-mini model:** 2-5 minutes
+ 3. **Total:** ~5-10 minutes
+
+ Watch the **Logs** tab - you'll see:
+ ```
+ Installing dependencies...
+ ✅ Configuration loaded for HuggingFace Spaces
+ 🚀 TranscriptorAI Enterprise - LLM Backend: local
+ [Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
+ Downloading model files...
+ [Local Model] ✅ Model loaded on cuda:0
+ Running on local URL: http://0.0.0.0:7860
+ ```
+
+ ### Subsequent Startups
+ - **Only 30-60 seconds** (model is cached)
+
+ ---
+
+ ## ✅ Verify It's Working
+
+ ### 1. Check Startup Logs
+
+ Look for these lines in the Logs tab:
+
+ ✅ `Configuration loaded for HuggingFace Spaces`
+ ✅ `LLM Backend: local`
+ ✅ `Model loaded on cuda:0` ← GPU confirmed!
+ ✅ `Running on local URL`
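The `cuda:0` line is the one that matters - it confirms the model landed on the GPU rather than the CPU. A minimal self-check of what the app would see looks like this (a sketch; it only assumes `torch` is installed, which `requirements.txt` pulls in, and degrades to "cpu" otherwise):

```python
import importlib.util

def pick_device() -> str:
    """Report the device a local model would load onto ('cuda:0' or 'cpu')."""
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "cuda:0"
    return "cpu"

print(f"[Local Model] would load on {pick_device()}")
```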
+
+ ### 2. Test with a Sample
+
+ 1. Click "Upload Files"
+ 2. Upload a DOCX transcript
+ 3. Select "HCP" as the interviewee type
+ 4. Click "Analyze Transcripts"
+ 5. Wait 5-10 minutes for processing
+
+ **Expected Result:**
+ - Quality Score: 0.7-1.0 (not 0.00!)
+ - CSV and PDF downloads available
+ - Dashboard shows charts
+
+ ---
+
+ ## 🐛 Common Issues
+
+ ### Issue: `ModuleNotFoundError: No module named 'xyz'`
+ **Solution:** Upload the missing `xyz.py` file
+
+ ### Issue: Very slow or hangs
+ **Check:** Did you select GPU hardware?
+ 1. Go to Settings
+ 2. Under Hardware, choose "GPU (T4)"
+ 3. Restart the Space
+
+ ### Issue: Quality Score 0.00
+ **Solution:**
+ 1. Add the Variable `DEBUG_MODE=True`
+ 2. Check the logs for error messages
+ 3. Look for "[Local Model] ✅ Generated" to confirm it's working
+
+ ### Issue: Out of memory
+ **Solution:**
+ 1. Add the Variable `LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0`
+ 2. OR upgrade to a larger GPU
+
+ ---
+
+ ## 💰 Cost
+
+ ### Free Tier (CPU)
+ - ⚠️ Very slow (10+ minutes per transcript)
+ - Not recommended
+
+ ### GPU (T4) - ~$0.60/hour
+ - ✅ Recommended
+ - Fast processing (~5-10 min per transcript)
+ - The Space sleeps after inactivity (saves money)
+ - You're only charged while it's active
+
+ ---
+
+ ## 📋 Quick Reference
+
+ **The Space must have:**
+ - `app.py` as the main file ✅ (already correct)
+ - `requirements.txt` with dependencies ✅ (already correct)
+ - GPU hardware selected ⚠️ (you must select this)
+
+ **No .env file needed** - everything is configured in code ✅
+
+ **No terminal commands needed** - all automatic ✅
+
+ ---
+
+ ## 🎉 Ready to Deploy!
+
+ 1. ✅ Check you have all 11 files
+ 2. ✅ Create a Space with GPU hardware
+ 3. ✅ Upload the files via drag & drop
+ 4. ✅ Wait for the build (watch the Logs tab)
+ 5. ✅ Test with a transcript
+
+ **See `FILES_TO_UPLOAD.txt` for the complete list of files.**
+
+ ---
+
+ ## 📞 Still Stuck?
+
+ Common causes:
+ 1. **Forgot to upload a file** - check that all 11 files are uploaded
+ 2. **Selected CPU instead of GPU** - change it in Settings
+ 3. **Uploaded the .env file** - delete it; it's not needed on Spaces
+
+ ---
+
+ **Last Updated:** October 2025
+
+ **You're ready - just upload the 11 files and you're done!** 🚀
app.py CHANGED
@@ -9,9 +9,19 @@ from llm import query_llm, extract_structured_data
  from reporting import generate_enhanced_csv, generate_enhanced_pdf
  from dashboard import generate_comprehensive_dashboard
  from validation import validate_transcript_quality, check_data_completeness
- from quote_extractor import extract_quotes_from_results
  from production_logger import init_session, ProductionLogger, PerformanceMonitor
 
+ # Optional: Quote extraction for market research storytelling
+ try:
+     from quote_extractor import extract_quotes_from_results
+     HAS_QUOTE_EXTRACTION = True
+ except ImportError:
+     HAS_QUOTE_EXTRACTION = False
+     print("⚠️ Quote extraction not available - reports will not include storytelling quotes")
+
+     def extract_quotes_from_results(results, interviewee_type):
+         """Stub function when quote_extractor is not available"""
+         return {"quotes": [], "themes": {}, "top_quotes": []}
+
  # Optional imports for enhanced validation (may not exist in older deployments)
  try:
      from validation import verify_consensus_claims, validate_summary_quality
prepare_for_spaces.py ADDED
@@ -0,0 +1,96 @@
+ #!/usr/bin/env python3
+ """
+ Prepare files for HuggingFace Spaces deployment
+ Copies only the required files to a clean directory
+ """
+
+ import os
+ import shutil
+ from pathlib import Path
+
+ # Required files for HuggingFace Spaces
+ REQUIRED_FILES = [
+     # Core application
+     'app.py',
+
+     # Processing modules
+     'llm.py',
+     'extractors.py',
+     'tagging.py',
+     'chunking.py',
+     'validation.py',
+     'reporting.py',
+     'dashboard.py',
+     'production_logger.py',
+
+     # Optional but recommended
+     'quote_extractor.py',
+
+     # Configuration
+     'requirements.txt',
+
+     # Documentation (optional)
+     'README.md',
+     'HUGGINGFACE_SPACES_SETUP.md',
+ ]
+
+ def prepare_deployment(output_dir='./spaces_deployment'):
+     """Copy required files to deployment directory"""
+
+     # Create output directory
+     output_path = Path(output_dir)
+     if output_path.exists():
+         print(f"⚠️ Directory {output_dir} already exists")
+         response = input("Delete and recreate? (y/n): ")
+         if response.lower() != 'y':
+             print("❌ Cancelled")
+             return
+         shutil.rmtree(output_path)
+
+     output_path.mkdir(exist_ok=True)
+     print(f"📁 Created directory: {output_dir}\n")
+
+     # Copy files
+     copied = []
+     missing = []
+
+     for filename in REQUIRED_FILES:
+         src = Path(filename)
+         if src.exists():
+             dst = output_path / filename
+             shutil.copy2(src, dst)
+             size_kb = src.stat().st_size / 1024
+             print(f"  ✅ {filename} ({size_kb:.1f} KB)")
+             copied.append(filename)
+         else:
+             print(f"  ⚠️ {filename} - NOT FOUND (skipping)")
+             missing.append(filename)
+
+     # Summary
+     print("\n" + "="*80)
+     print("📊 SUMMARY")
+     print("="*80)
+     print(f"✅ Copied: {len(copied)} files")
+     if missing:
+         print(f"⚠️ Missing: {len(missing)} files")
+         print(f"   {', '.join(missing)}")
+
+     print(f"\n📦 Deployment files ready in: {output_dir}/")
+     print("\n📋 Next steps:")
+     print("1. Go to https://huggingface.co/new-space")
+     print("2. Select Gradio SDK and GPU hardware")
+     print("3. Upload all files from the deployment directory")
+     print("4. Wait for model download (~2-5 min first time)")
+     print("5. Test your Space!")
+
+     # Check for .env file (should not be included)
+     if (output_path / '.env').exists():
+         print("\n⚠️ WARNING: .env file found in deployment directory!")
+         print("   This should NOT be deployed to HuggingFace Spaces")
+         os.remove(output_path / '.env')
+         print("   ✅ Removed .env file")
+
+     print("\n✨ Deployment package ready!")
+
+ if __name__ == '__main__':
+     prepare_deployment()