Upload 13 files
- .gitignore +55 -0
- DEPLOY_TO_SPACES.md +184 -0
- FILES_TO_UPLOAD.txt +83 -0
- REQUIRED_FILES_FOR_SPACES.md +181 -0
- SIMPLE_UPLOAD_LIST.txt +36 -0
- UPLOAD_TO_SPACES_CHECKLIST.md +196 -0
- app.py +11 -1
- prepare_for_spaces.py +96 -0
.gitignore
ADDED
|
@@ -0,0 +1,55 @@
|
| 1 |
+
# HuggingFace Spaces Deployment - DO NOT UPLOAD THESE
|
| 2 |
+
|
| 3 |
+
# Environment and secrets
|
| 4 |
+
.env
|
| 5 |
+
*.env
|
| 6 |
+
|
| 7 |
+
# Logs
|
| 8 |
+
*.log
|
| 9 |
+
logs/
|
| 10 |
+
session_*.log
|
| 11 |
+
summary_*.txt
|
| 12 |
+
summary_*.json
|
| 13 |
+
|
| 14 |
+
# Outputs
|
| 15 |
+
outputs/
|
| 16 |
+
*.csv
|
| 17 |
+
*.pdf
|
| 18 |
+
spaces_deployment/
|
| 19 |
+
|
| 20 |
+
# Python
|
| 21 |
+
__pycache__/
|
| 22 |
+
*.pyc
|
| 23 |
+
*.pyo
|
| 24 |
+
*.pyd
|
| 25 |
+
.Python
|
| 26 |
+
*.so
|
| 27 |
+
*.egg
|
| 28 |
+
*.egg-info/
|
| 29 |
+
dist/
|
| 30 |
+
build/
|
| 31 |
+
|
| 32 |
+
# IDE
|
| 33 |
+
.vscode/
|
| 34 |
+
.idea/
|
| 35 |
+
*.swp
|
| 36 |
+
*.swo
|
| 37 |
+
*~
|
| 38 |
+
|
| 39 |
+
# Test files
|
| 40 |
+
test_*.py
|
| 41 |
+
debug_*.py
|
| 42 |
+
check_*.py
|
| 43 |
+
verify_*.py
|
| 44 |
+
fix_*.py
|
| 45 |
+
patch_*.py
|
| 46 |
+
create_sample_*.py
|
| 47 |
+
update_*.py
|
| 48 |
+
|
| 49 |
+
# Documentation (optional - you can upload if you want)
|
| 50 |
+
*.md
|
| 51 |
+
!README.md
|
| 52 |
+
|
| 53 |
+
# OS
|
| 54 |
+
.DS_Store
|
| 55 |
+
Thumbs.db
|
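The ignore rules above can be checked against a file name with a small script. This is a minimal sketch, assuming a simplified model of `.gitignore` matching (top-level file names only, last matching pattern wins, `!` re-includes); `PATTERNS` is a subset copied from the file above, and `is_ignored` is a hypothetical helper, not part of Git or of this repository.

```python
from fnmatch import fnmatch

# A few patterns taken from the .gitignore above, kept in file order.
PATTERNS = [".env", "*.env", "*.log", "test_*.py", "*.md", "!README.md"]


def is_ignored(filename, patterns=PATTERNS):
    """Return True if a top-level file name would be ignored.

    Simplified model: the last matching pattern wins, and a leading '!'
    re-includes the file. Directory patterns, '**' and nested paths are
    not handled here.
    """
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatch(filename, glob):
            ignored = not negated
    return ignored


if __name__ == "__main__":
    for name in ["README.md", "DEPLOY_TO_SPACES.md", "app.py", "test_local_model.py"]:
        print(f"{name}: {'ignored' if is_ignored(name) else 'kept'}")
```

Under this model `README.md` is kept while every other `*.md` file is ignored, so a Git-based workflow that relies on this `.gitignore` would likely skip untracked Markdown guides such as `DEPLOY_TO_SPACES.md`.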
DEPLOY_TO_SPACES.md
ADDED
|
@@ -0,0 +1,184 @@
|
| 1 |
+
# Deploy to HuggingFace Spaces - Quick Start
|
| 2 |
+
|
| 3 |
+
## ✅ Issue Fixed
|
| 4 |
+
**The `quote_extractor` import error has been fixed!** The app will now work even if the file is missing.
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
## 🚀 Option 1: Automated Preparation (Recommended)
|
| 9 |
+
|
| 10 |
+
Run this script to prepare a clean deployment package:
|
| 11 |
+
|
| 12 |
+
```bash
|
| 13 |
+
python prepare_for_spaces.py
|
| 14 |
+
```
|
| 15 |
+
|
| 16 |
+
This will:
|
| 17 |
+
- Create a `spaces_deployment/` directory
|
| 18 |
+
- Copy only the required files
|
| 19 |
+
- Remove any .env or test files
|
| 20 |
+
- Show you a summary of what's included
|
| 21 |
+
|
| 22 |
+
Then upload everything from `spaces_deployment/` to your Space.
|
| 23 |
+
|
| 24 |
+
---
|
| 25 |
+
|
| 26 |
+
## 📋 Option 2: Manual Upload
|
| 27 |
+
|
| 28 |
+
Upload these files to your HuggingFace Space:
|
| 29 |
+
|
| 30 |
+
### Required Files (Must have)
|
| 31 |
+
```
|
| 32 |
+
app.py
|
| 33 |
+
llm.py
|
| 34 |
+
extractors.py
|
| 35 |
+
tagging.py
|
| 36 |
+
chunking.py
|
| 37 |
+
validation.py
|
| 38 |
+
reporting.py
|
| 39 |
+
dashboard.py
|
| 40 |
+
production_logger.py
|
| 41 |
+
quote_extractor.py (optional)
|
| 42 |
+
requirements.txt
|
| 43 |
+
```
|
| 44 |
+
|
| 45 |
+
### Optional Files
|
| 46 |
+
```
|
| 47 |
+
README.md
|
| 48 |
+
HUGGINGFACE_SPACES_SETUP.md
|
| 49 |
+
```
|
| 50 |
+
|
| 51 |
+
**DO NOT upload:**
|
| 52 |
+
- `.env` file
|
| 53 |
+
- `test_*.py` files
|
| 54 |
+
- `logs/` directory
|
| 55 |
+
- `outputs/` directory
|
| 56 |
+
|
| 57 |
+
---
|
| 58 |
+
|
| 59 |
+
## 🔧 Space Configuration
|
| 60 |
+
|
| 61 |
+
### 1. Create Space
|
| 62 |
+
- Go to https://huggingface.co/new-space
|
| 63 |
+
- Name: `transcriptor-ai` (or your choice)
|
| 64 |
+
- SDK: **Gradio**
|
| 65 |
+
- Hardware: **GPU (T4 or better)** ← Important!
|
| 66 |
+
|
| 67 |
+
### 2. Upload Files
|
| 68 |
+
- Drag and drop all files from the list above
|
| 69 |
+
- OR connect a Git repository
|
| 70 |
+
|
| 71 |
+
### 3. Configure (Optional)
|
| 72 |
+
Go to **Settings → Variables** and add:
|
| 73 |
+
|
| 74 |
+
| Variable | Value | When to Use |
|
| 75 |
+
|----------|-------|-------------|
|
| 76 |
+
| `DEBUG_MODE` | `True` | To see detailed logs |
|
| 77 |
+
| `LOCAL_MODEL` | `TinyLlama/TinyLlama-1.1B-Chat-v1.0` | For faster (but lower quality) processing |
|
| 78 |
+
| `LLM_TEMPERATURE` | `0.5` | For more deterministic outputs |
|
| 79 |
+
|
| 80 |
+
**Note:** All settings have defaults - you don't need to configure anything!
|
| 81 |
+
|
| 82 |
+
---
|
| 83 |
+
|
| 84 |
+
## ⏱️ First Deployment
|
| 85 |
+
|
| 86 |
+
### What to Expect
|
| 87 |
+
1. **Build time:** 2-5 minutes (installing dependencies)
|
| 88 |
+
2. **Model download:** 2-5 minutes (first time only - downloads Phi-3-mini)
|
| 89 |
+
3. **Subsequent starts:** 30-60 seconds
|
| 90 |
+
|
| 91 |
+
### Watch the Logs
|
| 92 |
+
Click **Logs** tab to see:
|
| 93 |
+
```
|
| 94 |
+
✅ Configuration loaded for HuggingFace Spaces
|
| 95 |
+
🚀 TranscriptorAI Enterprise - LLM Backend: local
|
| 96 |
+
[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
|
| 97 |
+
Downloading (…)lve/main/config.json: 100%
|
| 98 |
+
[Local Model] ✅ Model loaded on cuda:0
|
| 99 |
+
Running on local URL: http://0.0.0.0:7860
|
| 100 |
+
```
|
| 101 |
+
|
| 102 |
+
---
|
| 103 |
+
|
| 104 |
+
## 🧪 Test Your Space
|
| 105 |
+
|
| 106 |
+
1. Wait for "Running on local URL" message
|
| 107 |
+
2. Upload a sample transcript (DOCX or PDF)
|
| 108 |
+
3. Select "HCP" as interviewee type
|
| 109 |
+
4. Click "Analyze Transcripts"
|
| 110 |
+
|
| 111 |
+
**Expected:**
|
| 112 |
+
- Processing time: 5-10 minutes (depending on transcript length)
|
| 113 |
+
- Quality score: 0.7-1.0
|
| 114 |
+
- CSV and PDF downloads available
|
| 115 |
+
|
| 116 |
+
---
|
| 117 |
+
|
| 118 |
+
## 🐛 Troubleshooting
|
| 119 |
+
|
| 120 |
+
### Error: `ModuleNotFoundError: No module named 'quote_extractor'`
|
| 121 |
+
**Status:** ✅ FIXED - This is now optional
|
| 122 |
+
|
| 123 |
+
### Error: `ModuleNotFoundError: No module named 'xyz'`
|
| 124 |
+
**Solution:** Upload the missing `xyz.py` file
|
| 125 |
+
|
| 126 |
+
### Error: `CUDA out of memory`
|
| 127 |
+
**Solution:**
|
| 128 |
+
- Change model: Add Variable `LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0`
|
| 129 |
+
- OR upgrade to larger GPU
|
| 130 |
+
|
| 131 |
+
### Error: Very slow processing
|
| 132 |
+
**Check:**
|
| 133 |
+
- Is GPU hardware selected? (Not CPU)
|
| 134 |
+
- Look for "Model loaded on cuda:0" in logs
|
| 135 |
+
- If you see "cpu", upgrade to GPU tier
|
| 136 |
+
|
| 137 |
+
### Quality Score still 0.00
|
| 138 |
+
**Debug:**
|
| 139 |
+
1. Set `DEBUG_MODE=True` in Variables
|
| 140 |
+
2. Check logs for "[Local Model] ✅ Generated X characters"
|
| 141 |
+
3. Look for "[LLM Debug] Successfully extracted JSON"
|
| 142 |
+
4. If you see `[Error]` messages, share them
|
| 143 |
+
|
| 144 |
+
---
|
| 145 |
+
|
| 146 |
+
## 💡 Tips
|
| 147 |
+
|
| 148 |
+
### Reduce Costs
|
| 149 |
+
- Free CPU Spaces sleep after 48 hours of inactivity; on paid GPU hardware, set a sleep time in Settings so the Space pauses when idle
|
| 150 |
+
- You only pay for GPU time while the Space is running
|
| 151 |
+
- ~$0.60/hour for T4 GPU
|
| 152 |
+
|
| 153 |
+
### Improve Speed
|
| 154 |
+
- Use smaller model (TinyLlama)
|
| 155 |
+
- Reduce max tokens (edit llm.py line 410)
|
| 156 |
+
- Process fewer chunks
|
| 157 |
+
|
| 158 |
+
### Improve Quality
|
| 159 |
+
- Use larger model (Mistral-7B)
|
| 160 |
+
- Increase temperature for creative outputs
|
| 161 |
+
- Keep default Phi-3-mini for best balance
|
| 162 |
+
|
| 163 |
+
---
|
| 164 |
+
|
| 165 |
+
## 📞 Need Help?
|
| 166 |
+
|
| 167 |
+
1. **Check logs first** - Most issues show clear error messages
|
| 168 |
+
2. **Read HUGGINGFACE_SPACES_SETUP.md** - Detailed troubleshooting
|
| 169 |
+
3. **Test locally first** - Run `python test_local_model.py`
|
| 170 |
+
|
| 171 |
+
---
|
| 172 |
+
|
| 173 |
+
## ✨ You're Ready!
|
| 174 |
+
|
| 175 |
+
Run the preparation script:
|
| 176 |
+
```bash
|
| 177 |
+
python prepare_for_spaces.py
|
| 178 |
+
```
|
| 179 |
+
|
| 180 |
+
Then upload to HuggingFace Spaces and you're done! 🎉
|
| 181 |
+
|
| 182 |
+
---
|
| 183 |
+
|
| 184 |
+
**Last Updated:** October 2025
|
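The optional variables listed in DEPLOY_TO_SPACES.md (`DEBUG_MODE`, `LOCAL_MODEL`, `LLM_TEMPERATURE`) could be read with defaults roughly as follows. This is a sketch under stated assumptions, not the code in `app.py` or `llm.py`: `SpaceConfig` and `load_config` are hypothetical names, the `False` default for `DEBUG_MODE` and the truthy-string parsing are assumptions, and the model and temperature defaults are the ones the guides in this commit mention.

```python
import os
from dataclasses import dataclass


@dataclass
class SpaceConfig:
    debug_mode: bool
    local_model: str
    llm_temperature: float


def load_config(env=None):
    """Read the optional Space variables, falling back to defaults."""
    env = os.environ if env is None else env
    return SpaceConfig(
        # Assumed default: debugging stays off unless DEBUG_MODE is truthy.
        debug_mode=env.get("DEBUG_MODE", "False").strip().lower() in {"1", "true", "yes"},
        # Default model named in the guides.
        local_model=env.get("LOCAL_MODEL", "microsoft/Phi-3-mini-4k-instruct"),
        # The checklist describes 0.7 as "already the default".
        llm_temperature=float(env.get("LLM_TEMPERATURE", "0.7")),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check locally before setting anything in the Space settings.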
FILES_TO_UPLOAD.txt
ADDED
|
@@ -0,0 +1,83 @@
|
| 1 |
+
===============================================================================
|
| 2 |
+
FILES TO UPLOAD TO HUGGINGFACE SPACES
|
| 3 |
+
===============================================================================
|
| 4 |
+
|
| 5 |
+
✅ COPY THESE FILES TO YOUR SPACE (11 files total):
|
| 6 |
+
|
| 7 |
+
1. app.py - Main application (REQUIRED - HF Spaces entry point)
|
| 8 |
+
2. llm.py - LLM inference with local models
|
| 9 |
+
3. extractors.py - Document text extraction (DOCX/PDF)
|
| 10 |
+
4. tagging.py - Speaker tagging
|
| 11 |
+
5. chunking.py - Text chunking
|
| 12 |
+
6. validation.py - Quality validation
|
| 13 |
+
7. reporting.py - CSV/PDF report generation
|
| 14 |
+
8. dashboard.py - Dashboard generation
|
| 15 |
+
9. production_logger.py - Session logging
|
| 16 |
+
10. quote_extractor.py - Quote extraction (optional but recommended)
|
| 17 |
+
11. requirements.txt - Python dependencies
|
| 18 |
+
|
| 19 |
+
===============================================================================
|
| 20 |
+
OPTIONAL - NICE TO HAVE:
|
| 21 |
+
===============================================================================
|
| 22 |
+
|
| 23 |
+
- README.md - Documentation for your Space
|
| 24 |
+
|
| 25 |
+
===============================================================================
|
| 26 |
+
DO NOT UPLOAD:
|
| 27 |
+
===============================================================================
|
| 28 |
+
|
| 29 |
+
❌ .env - Contains secrets (use Spaces Secrets for sensitive values, Variables for plain settings)
|
| 30 |
+
❌ test_*.py - Test files
|
| 31 |
+
❌ *.log - Log files
|
| 32 |
+
❌ logs/ - Log directory
|
| 33 |
+
❌ outputs/ - Output directory
|
| 34 |
+
❌ __pycache__/ - Python cache
|
| 35 |
+
|
| 36 |
+
===============================================================================
|
| 37 |
+
HUGGINGFACE SPACES SETTINGS:
|
| 38 |
+
===============================================================================
|
| 39 |
+
|
| 40 |
+
Space SDK: Gradio
|
| 41 |
+
Hardware: GPU (T4 or better) ⚠️ IMPORTANT - CPU will be very slow!
|
| 42 |
+
|
| 43 |
+
Optional Variables (Settings → Variables):
|
| 44 |
+
- DEBUG_MODE = True (to see detailed logs)
|
| 45 |
+
- LOCAL_MODEL = microsoft/Phi-3-mini-4k-instruct (default, no need to set)
|
| 46 |
+
|
| 47 |
+
===============================================================================
|
| 48 |
+
DEPLOYMENT METHOD:
|
| 49 |
+
===============================================================================
|
| 50 |
+
|
| 51 |
+
Option 1: Direct Upload
|
| 52 |
+
- Go to your Space → Files → Upload files
|
| 53 |
+
- Drag and drop the 11 files above
|
| 54 |
+
|
| 55 |
+
Option 2: Git Repository
|
| 56 |
+
- Create a Git repo with these files
|
| 57 |
+
- Add .gitignore (already created)
|
| 58 |
+
- Connect repo to your Space
|
| 59 |
+
- Auto-deploys on push
|
| 60 |
+
|
| 61 |
+
===============================================================================
|
| 62 |
+
FIRST TIME STARTUP:
|
| 63 |
+
===============================================================================
|
| 64 |
+
|
| 65 |
+
1. Dependencies install: ~2-5 minutes
|
| 66 |
+
2. Model download: ~2-5 minutes (Phi-3-mini downloads automatically)
|
| 67 |
+
3. Total first startup: ~5-10 minutes
|
| 68 |
+
|
| 69 |
+
Subsequent starts: ~30-60 seconds (model is cached)
|
| 70 |
+
|
| 71 |
+
===============================================================================
|
| 72 |
+
VERIFICATION:
|
| 73 |
+
===============================================================================
|
| 74 |
+
|
| 75 |
+
Check the Logs tab - you should see:
|
| 76 |
+
|
| 77 |
+
✅ Configuration loaded for HuggingFace Spaces
|
| 78 |
+
🚀 TranscriptorAI Enterprise - LLM Backend: local
|
| 79 |
+
[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
|
| 80 |
+
[Local Model] ✅ Model loaded on cuda:0
|
| 81 |
+
Running on local URL: http://0.0.0.0:7860
|
| 82 |
+
|
| 83 |
+
===============================================================================
|
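The verification step in FILES_TO_UPLOAD.txt amounts to looking for a handful of marker strings in the startup log. A minimal sketch of that check, assuming the log text has been copied out of the Logs tab into a string; `EXPECTED_MARKERS` and `missing_markers` are hypothetical names, and the marker strings are taken from the sample log above.

```python
# Substrings the guides say should appear in a healthy GPU startup log.
EXPECTED_MARKERS = [
    "Configuration loaded for HuggingFace Spaces",
    "LLM Backend: local",
    "Model loaded on cuda:0",
    "Running on local URL",
]


def missing_markers(log_text, markers=EXPECTED_MARKERS):
    """Return the expected markers that do not occur in the log text."""
    return [marker for marker in markers if marker not in log_text]


if __name__ == "__main__":
    sample = (
        "Configuration loaded for HuggingFace Spaces\n"
        "TranscriptorAI Enterprise - LLM Backend: local\n"
        "[Local Model] Model loaded on cpu\n"
        "Running on local URL: http://0.0.0.0:7860\n"
    )
    # Prints the markers that are absent from the sample log.
    print(missing_markers(sample))
```

If `Model loaded on cuda:0` is the only missing marker, the model most likely loaded on CPU and the hardware setting is worth checking first.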
REQUIRED_FILES_FOR_SPACES.md
ADDED
|
@@ -0,0 +1,181 @@
|
| 1 |
+
# Required Files for HuggingFace Spaces Deployment
|
| 2 |
+
|
| 3 |
+
## ✅ CRITICAL - Must Upload These Files
|
| 4 |
+
|
| 5 |
+
### Main Application
|
| 6 |
+
- `app.py` - Main Gradio application
|
| 7 |
+
|
| 8 |
+
### Core Processing Modules
|
| 9 |
+
- `llm.py` - LLM inference (local model support)
|
| 10 |
+
- `extractors.py` - DOCX/PDF text extraction
|
| 11 |
+
- `tagging.py` - Speaker identification
|
| 12 |
+
- `chunking.py` - Semantic text chunking
|
| 13 |
+
- `validation.py` - Quality scoring and validation
|
| 14 |
+
- `reporting.py` - CSV/PDF report generation
|
| 15 |
+
- `dashboard.py` - Dashboard generation
|
| 16 |
+
- `production_logger.py` - Session logging
|
| 17 |
+
|
| 18 |
+
### Optional but Recommended
|
| 19 |
+
- `quote_extractor.py` - Market research quote extraction (now optional)
|
| 20 |
+
|
| 21 |
+
### Configuration
|
| 22 |
+
- `requirements.txt` - Python dependencies
|
| 23 |
+
- `README.md` - Documentation (optional but good practice)
|
| 24 |
+
|
| 25 |
+
---
|
| 26 |
+
|
| 27 |
+
## ❌ DO NOT Upload These Files
|
| 28 |
+
|
| 29 |
+
### Local Development Only
|
| 30 |
+
- `.env` - Contains local secrets (use Spaces Secrets for sensitive values, Variables for plain settings)
|
| 31 |
+
- `*.log` - Log files
|
| 32 |
+
- `logs/` - Log directory
|
| 33 |
+
- `outputs/` - Output directory
|
| 34 |
+
- `__pycache__/` - Python cache
|
| 35 |
+
- `.git/` - Git repository
|
| 36 |
+
|
| 37 |
+
### Test Files (Not Needed)
|
| 38 |
+
- `test_*.py` - All test scripts
|
| 39 |
+
- `check_*.py` - Check scripts
|
| 40 |
+
- `debug_*.py` - Debug scripts
|
| 41 |
+
- `verify_*.py` - Verification scripts
|
| 42 |
+
- `fix_*.py` - Fix scripts
|
| 43 |
+
- `patch_*.py` - Patch scripts
|
| 44 |
+
- `create_sample_*.py` - Sample creation
|
| 45 |
+
|
| 46 |
+
### Documentation (Optional)
|
| 47 |
+
- `*.md` files - Helpful but not required for app to run
|
| 48 |
+
- You can upload them if you want documentation in your Space
|
| 49 |
+
|
| 50 |
+
---
|
| 51 |
+
|
| 52 |
+
## 📦 Minimal File List (Absolute Minimum)
|
| 53 |
+
|
| 54 |
+
If you want the smallest deployment, upload only these:
|
| 55 |
+
|
| 56 |
+
```
|
| 57 |
+
app.py
|
| 58 |
+
llm.py
|
| 59 |
+
extractors.py
|
| 60 |
+
tagging.py
|
| 61 |
+
chunking.py
|
| 62 |
+
validation.py
|
| 63 |
+
reporting.py
|
| 64 |
+
dashboard.py
|
| 65 |
+
production_logger.py
|
| 66 |
+
requirements.txt
|
| 67 |
+
```
|
| 68 |
+
|
| 69 |
+
**Quote extraction will be disabled** but everything else will work.
|
| 70 |
+
|
| 71 |
+
---
|
| 72 |
+
|
| 73 |
+
## 📋 Complete File List (Recommended)
|
| 74 |
+
|
| 75 |
+
Upload all core files plus quote extraction:
|
| 76 |
+
|
| 77 |
+
```
|
| 78 |
+
app.py
|
| 79 |
+
llm.py
|
| 80 |
+
extractors.py
|
| 81 |
+
tagging.py
|
| 82 |
+
chunking.py
|
| 83 |
+
validation.py
|
| 84 |
+
reporting.py
|
| 85 |
+
dashboard.py
|
| 86 |
+
production_logger.py
|
| 87 |
+
quote_extractor.py
|
| 88 |
+
requirements.txt
|
| 89 |
+
README.md (optional)
|
| 90 |
+
```
|
| 91 |
+
|
| 92 |
+
---
|
| 93 |
+
|
| 94 |
+
## 🔍 How to Check What's Missing
|
| 95 |
+
|
| 96 |
+
If you get `ModuleNotFoundError: No module named 'xyz'`, you need to upload `xyz.py`.
|
| 97 |
+
|
| 98 |
+
**Common missing modules:**
|
| 99 |
+
- `quote_extractor` → Upload `quote_extractor.py`
|
| 100 |
+
- `production_logger` → Upload `production_logger.py`
|
| 101 |
+
- `dashboard` → Upload `dashboard.py`
|
| 102 |
+
|
| 103 |
+
---
|
| 104 |
+
|
| 105 |
+
## 📁 Folder Structure on HuggingFace Spaces
|
| 106 |
+
|
| 107 |
+
Your Space should look like:
|
| 108 |
+
|
| 109 |
+
```
|
| 110 |
+
your-space/
|
| 111 |
+
├── app.py
|
| 112 |
+
├── llm.py
|
| 113 |
+
├── extractors.py
|
| 114 |
+
├── tagging.py
|
| 115 |
+
├── chunking.py
|
| 116 |
+
├── validation.py
|
| 117 |
+
├── reporting.py
|
| 118 |
+
├── dashboard.py
|
| 119 |
+
├── production_logger.py
|
| 120 |
+
├── quote_extractor.py (optional)
|
| 121 |
+
├── requirements.txt
|
| 122 |
+
└── README.md (optional)
|
| 123 |
+
```
|
| 124 |
+
|
| 125 |
+
**Do NOT create subdirectories** - keep all Python files in the root.
|
| 126 |
+
|
| 127 |
+
---
|
| 128 |
+
|
| 129 |
+
## 🚀 Quick Upload Checklist
|
| 130 |
+
|
| 131 |
+
Before uploading to Spaces:
|
| 132 |
+
|
| 133 |
+
- [ ] `app.py` - Main file
|
| 134 |
+
- [ ] All imported modules (llm, extractors, etc.)
|
| 135 |
+
- [ ] `requirements.txt` - Dependencies
|
| 136 |
+
- [ ] Selected **GPU** hardware in Spaces settings
|
| 137 |
+
- [ ] No `.env` file included
|
| 138 |
+
- [ ] No test/debug files included
|
| 139 |
+
|
| 140 |
+
---
|
| 141 |
+
|
| 142 |
+
## 🔧 Troubleshooting Import Errors
|
| 143 |
+
|
| 144 |
+
### Error: `ModuleNotFoundError: No module named 'quote_extractor'`
|
| 145 |
+
**Fixed!** This is now optional - app will work without it.
|
| 146 |
+
|
| 147 |
+
### Error: `ModuleNotFoundError: No module named 'extractors'`
|
| 148 |
+
**Solution:** Upload `extractors.py`
|
| 149 |
+
|
| 150 |
+
### Error: `ModuleNotFoundError: No module named 'production_logger'`
|
| 151 |
+
**Solution:** Upload `production_logger.py`
|
| 152 |
+
|
| 153 |
+
### Error: `ModuleNotFoundError: No module named 'transformers'`
|
| 154 |
+
**Solution:** Check `requirements.txt` is uploaded and correct
|
| 155 |
+
|
| 156 |
+
---
|
| 157 |
+
|
| 158 |
+
## 📝 Alternative: Use Git Repository
|
| 159 |
+
|
| 160 |
+
Instead of manual upload, you can:
|
| 161 |
+
|
| 162 |
+
1. Create a Git repository with only required files
|
| 163 |
+
2. Connect it to your HuggingFace Space
|
| 164 |
+
3. Auto-deploy on push
|
| 165 |
+
|
| 166 |
+
**Create `.gitignore` to exclude:**
|
| 167 |
+
```
|
| 168 |
+
.env
|
| 169 |
+
*.log
|
| 170 |
+
logs/
|
| 171 |
+
outputs/
|
| 172 |
+
__pycache__/
|
| 173 |
+
test_*.py
|
| 174 |
+
debug_*.py
|
| 175 |
+
*.pyc
|
| 176 |
+
```
|
| 177 |
+
|
| 178 |
+
---
|
| 179 |
+
|
| 180 |
+
## Last Updated
|
| 181 |
+
October 2025
|
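Rather than waiting for a `ModuleNotFoundError` on the Space, the file lists in REQUIRED_FILES_FOR_SPACES.md can be checked locally before uploading. A minimal sketch, assuming all files sit in one flat directory as the guide requires; `check_upload_dir` is a hypothetical helper and is separate from `prepare_for_spaces.py`.

```python
from pathlib import Path

# The "Minimal File List" from the guide.
REQUIRED_FILES = [
    "app.py",
    "llm.py",
    "extractors.py",
    "tagging.py",
    "chunking.py",
    "validation.py",
    "reporting.py",
    "dashboard.py",
    "production_logger.py",
    "requirements.txt",
]
# Files the guide marks as optional.
OPTIONAL_FILES = ["quote_extractor.py", "README.md"]


def check_upload_dir(directory):
    """Return (missing_required, missing_optional) for a flat upload directory."""
    root = Path(directory)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    optional_missing = [name for name in OPTIONAL_FILES if not (root / name).is_file()]
    return missing, optional_missing


if __name__ == "__main__":
    missing, optional_missing = check_upload_dir(".")
    print("Missing required files:", missing or "none")
    print("Missing optional files:", optional_missing or "none")
```

An empty first list means every module that `app.py` imports unconditionally, according to the guide, is present.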
SIMPLE_UPLOAD_LIST.txt
ADDED
|
@@ -0,0 +1,36 @@
|
| 1 |
+
================================================================================
|
| 2 |
+
HUGGINGFACE SPACES - FILES TO UPLOAD
|
| 3 |
+
================================================================================
|
| 4 |
+
|
| 5 |
+
Just upload these 11 files to your Space:
|
| 6 |
+
|
| 7 |
+
1. app.py ← MAIN FILE (required by HF Spaces)
|
| 8 |
+
2. llm.py
|
| 9 |
+
3. extractors.py
|
| 10 |
+
4. tagging.py
|
| 11 |
+
5. chunking.py
|
| 12 |
+
6. validation.py
|
| 13 |
+
7. reporting.py
|
| 14 |
+
8. dashboard.py
|
| 15 |
+
9. production_logger.py
|
| 16 |
+
10. quote_extractor.py
|
| 17 |
+
11. requirements.txt
|
| 18 |
+
|
| 19 |
+
================================================================================
|
| 20 |
+
SPACE SETTINGS
|
| 21 |
+
================================================================================
|
| 22 |
+
|
| 23 |
+
SDK: Gradio
|
| 24 |
+
Hardware: GPU (T4) ← IMPORTANT! Don't use CPU
|
| 25 |
+
|
| 26 |
+
================================================================================
|
| 27 |
+
THAT'S IT!
|
| 28 |
+
================================================================================
|
| 29 |
+
|
| 30 |
+
No terminal commands needed.
|
| 31 |
+
No .env file needed.
|
| 32 |
+
No configuration needed.
|
| 33 |
+
|
| 34 |
+
Just upload the 11 files and it works!
|
| 35 |
+
|
| 36 |
+
================================================================================
|
UPLOAD_TO_SPACES_CHECKLIST.md
ADDED
|
@@ -0,0 +1,196 @@
|
| 1 |
+
# HuggingFace Spaces Upload Checklist
|
| 2 |
+
|
| 3 |
+
## ✅ Pre-Upload Checklist
|
| 4 |
+
|
| 5 |
+
Your app is ready! Just upload these files:
|
| 6 |
+
|
| 7 |
+
### Required Files (Check off as you upload)
|
| 8 |
+
|
| 9 |
+
- [ ] `app.py` ← **MAIN FILE - HuggingFace Spaces needs this exact name**
|
| 10 |
+
- [ ] `llm.py`
|
| 11 |
+
- [ ] `extractors.py`
|
| 12 |
+
- [ ] `tagging.py`
|
| 13 |
+
- [ ] `chunking.py`
|
| 14 |
+
- [ ] `validation.py`
|
| 15 |
+
- [ ] `reporting.py`
|
| 16 |
+
- [ ] `dashboard.py`
|
| 17 |
+
- [ ] `production_logger.py`
|
| 18 |
+
- [ ] `quote_extractor.py`
|
| 19 |
+
- [ ] `requirements.txt`
|
| 20 |
+
|
| 21 |
+
**Total: 11 files**
|
| 22 |
+
|
| 23 |
+
---
|
| 24 |
+
|
| 25 |
+
## 🚫 DO NOT Upload
|
| 26 |
+
|
| 27 |
+
- ❌ `.env` file
|
| 28 |
+
- ❌ `test_*.py` files
|
| 29 |
+
- ❌ `*.log` files
|
| 30 |
+
- ❌ `logs/` folder
|
| 31 |
+
- ❌ `outputs/` folder
|
| 32 |
+
- ❌ `__pycache__/` folder
|
| 33 |
+
|
| 34 |
+
---
|
| 35 |
+
|
| 36 |
+
## 🎯 Upload Steps
|
| 37 |
+
|
| 38 |
+
### 1. Create Your Space
|
| 39 |
+
1. Go to: https://huggingface.co/new-space
|
| 40 |
+
2. Enter a name (e.g., `transcriptor-ai`)
|
| 41 |
+
3. Choose **Gradio** as SDK
|
| 42 |
+
4. Select **GPU** hardware (T4 minimum) ⚠️ **IMPORTANT!**
|
| 43 |
+
5. Click "Create Space"
|
| 44 |
+
|
| 45 |
+
### 2. Upload Files
|
| 46 |
+
|
| 47 |
+
**Method A: Drag & Drop**
|
| 48 |
+
1. Click "Files" tab in your Space
|
| 49 |
+
2. Click "Upload files"
|
| 50 |
+
3. Drag all 11 files from the checklist above
|
| 51 |
+
4. Click "Commit"
|
| 52 |
+
|
| 53 |
+
**Method B: Git Repository**
|
| 54 |
+
1. Create a new Git repo
|
| 55 |
+
2. Copy the 11 files above
|
| 56 |
+
3. Add `.gitignore` (already created for you)
|
| 57 |
+
4. Push to repo
|
| 58 |
+
5. Connect repo to Space in Settings
|
| 59 |
+
|
| 60 |
+
### 3. Configure Space (Optional)
|
| 61 |
+
|
| 62 |
+
Go to **Settings → Variables** and add (all optional):
|
| 63 |
+
|
| 64 |
+
| Variable | Value | Why |
|
| 65 |
+
|----------|-------|-----|
|
| 66 |
+
| `DEBUG_MODE` | `True` | See detailed logs |
|
| 67 |
+
| `LLM_TEMPERATURE` | `0.7` | Already the default |
|
| 68 |
+
|
| 69 |
+
**You don't need to configure anything** - it works out of the box!
|
| 70 |
+
|
| 71 |
+
---
|
| 72 |
+
|
| 73 |
+
## ⏱️ What to Expect
|
| 74 |
+
|
| 75 |
+
### First Startup
|
| 76 |
+
1. **Installing dependencies:** 2-5 minutes
|
| 77 |
+
2. **Downloading Phi-3-mini model:** 2-5 minutes
|
| 78 |
+
3. **Total:** ~5-10 minutes
|
| 79 |
+
|
| 80 |
+
Watch the **Logs** tab - you'll see:
|
| 81 |
+
```
|
| 82 |
+
Installing dependencies...
|
| 83 |
+
✅ Configuration loaded for HuggingFace Spaces
|
| 84 |
+
🚀 TranscriptorAI Enterprise - LLM Backend: local
|
| 85 |
+
[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
|
| 86 |
+
Downloading model files...
|
| 87 |
+
[Local Model] ✅ Model loaded on cuda:0
|
| 88 |
+
Running on local URL: http://0.0.0.0:7860
|
| 89 |
+
```
|
| 90 |
+
|
| 91 |
+
### Subsequent Startups
|
| 92 |
+
- **Only 30-60 seconds** (model is cached)
|
| 93 |
+
|
| 94 |
+
---
|
| 95 |
+
|
| 96 |
+
## ✅ Verify It's Working
|
| 97 |
+
|
| 98 |
+
### 1. Check Startup Logs
|
| 99 |
+
|
| 100 |
+
Look for these lines in the Logs tab:
|
| 101 |
+
|
| 102 |
+
✅ `Configuration loaded for HuggingFace Spaces`
|
| 103 |
+
✅ `LLM Backend: local`
|
| 104 |
+
✅ `Model loaded on cuda:0` ← GPU confirmed!
|
| 105 |
+
✅ `Running on local URL`
|
| 106 |
+
|
| 107 |
+
### 2. Test with Sample
|
| 108 |
+
|
| 109 |
+
1. Click "Upload Files"
|
| 110 |
+
2. Upload a DOCX transcript
|
| 111 |
+
3. Select "HCP" as interviewee type
|
| 112 |
+
4. Click "Analyze Transcripts"
|
| 113 |
+
5. Wait 5-10 minutes for processing
|
| 114 |
+
|
| 115 |
+
**Expected Result:**
|
| 116 |
+
- Quality Score: 0.7-1.0 (not 0.00!)
|
| 117 |
+
- CSV and PDF downloads available
|
| 118 |
+
- Dashboard shows charts
|
| 119 |
+
|
| 120 |
+
---
|
| 121 |
+
|
| 122 |
+
## 🐛 Common Issues
|
| 123 |
+
|
| 124 |
+
### Issue: `ModuleNotFoundError: No module named 'xyz'`
|
| 125 |
+
**Solution:** Upload the missing `xyz.py` file
|
| 126 |
+
|
| 127 |
+
### Issue: Very slow or hangs
|
| 128 |
+
**Check:** Did you select GPU hardware?
|
| 129 |
+
1. Go to Settings
|
| 130 |
+
2. Under Hardware, choose "GPU (T4)"
|
| 131 |
+
3. Restart Space
|
| 132 |
+
|
| 133 |
+
### Issue: Quality Score 0.00
|
| 134 |
+
**Solution:**
|
| 135 |
+
1. Add Variable: `DEBUG_MODE=True`
|
| 136 |
+
2. Check logs for error messages
|
| 137 |
+
3. Look for "[Local Model] ✅ Generated" to confirm it's working
|
| 138 |
+
|
| 139 |
+
### Issue: Out of memory
|
| 140 |
+
**Solution:**
|
| 141 |
+
1. Add Variable: `LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0`
|
| 142 |
+
2. OR upgrade to larger GPU
|
| 143 |
+
|
| 144 |
+
---
|
| 145 |
+
|
| 146 |
+
## 💰 Cost
|
| 147 |
+
|
| 148 |
+
### Free Tier (CPU)
|
| 149 |
+
- ⚠️ Very slow (10+ minutes per transcript)
|
| 150 |
+
- Not recommended
|
| 151 |
+
|
| 152 |
+
### GPU (T4) - ~$0.60/hour
|
| 153 |
+
- ✅ Recommended
|
| 154 |
+
- Fast processing (~5-10 min per transcript)
|
| 155 |
+
- Space can sleep after inactivity if a sleep time is set in Settings (saves money)
|
| 156 |
+
- You are only charged while the Space is running
|
| 157 |
+
|
| 158 |
+
---
|
| 159 |
+
|
| 160 |
+
## 📋 Quick Reference
|
| 161 |
+
|
| 162 |
+
**Space must have:**
|
| 163 |
+
- `app.py` as main file ✅ (already correct)
|
| 164 |
+
- `requirements.txt` with dependencies ✅ (already correct)
|
| 165 |
+
- GPU hardware selected ⚠️ (you must select this)
|
| 166 |
+
|
| 167 |
+
**No .env file needed** - everything configured in code ✅
|
| 168 |
+
|
| 169 |
+
**No terminal commands needed** - all automatic ✅
|
| 170 |
+
|
| 171 |
+
---
|
| 172 |
+
|
| 173 |
+
## 🎉 Ready to Deploy!
|
| 174 |
+
|
| 175 |
+
1. ✅ Check you have all 11 files
|
| 176 |
+
2. ✅ Create Space with GPU hardware
|
| 177 |
+
3. ✅ Upload files via drag & drop
|
| 178 |
+
4. ✅ Wait for build (watch Logs tab)
|
| 179 |
+
5. ✅ Test with a transcript
|
| 180 |
+
|
| 181 |
+
**See `FILES_TO_UPLOAD.txt` for the complete list of files.**
|
| 182 |
+
|
| 183 |
+
---
|
| 184 |
+
|
| 185 |
+
## 📞 Still Stuck?
|
| 186 |
+
|
| 187 |
+
Common causes:
|
| 188 |
+
1. **Forgot to upload a file** - Check all 11 files are uploaded
|
| 189 |
+
2. **Selected CPU instead of GPU** - Change in Settings
|
| 190 |
+
3. **Uploaded .env file** - Delete it, not needed on Spaces
|
| 191 |
+
|
| 192 |
+
---
|
| 193 |
+
|
| 194 |
+
**Last Updated:** October 2025
|
| 195 |
+
|
| 196 |
+
**You're ready - just upload the 11 files and you're done!** 🚀
|
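The cost figures in the checklist can be turned into a rough per-run estimate. This is a back-of-the-envelope sketch: the `0.60` hourly rate is the approximate T4 figure quoted in the checklist and may not match current pricing, and `estimate_cost` is a hypothetical helper. Billing follows how long the Space is running, not just the processing time.

```python
def estimate_cost(minutes_running, hourly_rate_usd=0.60):
    """Approximate cost in USD for a Space that runs for the given minutes."""
    return round(minutes_running / 60 * hourly_rate_usd, 2)


if __name__ == "__main__":
    # One transcript at the checklist's 5-10 minute processing range.
    print(estimate_cost(5), estimate_cost(10))
    # A first start (about 10 minutes of setup) followed by one 10-minute run.
    print(estimate_cost(20))
```

At the quoted rate, a single 5 to 10 minute run costs roughly 5 to 10 cents, so idle time is usually the larger share of the bill.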
app.py
CHANGED
|
@@ -9,9 +9,19 @@ from llm import query_llm, extract_structured_data
|
|
| 9 |
from reporting import generate_enhanced_csv, generate_enhanced_pdf
|
| 10 |
from dashboard import generate_comprehensive_dashboard
|
| 11 |
from validation import validate_transcript_quality, check_data_completeness
|
| 12 |
-
from quote_extractor import extract_quotes_from_results
|
| 13 |
from production_logger import init_session, ProductionLogger, PerformanceMonitor
|
| 14 |
|
| 15 |
# Optional imports for enhanced validation (may not exist in older deployments)
|
| 16 |
try:
|
| 17 |
    from validation import verify_consensus_claims, validate_summary_quality
|
|
|
|
| 9 |
from reporting import generate_enhanced_csv, generate_enhanced_pdf
|
| 10 |
from dashboard import generate_comprehensive_dashboard
|
| 11 |
from validation import validate_transcript_quality, check_data_completeness
|
|
|
|
| 12 |
from production_logger import init_session, ProductionLogger, PerformanceMonitor
|
| 13 |
|
| 14 |
+
# Optional: Quote extraction for market research storytelling
|
| 15 |
+
try:
|
| 16 |
+
    from quote_extractor import extract_quotes_from_results
|
| 17 |
+
    HAS_QUOTE_EXTRACTION = True
|
| 18 |
+
except ImportError:
|
| 19 |
+
    HAS_QUOTE_EXTRACTION = False
|
| 20 |
+
    print("⚠️ Quote extraction not available - reports will not include storytelling quotes")
|
| 21 |
+
    def extract_quotes_from_results(results, interviewee_type):
|
| 22 |
+
        """Stub function when quote_extractor is not available"""
|
| 23 |
+
        return {"quotes": [], "themes": {}, "top_quotes": []}
|
| 24 |
+
|
| 25 |
# Optional imports for enhanced validation (may not exist in older deployments)
|
| 26 |
try:
|
| 27 |
    from validation import verify_consensus_claims, validate_summary_quality
|
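The `app.py` change above makes `quote_extractor` an optional dependency: if the import fails, a stub with the same signature returns empty results and `HAS_QUOTE_EXTRACTION` is set to `False`. The sketch below shows the same pattern in a form that is easy to exercise in isolation. `load_quote_extractor` and `quotes_section` are hypothetical helpers introduced for illustration, not functions from this repository; only the stub's return shape is taken from the diff.

```python
import importlib


def load_quote_extractor(module_name="quote_extractor"):
    """Return (extract_function, available) for the optional module.

    Falls back to a stub with the same signature when the module, or the
    expected function inside it, cannot be found.
    """
    try:
        module = importlib.import_module(module_name)
        return module.extract_quotes_from_results, True
    except (ImportError, AttributeError):
        def stub(results, interviewee_type):
            """Stub used when quote extraction is unavailable."""
            return {"quotes": [], "themes": {}, "top_quotes": []}
        return stub, False


extract_quotes_from_results, HAS_QUOTE_EXTRACTION = load_quote_extractor()


def quotes_section(results, interviewee_type):
    """Hypothetical caller: skip the report section when the feature is off."""
    if not HAS_QUOTE_EXTRACTION:
        return None
    return extract_quotes_from_results(results, interviewee_type)
```

Keeping the stub's return shape identical to the real function means callers that ignore the flag still work; callers that check `HAS_QUOTE_EXTRACTION` can omit the section entirely instead of rendering an empty one.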
prepare_for_spaces.py
ADDED
|
@@ -0,0 +1,96 @@
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Prepare files for HuggingFace Spaces deployment
|
| 4 |
+
Copies only the required files to a clean directory
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import os
|
| 8 |
+
import shutil
|
| 9 |
+
from pathlib import Path
|
| 10 |
+
|
| 11 |
+
# Required files for HuggingFace Spaces
|
| 12 |
+
REQUIRED_FILES = [
|
| 13 |
+
# Core application
|
| 14 |
+
'app.py',
|
| 15 |
+
|
| 16 |
+
# Processing modules
|
| 17 |
+
'llm.py',
|
| 18 |
+
'extractors.py',
|
| 19 |
+
'tagging.py',
|
| 20 |
+
'chunking.py',
|
| 21 |
+
'validation.py',
|
| 22 |
+
'reporting.py',
|
| 23 |
+
'dashboard.py',
|
| 24 |
+
'production_logger.py',
|
| 25 |
+
|
| 26 |
+
# Optional but recommended
|
| 27 |
+
'quote_extractor.py',
|
| 28 |
+
|
| 29 |
+
# Configuration
|
| 30 |
+
'requirements.txt',
|
| 31 |
+
|
| 32 |
+
# Documentation (optional)
|
| 33 |
+
'README.md',
|
| 34 |
+
'HUGGINGFACE_SPACES_SETUP.md',
|
| 35 |
+
]
|
| 36 |
+
|
| 37 |
+
def prepare_deployment(output_dir='./spaces_deployment'):
|
| 38 |
+
"""Copy required files to deployment directory"""
|
| 39 |
+
|
| 40 |
+
# Create output directory
|
| 41 |
+
output_path = Path(output_dir)
|
| 42 |
+
if output_path.exists():
|
| 43 |
+
print(f"⚠️ Directory {output_dir} already exists")
|
| 44 |
+
response = input("Delete and recreate? (y/n): ")
|
| 45 |
+
if response.lower() != 'y':
|
| 46 |
+
print("❌ Cancelled")
|
| 47 |
+
return
|
| 48 |
+
shutil.rmtree(output_path)
|
| 49 |
+
|
| 50 |
+
output_path.mkdir(exist_ok=True)
|
| 51 |
+
print(f"📁 Created directory: {output_dir}\n")
|
| 52 |
+
|
| 53 |
+
# Copy files
|
| 54 |
+
copied = []
|
| 55 |
+
missing = []
|
| 56 |
+
|
| 57 |
+
for filename in REQUIRED_FILES:
|
| 58 |
+
src = Path(filename)
|
| 59 |
+
if src.exists():
|
| 60 |
+
dst = output_path / filename
|
| 61 |
+
shutil.copy2(src, dst)
|
| 62 |
+
size_kb = src.stat().st_size / 1024
|
| 63 |
+
print(f" ✅ {filename} ({size_kb:.1f} KB)")
|
| 64 |
+
copied.append(filename)
|
| 65 |
+
else:
|
| 66 |
+
print(f" ⚠️ {filename} - NOT FOUND (skipping)")
|
| 67 |
+
missing.append(filename)
|
| 68 |
+
|
| 69 |
+
# Summary
|
| 70 |
+
print("\n" + "="*80)
|
| 71 |
+
print("📊 SUMMARY")
|
| 72 |
+
print("="*80)
|
| 73 |
+
print(f"✅ Copied: {len(copied)} files")
|
| 74 |
+
if missing:
|
| 75 |
+
print(f"⚠️ Missing: {len(missing)} files")
|
| 76 |
+
print(f" {', '.join(missing)}")
|
| 77 |
+
|
| 78 |
+
print(f"\n📦 Deployment files ready in: {output_dir}/")
|
| 79 |
+
print("\n📋 Next steps:")
|
| 80 |
+
print("1. Go to https://huggingface.co/new-space")
|
| 81 |
+
print("2. Select Gradio SDK and GPU hardware")
|
| 82 |
+
print("3. Upload all files from the deployment directory")
|
| 83 |
+
print("4. Wait for model download (~2-5 min first time)")
|
| 84 |
+
print("5. Test your Space!")
|
| 85 |
+
|
| 86 |
+
# Check for .env file (should not be included)
|
| 87 |
+
if (output_path / '.env').exists():
|
| 88 |
+
print("\n⚠️ WARNING: .env file found in deployment directory!")
|
| 89 |
+
print(" This should NOT be deployed to HuggingFace Spaces")
|
| 90 |
+
os.remove(output_path / '.env')
|
| 91 |
+
print(" ✅ Removed .env file")
|
| 92 |
+
|
| 93 |
+
print("\n✨ Deployment package ready!")
|
| 94 |
+
|
| 95 |
+
if __name__ == '__main__':
|
| 96 |
+
prepare_deployment()
|
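The copy loop in `prepare_for_spaces.py` mixes file copying with printing and an interactive prompt, which makes it awkward to run in CI or to test. As a rough sketch under that assumption, the copy step could be factored into a pure helper. `copy_required` is a hypothetical name and is not part of this commit; it also passes `parents=True` to `mkdir` so a nested output directory works, which the script above does not do.

```python
import shutil
import tempfile
from pathlib import Path


def copy_required(filenames, source_dir, output_dir):
    """Copy the listed files from source_dir into output_dir.

    Returns a (copied, missing) pair of file-name lists, in input order.
    """
    source = Path(source_dir)
    output = Path(output_dir)
    # parents=True lets output_dir be nested, e.g. "build/spaces_deployment".
    output.mkdir(parents=True, exist_ok=True)

    copied, missing = [], []
    for name in filenames:
        src = source / name
        if src.is_file():
            shutil.copy2(src, output / name)  # copy2 preserves timestamps
            copied.append(name)
        else:
            missing.append(name)
    return copied, missing


if __name__ == "__main__":
    # Demo in a temporary directory; no project files are read or written.
    with tempfile.TemporaryDirectory() as tmp:
        src_dir = Path(tmp, "src")
        src_dir.mkdir()
        (src_dir / "app.py").write_text("print('hello')")
        print(copy_required(["app.py", "llm.py"], src_dir, Path(tmp, "out")))
```

Because only the listed files are ever copied into the output directory, a `.env` file can reach it only if it is added to the list, so a caller can treat a non-empty `missing` list as the main failure signal.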