Space: empirenexus/TranscriptWriting

jmisak committed on Oct 25, 2025

Commit 56589d3 (verified) · 1 parent: 57fa449

Upload 13 files

Files changed (8)

.gitignore +55 -0
DEPLOY_TO_SPACES.md +184 -0
FILES_TO_UPLOAD.txt +83 -0
REQUIRED_FILES_FOR_SPACES.md +181 -0
SIMPLE_UPLOAD_LIST.txt +36 -0
UPLOAD_TO_SPACES_CHECKLIST.md +196 -0
app.py +11 -1
prepare_for_spaces.py +96 -0

.gitignore ADDED

	@@ -0,0 +1,55 @@

+# HuggingFace Spaces Deployment - DO NOT UPLOAD THESE
+# Environment and secrets
+.env
+*.env
+# Logs
+*.log
+logs/
+session_*.log
+summary_*.txt
+summary_*.json
+# Outputs
+outputs/
+*.csv
+*.pdf
+spaces_deployment/
+# Python
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.Python
+*.so
+*.egg
+*.egg-info/
+dist/
+build/
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+# Test files
+test_*.py
+debug_*.py
+check_*.py
+verify_*.py
+fix_*.py
+patch_*.py
+create_sample_*.py
+update_*.py
+# Documentation (optional - you can upload if you want)
+*.md
+!README.md
+# OS
+.DS_Store
+Thumbs.db
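
One consequence of the `*.md` / `!README.md` pair above is that `git add` skips every Markdown file except `README.md`, including deployment guides such as `DEPLOY_TO_SPACES.md`. A minimal sketch for previewing which filenames the basename patterns would exclude, assuming Python's `fnmatch` as a stand-in for Git's matcher. It approximates only simple basename globs and does not model directory patterns or anchoring; `git check-ignore -v <path>` remains the authoritative check. The `is_ignored` helper is hypothetical.

```python
from fnmatch import fnmatch

# Subset of the basename patterns from the .gitignore above.
IGNORE_PATTERNS = [".env", "*.env", "*.log", "*.csv", "*.pdf", "*.pyc",
                   "test_*.py", "debug_*.py", "check_*.py", "*.md"]
# Negated patterns (lines starting with "!") re-include a file.
KEEP_PATTERNS = ["README.md"]


def is_ignored(filename: str) -> bool:
    """Approximate gitignore matching for plain basenames only."""
    if any(fnmatch(filename, p) for p in KEEP_PATTERNS):
        return False
    return any(fnmatch(filename, p) for p in IGNORE_PATTERNS)


if __name__ == "__main__":
    for name in ["app.py", "README.md", "DEPLOY_TO_SPACES.md", "test_llm.py", ".env"]:
        print(f"{name}: {'ignored' if is_ignored(name) else 'tracked'}")
```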

DEPLOY_TO_SPACES.md ADDED

	@@ -0,0 +1,184 @@

+# Deploy to HuggingFace Spaces - Quick Start
+## ✅ Issue Fixed
+**The `quote_extractor` import error has been fixed!** The app will now work even if the file is missing.
+---
+## 🚀 Option 1: Automated Preparation (Recommended)
+Run this script to prepare a clean deployment package:
+```bash
+python prepare_for_spaces.py
+```
+This will:
+- Create a `spaces_deployment/` directory
+- Copy only the required files
+- Remove any .env or test files
+- Show you a summary of what's included
+Then upload everything from `spaces_deployment/` to your Space.
+---
+## 📋 Option 2: Manual Upload
+Upload these files to your HuggingFace Space:
+### Required Files (Must have)
+```
+app.py
+llm.py
+extractors.py
+tagging.py
+chunking.py
+validation.py
+reporting.py
+dashboard.py
+production_logger.py
+quote_extractor.py
+requirements.txt
+```
+### Optional Files
+```
+README.md
+HUGGINGFACE_SPACES_SETUP.md
+```
+**DO NOT upload:**
+- `.env` file
+- `test_*.py` files
+- `logs/` directory
+- `outputs/` directory
+---
+## 🔧 Space Configuration
+### 1. Create Space
+- Go to https://huggingface.co/new-space
+- Name: `transcriptor-ai` (or your choice)
+- SDK: **Gradio**
+- Hardware: **GPU (T4 or better)** ← Important!
+### 2. Upload Files
+- Drag and drop all files from the list above
+- OR connect a Git repository
+### 3. Configure (Optional)
+Go to **Settings → Variables** and add:
+| Variable | Value | When to Use |
+|----------|-------|-------------|
+| `DEBUG_MODE` | `True` | To see detailed logs |
+| `LOCAL_MODEL` | `TinyLlama/TinyLlama-1.1B-Chat-v1.0` | For faster (but lower quality) processing |
+| `LLM_TEMPERATURE` | `0.5` | For more deterministic outputs |
+**Note:** All settings have defaults - you don't need to configure anything!
+---
+## ⏱️ First Deployment
+### What to Expect
+1. **Build time:** 2-5 minutes (installing dependencies)
+2. **Model download:** 2-5 minutes (first time only - downloads Phi-3-mini)
+3. **Subsequent starts:** 30-60 seconds
+### Watch the Logs
+Click **Logs** tab to see:
+```
+✅ Configuration loaded for HuggingFace Spaces
+🚀 TranscriptorAI Enterprise - LLM Backend: local
+[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
+Downloading (…)lve/main/config.json: 100%
+[Local Model] ✅ Model loaded on cuda:0
+Running on local URL:  http://0.0.0.0:7860
+```
+---
+## 🧪 Test Your Space
+1. Wait for "Running on local URL" message
+2. Upload a sample transcript (DOCX or PDF)
+3. Select "HCP" as interviewee type
+4. Click "Analyze Transcripts"
+**Expected:**
+- Processing time: 5-10 minutes (depending on transcript length)
+- Quality score: 0.7-1.0
+- CSV and PDF downloads available
+---
+## 🐛 Troubleshooting
+### Error: `ModuleNotFoundError: No module named 'quote_extractor'`
+**Status:** ✅ FIXED - This is now optional
+### Error: `ModuleNotFoundError: No module named 'xyz'`
+**Solution:** Upload the missing `xyz.py` file
+### Error: `CUDA out of memory`
+**Solution:**
+- Change model: Add Variable `LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0`
+- OR upgrade to larger GPU
+### Error: Very slow processing
+**Check:**
+- Is GPU hardware selected? (Not CPU)
+- Look for "Model loaded on cuda:0" in logs
+- If you see "cpu", upgrade to GPU tier
+### Quality Score still 0.00
+**Debug:**
+1. Set `DEBUG_MODE=True` in Variables
+2. Check logs for "[Local Model] ✅ Generated X characters"
+3. Look for "[LLM Debug] Successfully extracted JSON"
+4. If you see `[Error]` messages, share them
+---
+## 💡 Tips
+### Reduce Costs
+- Space sleeps after 48h inactivity (free)
+- You only pay for GPU time while the Space is running
+- ~$0.60/hour for T4 GPU
+### Improve Speed
+- Use smaller model (TinyLlama)
+- Reduce max tokens (edit llm.py line 410)
+- Process fewer chunks
+### Improve Quality
+- Use larger model (Mistral-7B)
+- Increase temperature for creative outputs
+- Keep default Phi-3-mini for best balance
+---
+## 📞 Need Help?
+1. **Check logs first** - Most issues show clear error messages
+2. **Read HUGGINGFACE_SPACES_SETUP.md** - Detailed troubleshooting
+3. **Test locally first** - Run `python test_local_model.py`
+---
+## ✨ You're Ready!
+Run the preparation script:
+```bash
+python prepare_for_spaces.py
+```
+Then upload to HuggingFace Spaces and you're done! 🎉
+---
+**Last Updated:** October 2025
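
The Variables table in the Space Configuration section implies that the app reads its settings from environment variables with built-in defaults (Spaces exposes Variables to the app as environment variables). A minimal sketch of that pattern follows. The `Settings` dataclass and `load_settings` helper are hypothetical, the `DEBUG_MODE` default of `False` and the accepted truthy strings are assumptions, and the actual parsing in `app.py` or `llm.py` may differ. The model and temperature defaults are the ones named in the deployment notes of this commit.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    debug_mode: bool
    local_model: str
    llm_temperature: float


def load_settings(env=None) -> Settings:
    """Read Space Variables (environment variables) and fall back to defaults."""
    env = os.environ if env is None else env
    return Settings(
        # Assumption: "true", "1", or "yes" (case-insensitive) enables debug logs.
        debug_mode=env.get("DEBUG_MODE", "False").strip().lower() in {"true", "1", "yes"},
        local_model=env.get("LOCAL_MODEL", "microsoft/Phi-3-mini-4k-instruct"),
        llm_temperature=float(env.get("LLM_TEMPERATURE", "0.7")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.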

FILES_TO_UPLOAD.txt ADDED

	@@ -0,0 +1,83 @@

+===============================================================================
+FILES TO UPLOAD TO HUGGINGFACE SPACES
+===============================================================================
+✅ COPY THESE FILES TO YOUR SPACE (11 files total):
+1.  app.py                    - Main application (REQUIRED - HF Spaces entry point)
+2.  llm.py                    - LLM inference with local models
+3.  extractors.py             - Document text extraction (DOCX/PDF)
+4.  tagging.py                - Speaker tagging
+5.  chunking.py               - Text chunking
+6.  validation.py             - Quality validation
+7.  reporting.py              - CSV/PDF report generation
+8.  dashboard.py              - Dashboard generation
+9.  production_logger.py      - Session logging
+10. quote_extractor.py        - Quote extraction (optional but recommended)
+11. requirements.txt          - Python dependencies
+===============================================================================
+OPTIONAL - NICE TO HAVE:
+===============================================================================
+- README.md                   - Documentation for your Space
+===============================================================================
+DO NOT UPLOAD:
+===============================================================================
+❌ .env                        - Contains secrets (use Spaces Variables instead)
+❌ test_*.py                   - Test files
+❌ *.log                       - Log files
+❌ logs/                       - Log directory
+❌ outputs/                    - Output directory
+❌ __pycache__/                - Python cache
+===============================================================================
+HUGGINGFACE SPACES SETTINGS:
+===============================================================================
+Space SDK:       Gradio
+Hardware:        GPU (T4 or better) ⚠️ IMPORTANT - CPU will be very slow!
+Optional Variables (Settings → Variables):
+- DEBUG_MODE = True              (to see detailed logs)
+- LOCAL_MODEL = microsoft/Phi-3-mini-4k-instruct   (default, no need to set)
+===============================================================================
+DEPLOYMENT METHOD:
+===============================================================================
+Option 1: Direct Upload
+- Go to your Space → Files → Upload files
+- Drag and drop the 11 files above
+Option 2: Git Repository
+- Create a Git repo with these files
+- Add .gitignore (already created)
+- Connect repo to your Space
+- Auto-deploys on push
+===============================================================================
+FIRST TIME STARTUP:
+===============================================================================
+1. Dependencies install: ~2-5 minutes
+2. Model download: ~2-5 minutes (Phi-3-mini downloads automatically)
+3. Total first startup: ~5-10 minutes
+Subsequent starts: ~30-60 seconds (model is cached)
+===============================================================================
+VERIFICATION:
+===============================================================================
+Check the Logs tab - you should see:
+✅ Configuration loaded for HuggingFace Spaces
+🚀 TranscriptorAI Enterprise - LLM Backend: local
+[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
+[Local Model] ✅ Model loaded on cuda:0
+Running on local URL:  http://0.0.0.0:7860
+===============================================================================

REQUIRED_FILES_FOR_SPACES.md ADDED

	@@ -0,0 +1,181 @@

+# Required Files for HuggingFace Spaces Deployment
+## ✅ CRITICAL - Must Upload These Files
+### Main Application
+- `app.py` - Main Gradio application
+### Core Processing Modules
+- `llm.py` - LLM inference (local model support)
+- `extractors.py` - DOCX/PDF text extraction
+- `tagging.py` - Speaker identification
+- `chunking.py` - Semantic text chunking
+- `validation.py` - Quality scoring and validation
+- `reporting.py` - CSV/PDF report generation
+- `dashboard.py` - Dashboard generation
+- `production_logger.py` - Session logging
+### Optional but Recommended
+- `quote_extractor.py` - Market research quote extraction (now optional)
+### Configuration
+- `requirements.txt` - Python dependencies
+- `README.md` - Documentation (optional but good practice)
+---
+## ❌ DO NOT Upload These Files
+### Local Development Only
+- `.env` - Contains local secrets (use Spaces Variables instead)
+- `*.log` - Log files
+- `logs/` - Log directory
+- `outputs/` - Output directory
+- `__pycache__/` - Python cache
+- `.git/` - Git repository
+### Test Files (Not Needed)
+- `test_*.py` - All test scripts
+- `check_*.py` - Check scripts
+- `debug_*.py` - Debug scripts
+- `verify_*.py` - Verification scripts
+- `fix_*.py` - Fix scripts
+- `patch_*.py` - Patch scripts
+- `create_sample_*.py` - Sample creation
+### Documentation (Optional)
+- `*.md` files - Helpful but not required for app to run
+- You can upload them if you want documentation in your Space
+---
+## 📦 Minimal File List (Absolute Minimum)
+If you want the smallest deployment, upload only these:
+```
+app.py
+llm.py
+extractors.py
+tagging.py
+chunking.py
+validation.py
+reporting.py
+dashboard.py
+production_logger.py
+requirements.txt
+```
+**Quote extraction will be disabled** but everything else will work.
+---
+## 📋 Complete File List (Recommended)
+Upload all core files plus quote extraction:
+```
+app.py
+llm.py
+extractors.py
+tagging.py
+chunking.py
+validation.py
+reporting.py
+dashboard.py
+production_logger.py
+quote_extractor.py
+requirements.txt
+README.md (optional)
+```
+---
+## 🔍 How to Check What's Missing
+If you get `ModuleNotFoundError: No module named 'xyz'`, you need to upload `xyz.py`.
+**Common missing modules:**
+- `quote_extractor` → Upload `quote_extractor.py`
+- `production_logger` → Upload `production_logger.py`
+- `dashboard` → Upload `dashboard.py`
+---
+## 📁 Folder Structure on HuggingFace Spaces
+Your Space should look like:
+```
+your-space/
+├── app.py
+├── llm.py
+├── extractors.py
+├── tagging.py
+├── chunking.py
+├── validation.py
+├── reporting.py
+├── dashboard.py
+├── production_logger.py
+├── quote_extractor.py (optional)
+├── requirements.txt
+└── README.md (optional)
+```
+**Do NOT create subdirectories** - keep all Python files in the root.
+---
+## 🚀 Quick Upload Checklist
+Before uploading to Spaces:
+- [ ] `app.py` - Main file
+- [ ] All imported modules (llm, extractors, etc.)
+- [ ] `requirements.txt` - Dependencies
+- [ ] Selected **GPU** hardware in Spaces settings
+- [ ] No `.env` file included
+- [ ] No test/debug files included
+---
+## 🔧 Troubleshooting Import Errors
+### Error: `ModuleNotFoundError: No module named 'quote_extractor'`
+**Fixed!** This is now optional - app will work without it.
+### Error: `ModuleNotFoundError: No module named 'extractors'`
+**Solution:** Upload `extractors.py`
+### Error: `ModuleNotFoundError: No module named 'production_logger'`
+**Solution:** Upload `production_logger.py`
+### Error: `ModuleNotFoundError: No module named 'transformers'`
+**Solution:** Check `requirements.txt` is uploaded and correct
+---
+## 📝 Alternative: Use Git Repository
+Instead of manual upload, you can:
+1. Create a Git repository with only required files
+2. Connect it to your HuggingFace Space
+3. Auto-deploy on push
+**Create `.gitignore` to exclude:**
+```
+.env
+*.log
+logs/
+outputs/
+__pycache__/
+test_*.py
+debug_*.py
+*.pyc
+```
+---
+## Last Updated
+October 2025
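
The file lists above lend themselves to a preflight check that runs before anything is uploaded, rather than waiting for a `ModuleNotFoundError` in the Space logs. A minimal sketch, assuming a flat directory as the guide recommends; the `check_upload_dir` helper is hypothetical and not part of this commit.

```python
from pathlib import Path

# File names taken from the minimal and complete lists above.
REQUIRED = ["app.py", "llm.py", "extractors.py", "tagging.py", "chunking.py",
            "validation.py", "reporting.py", "dashboard.py",
            "production_logger.py", "requirements.txt"]
OPTIONAL = ["quote_extractor.py", "README.md"]


def check_upload_dir(directory="."):
    """Return (missing_required, missing_optional) for a flat upload directory."""
    root = Path(directory)
    missing_required = [f for f in REQUIRED if not (root / f).is_file()]
    missing_optional = [f for f in OPTIONAL if not (root / f).is_file()]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_upload_dir()
    print("Missing required:", required or "none")
    print("Missing optional:", optional or "none")
```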

SIMPLE_UPLOAD_LIST.txt ADDED

	@@ -0,0 +1,36 @@

+================================================================================
+HUGGINGFACE SPACES - FILES TO UPLOAD
+================================================================================
+Just upload these 11 files to your Space:
+  1. app.py                    ← MAIN FILE (required by HF Spaces)
+  2. llm.py
+  3. extractors.py
+  4. tagging.py
+  5. chunking.py
+  6. validation.py
+  7. reporting.py
+  8. dashboard.py
+  9. production_logger.py
+ 10. quote_extractor.py
+ 11. requirements.txt
+================================================================================
+SPACE SETTINGS
+================================================================================
+SDK:       Gradio
+Hardware:  GPU (T4) ← IMPORTANT! Don't use CPU
+================================================================================
+THAT'S IT!
+================================================================================
+No terminal commands needed.
+No .env file needed.
+No configuration needed.
+Just upload the 11 files and it works!
+================================================================================

UPLOAD_TO_SPACES_CHECKLIST.md ADDED

	@@ -0,0 +1,196 @@

+# HuggingFace Spaces Upload Checklist
+## ✅ Pre-Upload Checklist
+Your app is ready! Just upload these files:
+### Required Files (Check off as you upload)
+- [ ] `app.py` ← **MAIN FILE - HuggingFace Spaces needs this exact name**
+- [ ] `llm.py`
+- [ ] `extractors.py`
+- [ ] `tagging.py`
+- [ ] `chunking.py`
+- [ ] `validation.py`
+- [ ] `reporting.py`
+- [ ] `dashboard.py`
+- [ ] `production_logger.py`
+- [ ] `quote_extractor.py`
+- [ ] `requirements.txt`
+**Total: 11 files**
+---
+## 🚫 DO NOT Upload
+- ❌ `.env` file
+- ❌ `test_*.py` files
+- ❌ `*.log` files
+- ❌ `logs/` folder
+- ❌ `outputs/` folder
+- ❌ `__pycache__/` folder
+---
+## 🎯 Upload Steps
+### 1. Create Your Space
+1. Go to: https://huggingface.co/new-space
+2. Enter a name (e.g., `transcriptor-ai`)
+3. Choose **Gradio** as SDK
+4. Select **GPU** hardware (T4 minimum) ⚠️ **IMPORTANT!**
+5. Click "Create Space"
+### 2. Upload Files
+**Method A: Drag & Drop**
+1. Click "Files" tab in your Space
+2. Click "Upload files"
+3. Drag all 11 files from the checklist above
+4. Click "Commit"
+**Method B: Git Repository**
+1. Create a new Git repo
+2. Copy the 11 files above
+3. Add `.gitignore` (already created for you)
+4. Push to repo
+5. Connect repo to Space in Settings
+### 3. Configure Space (Optional)
+Go to **Settings → Variables** and add (all optional):
+| Variable | Value | Why |
+|----------|-------|-----|
+| `DEBUG_MODE` | `True` | See detailed logs |
+| `LLM_TEMPERATURE` | `0.7` | Already the default |
+**You don't need to configure anything** - it works out of the box!
+---
+## ⏱️ What to Expect
+### First Startup
+1. **Installing dependencies:** 2-5 minutes
+2. **Downloading Phi-3-mini model:** 2-5 minutes
+3. **Total:** ~5-10 minutes
+Watch the **Logs** tab - you'll see:
+```
+Installing dependencies...
+✅ Configuration loaded for HuggingFace Spaces
+🚀 TranscriptorAI Enterprise - LLM Backend: local
+[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
+Downloading model files...
+[Local Model] ✅ Model loaded on cuda:0
+Running on local URL:  http://0.0.0.0:7860
+```
+### Subsequent Startups
+- **Only 30-60 seconds** (model is cached)
+---
+## ✅ Verify It's Working
+### 1. Check Startup Logs
+Look for these lines in the Logs tab:
+✅ `Configuration loaded for HuggingFace Spaces`
+✅ `LLM Backend: local`
+✅ `Model loaded on cuda:0` ← GPU confirmed!
+✅ `Running on local URL`
+### 2. Test with Sample
+1. Click "Upload Files"
+2. Upload a DOCX transcript
+3. Select "HCP" as interviewee type
+4. Click "Analyze Transcripts"
+5. Wait 5-10 minutes for processing
+**Expected Result:**
+- Quality Score: 0.7-1.0 (not 0.00!)
+- CSV and PDF downloads available
+- Dashboard shows charts
+---
+## 🐛 Common Issues
+### Issue: `ModuleNotFoundError: No module named 'xyz'`
+**Solution:** Upload the missing `xyz.py` file
+### Issue: Very slow or hangs
+**Check:** Did you select GPU hardware?
+1. Go to Settings
+2. Under Hardware, choose "GPU (T4)"
+3. Restart Space
+### Issue: Quality Score 0.00
+**Solution:**
+1. Add Variable: `DEBUG_MODE=True`
+2. Check logs for error messages
+3. Look for "[Local Model] ✅ Generated" to confirm it's working
+### Issue: Out of memory
+**Solution:**
+1. Add Variable: `LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0`
+2. OR upgrade to larger GPU
+---
+## 💰 Cost
+### Free Tier (CPU)
+- ⚠️ Very slow (10+ minutes per transcript)
+- Not recommended
+### GPU (T4) - ~$0.60/hour
+- ✅ Recommended
+- Fast processing (~5-10 min per transcript)
+- Space sleeps after inactivity (saves money)
+- Only charged when active
+---
+## 📋 Quick Reference
+**Space must have:**
+- `app.py` as main file ✅ (already correct)
+- `requirements.txt` with dependencies ✅ (already correct)
+- GPU hardware selected ⚠️ (you must select this)
+**No .env file needed** - everything configured in code ✅
+**No terminal commands needed** - all automatic ✅
+---
+## 🎉 Ready to Deploy!
+1. ✅ Check you have all 11 files
+2. ✅ Create Space with GPU hardware
+3. ✅ Upload files via drag & drop
+4. ✅ Wait for build (watch Logs tab)
+5. ✅ Test with a transcript
+**See `FILES_TO_UPLOAD.txt` for the complete list of files.**
+---
+## 📞 Still Stuck?
+Common causes:
+1. **Forgot to upload a file** - Check all 11 files are uploaded
+2. **Selected CPU instead of GPU** - Change in Settings
+3. **Uploaded .env file** - Delete it, not needed on Spaces
+---
+**Last Updated:** October 2025
+**You're ready - just upload the 11 files and you're done!** 🚀
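
The cost figures in the checklist can be turned into a rough per-transcript estimate. A back-of-the-envelope sketch, assuming the ~$0.60/hour T4 rate and the 5-10 minute processing time quoted above, and ignoring startup time and any idle time billed before the Space sleeps. `estimate_cost` is a hypothetical helper, and actual billing depends on current Hugging Face pricing.

```python
def estimate_cost(minutes: float, hourly_rate: float = 0.60) -> float:
    """GPU cost in dollars for a run of the given length, billed pro rata."""
    return hourly_rate * minutes / 60


if __name__ == "__main__":
    for minutes in (5, 10):
        print(f"{minutes} min at $0.60/h: ${estimate_cost(minutes):.2f}")
```

Under these assumptions a single transcript costs roughly $0.05 to $0.10 of GPU time.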

app.py CHANGED

@@ -9,9 +9,19 @@ from llm import query_llm, extract_structured_data
 from reporting import generate_enhanced_csv, generate_enhanced_pdf
 from dashboard import generate_comprehensive_dashboard
 from validation import validate_transcript_quality, check_data_completeness
-from quote_extractor import extract_quotes_from_results
 from production_logger import init_session, ProductionLogger, PerformanceMonitor
+# Optional: Quote extraction for market research storytelling
+try:
+    from quote_extractor import extract_quotes_from_results
+    HAS_QUOTE_EXTRACTION = True
+except ImportError:
+    HAS_QUOTE_EXTRACTION = False
+    print("⚠️ Quote extraction not available - reports will not include storytelling quotes")
+    def extract_quotes_from_results(results, interviewee_type):
+        """Stub function when quote_extractor is not available"""
+        return {"quotes": [], "themes": {}, "top_quotes": []}
 # Optional imports for enhanced validation (may not exist in older deployments)
 try:
     from validation import verify_consensus_claims, validate_summary_quality
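
The stub keeps existing call sites unchanged, but code that renders quotes may still want to branch on `HAS_QUOTE_EXTRACTION`. A minimal sketch of how a caller might use the flag, assuming a hypothetical `build_report_sections` helper and a made-up `results` shape. Only `extract_quotes_from_results`, its two parameters, the stub's return value, and the flag itself come from the diff.

```python
try:
    from quote_extractor import extract_quotes_from_results
    HAS_QUOTE_EXTRACTION = True
except ImportError:
    HAS_QUOTE_EXTRACTION = False

    def extract_quotes_from_results(results, interviewee_type):
        """Stub with the same return shape as in the diff above."""
        return {"quotes": [], "themes": {}, "top_quotes": []}


def build_report_sections(results, interviewee_type):
    """Hypothetical caller: add a quotes section only when extraction is real."""
    sections = {"summary": f"{len(results)} transcript(s) analyzed"}
    quotes = extract_quotes_from_results(results, interviewee_type)
    if HAS_QUOTE_EXTRACTION and quotes["top_quotes"]:
        sections["quotes"] = quotes["top_quotes"]
    return sections


if __name__ == "__main__":
    print(build_report_sections([{"text": "example"}], "HCP"))
```

Because the stub returns empty containers rather than `None`, the caller can index into the result without extra guards.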

prepare_for_spaces.py ADDED

	@@ -0,0 +1,96 @@

+#!/usr/bin/env python3
+"""
+Prepare files for HuggingFace Spaces deployment
+Copies only the required files to a clean directory
+"""
+import os
+import shutil
+from pathlib import Path
+# Required files for HuggingFace Spaces
+REQUIRED_FILES = [
+    # Core application
+    'app.py',
+    # Processing modules
+    'llm.py',
+    'extractors.py',
+    'tagging.py',
+    'chunking.py',
+    'validation.py',
+    'reporting.py',
+    'dashboard.py',
+    'production_logger.py',
+    # Optional but recommended
+    'quote_extractor.py',
+    # Configuration
+    'requirements.txt',
+    # Documentation (optional)
+    'README.md',
+    'HUGGINGFACE_SPACES_SETUP.md',
+]
+def prepare_deployment(output_dir='./spaces_deployment'):
+    """Copy required files to deployment directory"""
+    # Create output directory
+    output_path = Path(output_dir)
+    if output_path.exists():
+        print(f"⚠️  Directory {output_dir} already exists")
+        response = input("Delete and recreate? (y/n): ")
+        if response.lower() != 'y':
+            print("❌ Cancelled")
+            return
+        shutil.rmtree(output_path)
+    output_path.mkdir(exist_ok=True)
+    print(f"📁 Created directory: {output_dir}\n")
+    # Copy files
+    copied = []
+    missing = []
+    for filename in REQUIRED_FILES:
+        src = Path(filename)
+        if src.exists():
+            dst = output_path / filename
+            shutil.copy2(src, dst)
+            size_kb = src.stat().st_size / 1024
+            print(f"   ✅ {filename} ({size_kb:.1f} KB)")
+            copied.append(filename)
+        else:
+            print(f"   ⚠️  {filename} - NOT FOUND (skipping)")
+            missing.append(filename)
+    # Summary
+    print("\n" + "="*80)
+    print("📊 SUMMARY")
+    print("="*80)
+    print(f"✅ Copied: {len(copied)} files")
+    if missing:
+        print(f"⚠️  Missing: {len(missing)} files")
+        print(f"   {', '.join(missing)}")
+    print(f"\n📦 Deployment files ready in: {output_dir}/")
+    print("\n📋 Next steps:")
+    print("1. Go to https://huggingface.co/new-space")
+    print("2. Select Gradio SDK and GPU hardware")
+    print("3. Upload all files from the deployment directory")
+    print("4. Wait for model download (~2-5 min first time)")
+    print("5. Test your Space!")
+    # Check for .env file (should not be included)
+    if (output_path / '.env').exists():
+        print("\n⚠️  WARNING: .env file found in deployment directory!")
+        print("   This should NOT be deployed to HuggingFace Spaces")
+        os.remove(output_path / '.env')
+        print("   ✅ Removed .env file")
+    print("\n✨ Deployment package ready!")
+if __name__ == '__main__':
+    prepare_deployment()
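
Because `prepare_deployment` prompts with `input()`, it cannot run unattended, for example in CI. A minimal non-interactive sketch of the same allowlist copy follows, assuming hypothetical `source_dir` and `overwrite` parameters and a hypothetical `copy_allowlisted` name, none of which are part of the committed script. It returns the copied and missing names instead of printing them. Since only allowlisted names are copied into a freshly created directory, a `.env` file cannot reach the output, so this variant has no cleanup step.

```python
import shutil
from pathlib import Path

# Shortened allowlist for illustration; the committed script lists 13 names.
ALLOWLIST = ["app.py", "llm.py", "requirements.txt", "quote_extractor.py"]


def copy_allowlisted(source_dir, output_dir, allowlist=ALLOWLIST, overwrite=False):
    """Copy allowlisted files into a fresh output_dir; return (copied, missing)."""
    source, output = Path(source_dir), Path(output_dir)
    if output.exists():
        if not overwrite:
            raise FileExistsError(f"{output} already exists; pass overwrite=True")
        shutil.rmtree(output)
    output.mkdir(parents=True)

    copied, missing = [], []
    for name in allowlist:
        src = source / name
        if src.is_file():
            shutil.copy2(src, output / name)
            copied.append(name)
        else:
            missing.append(name)
    return copied, missing


if __name__ == "__main__":
    copied, missing = copy_allowlisted(".", "./spaces_deployment", overwrite=True)
    print(f"Copied {len(copied)} file(s); missing: {', '.join(missing) or 'none'}")
```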