Spaces:
Sleeping
Sleeping
Upload 6 files
Browse files- HUGGINGFACE_SPACES_SETUP.md +225 -0
- MIGRATION_TO_LOCAL_MODELS.md +277 -0
- app.py +262 -73
- llm.py +53 -22
- requirements.txt +48 -14
- test_local_model.py +138 -0
HUGGINGFACE_SPACES_SETUP.md
ADDED
|
@@ -0,0 +1,225 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# HuggingFace Spaces Deployment Guide
|
| 2 |
+
|
| 3 |
+
## Overview
|
| 4 |
+
This application is configured to run on **HuggingFace Spaces** using local model inference (no external API calls required).
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
## Quick Setup
|
| 9 |
+
|
| 10 |
+
### 1. Create a New Space
|
| 11 |
+
1. Go to https://huggingface.co/new-space
|
| 12 |
+
2. Choose **Gradio** as the SDK
|
| 13 |
+
3. Select **GPU** hardware (T4 or better recommended)
|
| 14 |
+
4. Name your Space (e.g., `transcriptor-ai`)
|
| 15 |
+
|
| 16 |
+
### 2. Upload Your Code
|
| 17 |
+
Upload all files from this directory to your Space, or connect a Git repository.
|
| 18 |
+
|
| 19 |
+
### 3. Configure Space Settings (Optional)
|
| 20 |
+
|
| 21 |
+
Go to **Settings โ Variables** in your Space and add:
|
| 22 |
+
|
| 23 |
+
| Variable | Value | Description |
|
| 24 |
+
|----------|-------|-------------|
|
| 25 |
+
| `DEBUG_MODE` | `True` or `False` | Enable detailed logging |
|
| 26 |
+
| `LLM_TEMPERATURE` | `0.7` | Model creativity (0.0-1.0) |
|
| 27 |
+
| `LLM_TIMEOUT` | `120` | Timeout in seconds |
|
| 28 |
+
| `LOCAL_MODEL` | `microsoft/Phi-3-mini-4k-instruct` | Model to use |
|
| 29 |
+
|
| 30 |
+
**Note:** All settings have sensible defaults - you don't need to set these unless you want to customize.
|
| 31 |
+
|
| 32 |
+
---
|
| 33 |
+
|
| 34 |
+
## Hardware Requirements
|
| 35 |
+
|
| 36 |
+
### Recommended: GPU (T4 or better)
|
| 37 |
+
- **Phi-3-mini-4k-instruct**: 3.8B params, ~8GB GPU RAM
|
| 38 |
+
- Processing speed: ~30-60 seconds per transcript chunk
|
| 39 |
+
- **Best for:** Production use with multiple users
|
| 40 |
+
|
| 41 |
+
### Alternative: CPU (not recommended)
|
| 42 |
+
- Will work but be very slow (5-10 minutes per chunk)
|
| 43 |
+
- Only suitable for testing
|
| 44 |
+
|
| 45 |
+
---
|
| 46 |
+
|
| 47 |
+
## Supported Models
|
| 48 |
+
|
| 49 |
+
You can change the model by setting the `LOCAL_MODEL` variable:
|
| 50 |
+
|
| 51 |
+
### Small & Fast (Recommended for Free Tier)
|
| 52 |
+
```
|
| 53 |
+
LOCAL_MODEL=microsoft/Phi-3-mini-4k-instruct (Default - 3.8B params)
|
| 54 |
+
```
|
| 55 |
+
|
| 56 |
+
### Medium (Better quality, needs more GPU)
|
| 57 |
+
```
|
| 58 |
+
LOCAL_MODEL=mistralai/Mistral-7B-Instruct-v0.3 (7B params)
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
### Alternatives
|
| 62 |
+
```
|
| 63 |
+
LOCAL_MODEL=HuggingFaceH4/zephyr-7b-beta (7B params, good instruction following)
|
| 64 |
+
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0 (1.1B params, very fast but lower quality)
|
| 65 |
+
```
|
| 66 |
+
|
| 67 |
+
---
|
| 68 |
+
|
| 69 |
+
## Configuration Files
|
| 70 |
+
|
| 71 |
+
### โ
Required Files
|
| 72 |
+
- `app.py` - Main application
|
| 73 |
+
- `requirements.txt` - Python dependencies
|
| 74 |
+
- `llm.py`, `extractors.py`, etc. - Core modules
|
| 75 |
+
|
| 76 |
+
### โ ๏ธ NOT Needed for Spaces
|
| 77 |
+
- `.env` file - Use Spaces Variables instead
|
| 78 |
+
- Local database files
|
| 79 |
+
- API keys (unless using external APIs)
|
| 80 |
+
|
| 81 |
+
---
|
| 82 |
+
|
| 83 |
+
## Environment Configuration
|
| 84 |
+
|
| 85 |
+
The app automatically detects if it's running on HuggingFace Spaces and uses local model inference by default.
|
| 86 |
+
|
| 87 |
+
**Default Configuration (no .env needed):**
|
| 88 |
+
```python
|
| 89 |
+
USE_HF_API = False # Don't use HF Inference API
|
| 90 |
+
USE_LMSTUDIO = False # Don't use LM Studio
|
| 91 |
+
LLM_BACKEND = local # Use local transformers
|
| 92 |
+
DEBUG_MODE = False # Disable debug logs
|
| 93 |
+
```
|
| 94 |
+
|
| 95 |
+
**To override:** Set Spaces Variables (Settings โ Variables)
|
| 96 |
+
|
| 97 |
+
---
|
| 98 |
+
|
| 99 |
+
## Troubleshooting
|
| 100 |
+
|
| 101 |
+
### Issue: "Out of Memory" Error
|
| 102 |
+
**Solution:** Switch to a smaller model
|
| 103 |
+
```
|
| 104 |
+
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0
|
| 105 |
+
```
|
| 106 |
+
|
| 107 |
+
### Issue: Very Slow Processing
|
| 108 |
+
**Solution:**
|
| 109 |
+
1. Make sure you selected **GPU** hardware (not CPU)
|
| 110 |
+
2. Check Space logs for "Model loaded on cuda" confirmation
|
| 111 |
+
3. If on CPU, upgrade to GPU tier
|
| 112 |
+
|
| 113 |
+
### Issue: Quality Score 0.00
|
| 114 |
+
**Causes:**
|
| 115 |
+
1. Model not loaded properly (check logs for "[Local Model] Loading...")
|
| 116 |
+
2. GPU out of memory (model falls back to CPU)
|
| 117 |
+
3. Timeout too short (increase `LLM_TIMEOUT`)
|
| 118 |
+
|
| 119 |
+
**Debug Steps:**
|
| 120 |
+
1. Set `DEBUG_MODE=True` in Spaces Variables
|
| 121 |
+
2. Check logs for detailed error messages
|
| 122 |
+
3. Look for "[Local Model] โ
Generated X characters"
|
| 123 |
+
|
| 124 |
+
### Issue: Model Downloads Every Time
|
| 125 |
+
**Solution:** HuggingFace Spaces caches models automatically, but first load takes 2-5 minutes.
|
| 126 |
+
- Subsequent starts are faster (~30 seconds)
|
| 127 |
+
- Don't restart Space unnecessarily
|
| 128 |
+
|
| 129 |
+
---
|
| 130 |
+
|
| 131 |
+
## Performance Optimization
|
| 132 |
+
|
| 133 |
+
### 1. Reduce Context Window
|
| 134 |
+
Edit `llm.py` line 399:
|
| 135 |
+
```python
|
| 136 |
+
max_length=2000 # Reduce from 3500 for faster processing
|
| 137 |
+
```
|
| 138 |
+
|
| 139 |
+
### 2. Lower Token Limit
|
| 140 |
+
Set Spaces Variable:
|
| 141 |
+
```
|
| 142 |
+
MAX_TOKENS_PER_REQUEST=800 # Default is 1500
|
| 143 |
+
```
|
| 144 |
+
|
| 145 |
+
### 3. Use Smaller Model
|
| 146 |
+
```
|
| 147 |
+
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0
|
| 148 |
+
```
|
| 149 |
+
|
| 150 |
+
### 4. Disable Debug Mode
|
| 151 |
+
```
|
| 152 |
+
DEBUG_MODE=False
|
| 153 |
+
```
|
| 154 |
+
|
| 155 |
+
---
|
| 156 |
+
|
| 157 |
+
## Monitoring
|
| 158 |
+
|
| 159 |
+
### View Logs
|
| 160 |
+
1. Go to your Space
|
| 161 |
+
2. Click **Logs** tab at the top
|
| 162 |
+
3. Look for startup messages:
|
| 163 |
+
|
| 164 |
+
```
|
| 165 |
+
โ
Configuration loaded for HuggingFace Spaces
|
| 166 |
+
๐ TranscriptorAI Enterprise - LLM Backend: local
|
| 167 |
+
[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
|
| 168 |
+
[Local Model] โ
Model loaded on cuda:0
|
| 169 |
+
```
|
| 170 |
+
|
| 171 |
+
### Check Processing
|
| 172 |
+
During analysis, you should see:
|
| 173 |
+
```
|
| 174 |
+
[Local Model] Generating (1500 max tokens, temp=0.7)...
|
| 175 |
+
[Local Model] โ
Generated 1247 characters
|
| 176 |
+
[LLM Debug] โ
Successfully extracted JSON with 7 fields
|
| 177 |
+
```
|
| 178 |
+
|
| 179 |
+
---
|
| 180 |
+
|
| 181 |
+
## Cost Estimation
|
| 182 |
+
|
| 183 |
+
### Free Tier (CPU)
|
| 184 |
+
- โ ๏ธ Very slow but free
|
| 185 |
+
- ~5-10 minutes per transcript
|
| 186 |
+
|
| 187 |
+
### GPU (T4) - ~$0.60/hour
|
| 188 |
+
- โก Fast processing
|
| 189 |
+
- ~30-60 seconds per transcript
|
| 190 |
+
- Space sleeps after inactivity (saves money)
|
| 191 |
+
|
| 192 |
+
### Persistent GPU (Upgraded)
|
| 193 |
+
- Always-on for instant access
|
| 194 |
+
- Higher cost but best user experience
|
| 195 |
+
|
| 196 |
+
---
|
| 197 |
+
|
| 198 |
+
## Security Notes
|
| 199 |
+
|
| 200 |
+
1. **No API Keys Needed:** Everything runs locally
|
| 201 |
+
2. **Private Processing:** Data never leaves your Space
|
| 202 |
+
3. **Secrets Management:** Use Spaces Secrets (not Variables) for sensitive data
|
| 203 |
+
4. **Model Access:** Phi-3 and most models don't require gated access
|
| 204 |
+
|
| 205 |
+
---
|
| 206 |
+
|
| 207 |
+
## Next Steps
|
| 208 |
+
|
| 209 |
+
1. โ
Upload code to your Space
|
| 210 |
+
2. โ
Select GPU hardware
|
| 211 |
+
3. โ
Wait for first model download (~2-5 min)
|
| 212 |
+
4. โ
Test with a sample transcript
|
| 213 |
+
5. ๐ Share your Space URL!
|
| 214 |
+
|
| 215 |
+
---
|
| 216 |
+
|
| 217 |
+
## Support
|
| 218 |
+
|
| 219 |
+
- **HuggingFace Spaces Docs:** https://huggingface.co/docs/hub/spaces
|
| 220 |
+
- **Transformers Docs:** https://huggingface.co/docs/transformers
|
| 221 |
+
- **GPU Pricing:** https://huggingface.co/pricing
|
| 222 |
+
|
| 223 |
+
---
|
| 224 |
+
|
| 225 |
+
**Last Updated:** October 2025
|
MIGRATION_TO_LOCAL_MODELS.md
ADDED
|
@@ -0,0 +1,277 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Migration to Local Models - Summary
|
| 2 |
+
|
| 3 |
+
## Problem
|
| 4 |
+
Your application was failing with **Quality Score 0.00** because:
|
| 5 |
+
1. Hardcoded configuration forced LM Studio (localhost) which wasn't running
|
| 6 |
+
2. HuggingFace API was using wrong model (opt-125m instead of Phi-3)
|
| 7 |
+
3. Configuration designed for API calls, not local inference
|
| 8 |
+
4. .env files don't work on HuggingFace Spaces
|
| 9 |
+
|
| 10 |
+
## Solution
|
| 11 |
+
Migrated to **local model inference** optimized for HuggingFace Spaces.
|
| 12 |
+
|
| 13 |
+
---
|
| 14 |
+
|
| 15 |
+
## Changes Made
|
| 16 |
+
|
| 17 |
+
### 1. **app.py** - Configuration System
|
| 18 |
+
**Lines 39-63:** Removed hardcoded LM Studio config
|
| 19 |
+
- โ
Now loads .env if exists (local development)
|
| 20 |
+
- โ
Falls back to sensible defaults (HF Spaces)
|
| 21 |
+
- โ
Uses `os.environ.setdefault()` for configuration
|
| 22 |
+
- โ
No external API calls by default
|
| 23 |
+
|
| 24 |
+
**Before:**
|
| 25 |
+
```python
|
| 26 |
+
os.environ["USE_LMSTUDIO"] = "True" # Forced LM Studio
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
**After:**
|
| 30 |
+
```python
|
| 31 |
+
os.environ.setdefault("LLM_BACKEND", "local") # Local transformers
|
| 32 |
+
```
|
| 33 |
+
|
| 34 |
+
---
|
| 35 |
+
|
| 36 |
+
### 2. **llm.py** - Local Model Function
|
| 37 |
+
**Lines 364-429:** Rewrote `query_llm_local()`
|
| 38 |
+
- โ
Uses Phi-3-mini-4k-instruct (better for medical data)
|
| 39 |
+
- โ
Proper GPU/CPU detection
|
| 40 |
+
- โ
Model caching (loads once, reuses)
|
| 41 |
+
- โ
Configurable via `LOCAL_MODEL` environment variable
|
| 42 |
+
- โ
Better error handling and logging
|
| 43 |
+
|
| 44 |
+
**Before:**
|
| 45 |
+
```python
|
| 46 |
+
# Used Flan-T5-XXL (seq2seq model)
|
| 47 |
+
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")
|
| 48 |
+
```
|
| 49 |
+
|
| 50 |
+
**After:**
|
| 51 |
+
```python
|
| 52 |
+
# Uses Phi-3-mini (causal LM with better instruction following)
|
| 53 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 54 |
+
os.getenv("LOCAL_MODEL", "microsoft/Phi-3-mini-4k-instruct"),
|
| 55 |
+
device_map="auto"
|
| 56 |
+
)
|
| 57 |
+
```
|
| 58 |
+
|
| 59 |
+
---
|
| 60 |
+
|
| 61 |
+
### 3. **llm.py** - HF API Function (Fixed but not used by default)
|
| 62 |
+
**Lines 246-297:** Fixed for accuracy (if you decide to use API later)
|
| 63 |
+
- โ
Uses model from `HF_MODEL` environment variable
|
| 64 |
+
- โ
Full prompt (no truncation)
|
| 65 |
+
- โ
1500 tokens (not 300)
|
| 66 |
+
- โ
Respects temperature and timeout settings
|
| 67 |
+
|
| 68 |
+
---
|
| 69 |
+
|
| 70 |
+
### 4. **llm.py** - Enhanced Debugging
|
| 71 |
+
**Lines 181-239:** Added detailed logging
|
| 72 |
+
- โ
Shows response preview
|
| 73 |
+
- โ
Reports JSON extraction success/failure
|
| 74 |
+
- โ
Logs field counts and extraction method
|
| 75 |
+
- โ
Helps diagnose quality score issues
|
| 76 |
+
|
| 77 |
+
---
|
| 78 |
+
|
| 79 |
+
### 5. **requirements.txt** - Added Dependencies
|
| 80 |
+
**Lines 43-50:** Added transformers stack
|
| 81 |
+
```python
|
| 82 |
+
transformers>=4.36.0 # Model loading
|
| 83 |
+
torch>=2.1.0 # PyTorch backend
|
| 84 |
+
accelerate>=0.25.0 # Efficient GPU loading
|
| 85 |
+
sentencepiece>=0.1.99 # Tokenizer support
|
| 86 |
+
protobuf>=3.20.0 # Tokenizer dependencies
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
---
|
| 90 |
+
|
| 91 |
+
## New Files Created
|
| 92 |
+
|
| 93 |
+
### ๐ HUGGINGFACE_SPACES_SETUP.md
|
| 94 |
+
Complete deployment guide including:
|
| 95 |
+
- Quick setup steps
|
| 96 |
+
- Hardware requirements
|
| 97 |
+
- Supported models
|
| 98 |
+
- Troubleshooting
|
| 99 |
+
- Performance optimization
|
| 100 |
+
- Cost estimation
|
| 101 |
+
|
| 102 |
+
### ๐งช test_local_model.py
|
| 103 |
+
Test script to verify setup before deployment:
|
| 104 |
+
```bash
|
| 105 |
+
python test_local_model.py
|
| 106 |
+
```
|
| 107 |
+
|
| 108 |
+
---
|
| 109 |
+
|
| 110 |
+
## Configuration Options
|
| 111 |
+
|
| 112 |
+
### Environment Variables (Spaces Settings โ Variables)
|
| 113 |
+
|
| 114 |
+
| Variable | Default | Description |
|
| 115 |
+
|----------|---------|-------------|
|
| 116 |
+
| `LLM_BACKEND` | `local` | Backend to use (`local`, `hf_api`, `lmstudio`) |
|
| 117 |
+
| `LOCAL_MODEL` | `microsoft/Phi-3-mini-4k-instruct` | Model to load |
|
| 118 |
+
| `LLM_TEMPERATURE` | `0.7` | Creativity (0.0-1.0) |
|
| 119 |
+
| `LLM_TIMEOUT` | `120` | Timeout seconds |
|
| 120 |
+
| `DEBUG_MODE` | `False` | Enable detailed logs |
|
| 121 |
+
| `USE_HF_API` | `False` | Use HF Inference API |
|
| 122 |
+
| `USE_LMSTUDIO` | `False` | Use LM Studio |
|
| 123 |
+
|
| 124 |
+
### For HuggingFace Spaces
|
| 125 |
+
**You don't need to set any variables!** Defaults work out of the box.
|
| 126 |
+
|
| 127 |
+
**Optional customization:**
|
| 128 |
+
1. Go to Space Settings โ Variables
|
| 129 |
+
2. Add `DEBUG_MODE` = `True` to see detailed logs
|
| 130 |
+
3. Add `LOCAL_MODEL` = `TinyLlama/TinyLlama-1.1B-Chat-v1.0` for faster (but lower quality)
|
| 131 |
+
|
| 132 |
+
---
|
| 133 |
+
|
| 134 |
+
## Testing Locally
|
| 135 |
+
|
| 136 |
+
### 1. Install Dependencies
|
| 137 |
+
```bash
|
| 138 |
+
pip install -r requirements.txt
|
| 139 |
+
```
|
| 140 |
+
|
| 141 |
+
### 2. Test Local Model
|
| 142 |
+
```bash
|
| 143 |
+
python test_local_model.py
|
| 144 |
+
```
|
| 145 |
+
|
| 146 |
+
**Expected output:**
|
| 147 |
+
```
|
| 148 |
+
๐งช Testing Local Model Inference
|
| 149 |
+
1๏ธโฃ Testing imports...
|
| 150 |
+
โ
PyTorch 2.1.0
|
| 151 |
+
๐ง CUDA available: True
|
| 152 |
+
๐ฎ GPU: NVIDIA GeForce RTX 3080
|
| 153 |
+
|
| 154 |
+
2๏ธโฃ Testing LLM function...
|
| 155 |
+
โ
LLM module imported
|
| 156 |
+
|
| 157 |
+
3๏ธโฃ Testing simple query...
|
| 158 |
+
[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
|
| 159 |
+
[Local Model] โ
Model loaded on cuda:0
|
| 160 |
+
[Local Model] Generating (1500 max tokens, temp=0.7)...
|
| 161 |
+
[Local Model] โ
Generated 847 characters
|
| 162 |
+
|
| 163 |
+
๐ RESULTS
|
| 164 |
+
โ
Response length OK (847 chars)
|
| 165 |
+
โ
Structured data extracted (3 fields)
|
| 166 |
+
โข diagnoses: 1 items
|
| 167 |
+
โข prescriptions: 2 items
|
| 168 |
+
โข treatment_rationale: 2 items
|
| 169 |
+
|
| 170 |
+
๐ TEST COMPLETE!
|
| 171 |
+
```
|
| 172 |
+
|
| 173 |
+
### 3. Run Full App
|
| 174 |
+
```bash
|
| 175 |
+
python app.py
|
| 176 |
+
```
|
| 177 |
+
|
| 178 |
+
---
|
| 179 |
+
|
| 180 |
+
## Deployment to HuggingFace Spaces
|
| 181 |
+
|
| 182 |
+
### Quick Start
|
| 183 |
+
1. Create new Space at https://huggingface.co/new-space
|
| 184 |
+
2. Choose **Gradio** SDK
|
| 185 |
+
3. Select **GPU** hardware (T4 minimum)
|
| 186 |
+
4. Upload all files
|
| 187 |
+
5. Wait for model download (~2-5 minutes first time)
|
| 188 |
+
6. Test with sample transcript
|
| 189 |
+
|
| 190 |
+
**See HUGGINGFACE_SPACES_SETUP.md for detailed instructions.**
|
| 191 |
+
|
| 192 |
+
---
|
| 193 |
+
|
| 194 |
+
## Model Comparison
|
| 195 |
+
|
| 196 |
+
| Model | Size | Speed | Quality | GPU RAM | Recommended For |
|
| 197 |
+
|-------|------|-------|---------|---------|-----------------|
|
| 198 |
+
| Phi-3-mini-4k | 3.8B | Fast | Excellent | ~8GB | **Default - Best balance** |
|
| 199 |
+
| TinyLlama-1.1B | 1.1B | Very Fast | Good | ~4GB | Testing, free tier |
|
| 200 |
+
| Mistral-7B | 7B | Medium | Excellent | ~14GB | Production, paid tier |
|
| 201 |
+
| Zephyr-7B | 7B | Medium | Excellent | ~14GB | Alternative to Mistral |
|
| 202 |
+
|
| 203 |
+
---
|
| 204 |
+
|
| 205 |
+
## Troubleshooting
|
| 206 |
+
|
| 207 |
+
### Issue: Quality Score Still 0.00
|
| 208 |
+
|
| 209 |
+
**Check:**
|
| 210 |
+
1. Model loaded successfully? Look for `[Local Model] โ
Model loaded on cuda:0`
|
| 211 |
+
2. Response generated? Look for `[Local Model] โ
Generated X characters`
|
| 212 |
+
3. JSON extracted? Look for `[LLM Debug] โ
Successfully extracted JSON`
|
| 213 |
+
|
| 214 |
+
**Enable debug mode:**
|
| 215 |
+
```python
|
| 216 |
+
# In Spaces: Set Variable DEBUG_MODE=True
|
| 217 |
+
# Locally: Edit .env and add DEBUG_MODE=True
|
| 218 |
+
```
|
| 219 |
+
|
| 220 |
+
### Issue: Out of Memory
|
| 221 |
+
|
| 222 |
+
**Solutions:**
|
| 223 |
+
1. Use smaller model: `LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0`
|
| 224 |
+
2. Reduce context: Edit `llm.py` line 399, set `max_length=2000`
|
| 225 |
+
3. Upgrade GPU tier in Spaces settings
|
| 226 |
+
|
| 227 |
+
### Issue: Very Slow Processing
|
| 228 |
+
|
| 229 |
+
**Check:**
|
| 230 |
+
1. Are you on GPU? Look for `cuda:0` in logs (not `cpu`)
|
| 231 |
+
2. Model cached? Second run should be faster
|
| 232 |
+
3. Right hardware selected in Spaces?
|
| 233 |
+
|
| 234 |
+
---
|
| 235 |
+
|
| 236 |
+
## Rollback (If Needed)
|
| 237 |
+
|
| 238 |
+
To revert to HuggingFace API:
|
| 239 |
+
1. Set Spaces Variable: `USE_HF_API=True`
|
| 240 |
+
2. Set Spaces Secret: `HUGGINGFACE_TOKEN=your_token`
|
| 241 |
+
3. Restart Space
|
| 242 |
+
|
| 243 |
+
---
|
| 244 |
+
|
| 245 |
+
## Performance Benchmarks
|
| 246 |
+
|
| 247 |
+
### Phi-3-mini on T4 GPU (HF Spaces)
|
| 248 |
+
- **Model Load:** 30-60 seconds (first time: 2-5 min for download)
|
| 249 |
+
- **Per Chunk:** 30-60 seconds
|
| 250 |
+
- **Full Transcript (10 chunks):** 5-10 minutes
|
| 251 |
+
- **Quality Score:** Typically 0.7-1.0
|
| 252 |
+
|
| 253 |
+
### TinyLlama on T4 GPU
|
| 254 |
+
- **Model Load:** 10-20 seconds
|
| 255 |
+
- **Per Chunk:** 15-30 seconds
|
| 256 |
+
- **Full Transcript:** 3-5 minutes
|
| 257 |
+
- **Quality Score:** Typically 0.5-0.8 (lower than Phi-3)
|
| 258 |
+
|
| 259 |
+
---
|
| 260 |
+
|
| 261 |
+
## Next Steps
|
| 262 |
+
|
| 263 |
+
1. โ
**Test Locally:** Run `python test_local_model.py`
|
| 264 |
+
2. โ
**Deploy to Spaces:** Follow HUGGINGFACE_SPACES_SETUP.md
|
| 265 |
+
3. โ
**Monitor Logs:** Check for successful model loading
|
| 266 |
+
4. โ
**Test Sample:** Upload a dermatology transcript
|
| 267 |
+
5. โ
**Optimize:** Adjust model/settings based on results
|
| 268 |
+
|
| 269 |
+
---
|
| 270 |
+
|
| 271 |
+
## Questions?
|
| 272 |
+
|
| 273 |
+
- **HuggingFace Spaces:** https://huggingface.co/docs/hub/spaces
|
| 274 |
+
- **Phi-3 Model Card:** https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
|
| 275 |
+
- **Transformers Docs:** https://huggingface.co/docs/transformers
|
| 276 |
+
|
| 277 |
+
**Last Updated:** October 2025
|
app.py
CHANGED
|
@@ -8,28 +8,84 @@ from chunking import chunk_text_semantic
|
|
| 8 |
from llm import query_llm, extract_structured_data
|
| 9 |
from reporting import generate_enhanced_csv, generate_enhanced_pdf
|
| 10 |
from dashboard import generate_comprehensive_dashboard
|
| 11 |
-
from validation import validate_transcript_quality, check_data_completeness
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 12 |
|
| 13 |
# HuggingFace Spaces Configuration
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 19 |
|
| 20 |
def analyze(files, file_type, user_comments, role_hint, debug_mode, interviewee_type, progress=gr.Progress()):
|
| 21 |
"""
|
| 22 |
-
Enhanced analysis pipeline with robust error handling and
|
| 23 |
"""
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 24 |
os.environ["DEBUG_MODE"] = str(debug_mode)
|
| 25 |
-
|
| 26 |
if not files:
|
|
|
|
| 27 |
return "Error: No files uploaded", None, None, None
|
| 28 |
-
|
| 29 |
all_results = []
|
| 30 |
csv_rows = []
|
| 31 |
processing_errors = []
|
| 32 |
-
|
| 33 |
progress(0, desc="Initializing...")
|
| 34 |
print(f"[Start] Processing {len(files)} file(s) as {file_type}")
|
| 35 |
|
|
@@ -64,6 +120,9 @@ Additional Instructions:
|
|
| 64 |
|
| 65 |
for i, file in enumerate(files):
|
| 66 |
file_name = os.path.basename(file.name)
|
|
|
|
|
|
|
|
|
|
| 67 |
try:
|
| 68 |
# Step 1: Extract text
|
| 69 |
progress((current_step / total_steps), desc=f"Extracting {file_name}...")
|
|
@@ -102,14 +161,26 @@ Additional Instructions:
|
|
| 102 |
progress(chunk_progress, desc=f"Analyzing {file_name} ({j+1}/{len(chunks)})...")
|
| 103 |
|
| 104 |
result, chunk_data = query_llm(
|
| 105 |
-
chunk,
|
| 106 |
-
user_context,
|
| 107 |
interviewee_type,
|
| 108 |
extract_structured=True
|
| 109 |
)
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 113 |
# Merge structured data
|
| 114 |
for key, value in chunk_data.items():
|
| 115 |
if key not in structured_data:
|
|
@@ -120,9 +191,21 @@ Additional Instructions:
|
|
| 120 |
structured_data[key].append(value)
|
| 121 |
|
| 122 |
current_step += 1
|
| 123 |
-
|
| 124 |
# Combine and validate results
|
| 125 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 126 |
|
| 127 |
# Quality check
|
| 128 |
quality_score, quality_issues = validate_transcript_quality(
|
|
@@ -152,31 +235,61 @@ Additional Instructions:
|
|
| 152 |
"Word Count": len(raw_text.split()),
|
| 153 |
}
|
| 154 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 155 |
# Add interviewee-specific fields
|
| 156 |
if interviewee_type == "HCP":
|
| 157 |
csv_row.update({
|
| 158 |
-
"Diagnoses":
|
| 159 |
-
"Prescriptions":
|
| 160 |
-
"Treatment Strategies":
|
| 161 |
-
"Guidelines Mentioned":
|
| 162 |
})
|
| 163 |
elif interviewee_type == "Patient":
|
| 164 |
csv_row.update({
|
| 165 |
-
"Primary Symptoms":
|
| 166 |
-
"Main Concerns":
|
| 167 |
-
"Treatment Response":
|
| 168 |
-
"Side Effects":
|
| 169 |
})
|
| 170 |
else:
|
| 171 |
csv_row.update({
|
| 172 |
-
"Key Insights":
|
| 173 |
-
"Recommendations":
|
| 174 |
})
|
| 175 |
|
| 176 |
csv_rows.append(csv_row)
|
| 177 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 178 |
print(f"[File {i+1}] โ Processing complete")
|
| 179 |
-
|
| 180 |
except Exception as e:
|
| 181 |
# Enhanced error tracking with type and traceback
|
| 182 |
import traceback
|
|
@@ -187,6 +300,10 @@ Additional Instructions:
|
|
| 187 |
error_msg = f"[{error_type}] {file_name}: {error_details}"
|
| 188 |
print(error_msg)
|
| 189 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 190 |
# Store comprehensive error information
|
| 191 |
processing_errors.append({
|
| 192 |
"transcript_id": f"Transcript {i+1}",
|
|
@@ -222,70 +339,101 @@ Additional Instructions:
|
|
| 222 |
try:
|
| 223 |
progress(0.9, desc="Generating summary and reports...")
|
| 224 |
print("[Summary] Analyzing trends across transcripts")
|
| 225 |
-
|
| 226 |
# Combine successful results
|
| 227 |
valid_results = [r for r in all_results if r["quality_score"] > 0]
|
| 228 |
-
|
| 229 |
if not valid_results:
|
| 230 |
return "Error: No transcripts were successfully processed", None, None, None
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 231 |
|
| 232 |
-
# Build comprehensive summary prompt
|
| 233 |
summary_prompt = f"""
|
| 234 |
CROSS-INTERVIEW SYNTHESIS TASK
|
| 235 |
-
|
| 236 |
SAMPLE: {len(valid_results)} {interviewee_type} transcripts
|
| 237 |
FOCUS AREAS: {interviewee_context.get('focus', 'general patterns')}
|
| 238 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 239 |
COMPLETE TRANSCRIPT DATA:
|
| 240 |
"""
|
| 241 |
-
|
| 242 |
for idx, result in enumerate(valid_results, 1):
|
| 243 |
summary_prompt += f"\n{'='*60}\nTRANSCRIPT {idx}/{len(valid_results)}: {result['file_name']}\n{'='*60}\n"
|
| 244 |
summary_prompt += f"{result['full_text'][:2000]}\n"
|
| 245 |
|
| 246 |
summary_prompt += f"""
|
| 247 |
-
|
| 248 |
ANALYSIS REQUIREMENTS:
|
| 249 |
-
|
| 250 |
1. QUANTIFY EVERYTHING:
|
| 251 |
- Count participants: "X out of {len(valid_results)} participants mentioned..."
|
| 252 |
- Never use vague terms (many/most/some)
|
| 253 |
- Calculate percentages where relevant
|
| 254 |
-
|
| 255 |
-
2.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 256 |
- STRONG CONSENSUS (80%+ = {int(len(valid_results)*0.8)}+ transcripts agree)
|
| 257 |
- MAJORITY VIEW (60-79% = {int(len(valid_results)*0.6)}-{int(len(valid_results)*0.79)} transcripts)
|
| 258 |
- SPLIT PERSPECTIVES (40-59% = mixed views)
|
| 259 |
- MINORITY/OUTLIER (<40% but notable)
|
| 260 |
-
|
| 261 |
-
|
| 262 |
- Check for contradictions between transcripts
|
| 263 |
- Note where perspectives diverge and why
|
| 264 |
- Flag any quality issues in individual transcripts
|
| 265 |
-
|
| 266 |
-
|
| 267 |
- Reference specific transcript numbers
|
| 268 |
- Brief supporting details
|
|
|
|
| 269 |
- Distinguish verified facts from interpretation
|
| 270 |
-
|
| 271 |
OUTPUT FORMAT:
|
| 272 |
-
Write 2-3 sentence executive overview, then structure as:
|
| 273 |
-
|
| 274 |
**STRONG CONSENSUS FINDINGS:**
|
| 275 |
-
- [Finding with count and
|
| 276 |
-
|
| 277 |
**MAJORITY FINDINGS:**
|
| 278 |
-
- [Finding with count]
|
| 279 |
-
|
| 280 |
**DIVERGENT PERSPECTIVES:**
|
| 281 |
-
- [Where views split
|
| 282 |
-
|
| 283 |
**NOTABLE OUTLIERS:**
|
| 284 |
-
- [Unique but important points]
|
| 285 |
-
|
| 286 |
**DATA QUALITY NOTES:**
|
| 287 |
- [Any gaps or transcript issues]
|
| 288 |
-
|
|
|
|
| 289 |
Be specific. Use numbers. Cite transcript IDs. Flag weak evidence.
|
| 290 |
"""
|
| 291 |
|
|
@@ -311,13 +459,25 @@ Additional Instructions:
|
|
| 311 |
from llm_robust import generate_emergency_summary
|
| 312 |
summary, summary_data = generate_emergency_summary(interviewee_type)
|
| 313 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 314 |
# Validate summary quality and retry if needed
|
| 315 |
-
|
| 316 |
-
|
| 317 |
-
|
| 318 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 319 |
|
| 320 |
-
if summary_score < 0.7: # Quality threshold
|
| 321 |
print(f"[Warning] Summary quality issues (score: {summary_score:.2f}): {summary_issues}")
|
| 322 |
print("[Summary] Retrying with stricter validation...")
|
| 323 |
|
|
@@ -349,6 +509,14 @@ MANDATORY CORRECTIONS:
|
|
| 349 |
print("[Summary] Using emergency fallback for retry...")
|
| 350 |
summary, summary_data = generate_emergency_summary(interviewee_type)
|
| 351 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 352 |
# Re-validate
|
| 353 |
summary_score, summary_issues = validate_summary_quality(summary, len(valid_results))
|
| 354 |
|
|
@@ -369,13 +537,16 @@ Please review findings carefully and verify against source data.
|
|
| 369 |
print(f"[Summary] โ Validation passed (score: {summary_score:.2f})")
|
| 370 |
|
| 371 |
# Verify consensus claims against actual data
|
| 372 |
-
|
| 373 |
-
|
| 374 |
-
|
| 375 |
-
|
| 376 |
-
|
|
|
|
|
|
|
|
|
|
| 377 |
else:
|
| 378 |
-
print("[Summary]
|
| 379 |
|
| 380 |
# Generate enhanced reports
|
| 381 |
csv_path = generate_enhanced_csv(csv_rows, interviewee_type)
|
|
@@ -407,7 +578,16 @@ Please review findings carefully and verify against source data.
|
|
| 407 |
"""
|
| 408 |
|
| 409 |
if processing_errors:
|
| 410 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 411 |
|
| 412 |
output_text += "\n\n---\n\n## Individual Transcript Results\n\n"
|
| 413 |
|
|
@@ -417,13 +597,22 @@ Please review findings carefully and verify against source data.
|
|
| 417 |
output_text += result['full_text'] + "\n\n---\n\n"
|
| 418 |
|
| 419 |
progress(1.0, desc="Complete!")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 420 |
return output_text, csv_path, pdf_path, dashboard
|
| 421 |
-
|
| 422 |
except Exception as e:
|
| 423 |
error_msg = f"[Fatal Error] Summary or report generation failed: {str(e)}"
|
| 424 |
print(error_msg)
|
| 425 |
import traceback
|
| 426 |
traceback.print_exc()
|
|
|
|
|
|
|
|
|
|
|
|
|
| 427 |
return error_msg, None, None, None
|
| 428 |
|
| 429 |
def generate_narrative_report_ui(csv_file, summary_text, interviewee_type, report_style):
|
|
@@ -434,22 +623,22 @@ def generate_narrative_report_ui(csv_file, summary_text, interviewee_type, repor
|
|
| 434 |
from narrative_report_generator import generate_narrative_report
|
| 435 |
import tempfile
|
| 436 |
import os
|
| 437 |
-
|
| 438 |
# Check if CSV file exists
|
| 439 |
if csv_file is None:
|
| 440 |
return "Error: No CSV file provided. Please run analysis first.", None, None, None
|
| 441 |
-
|
| 442 |
# Save summary text to temp file if provided
|
| 443 |
summary_path = None
|
| 444 |
if summary_text and summary_text.strip():
|
| 445 |
with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
|
| 446 |
f.write(summary_text)
|
| 447 |
summary_path = f.name
|
| 448 |
-
|
| 449 |
# Determine LLM backend
|
| 450 |
llm_backend = "lmstudio" if os.getenv("USE_LMSTUDIO", "False").lower() == "true" else "hf_api"
|
| 451 |
-
|
| 452 |
-
# Generate narrative report
|
| 453 |
pdf_path, word_path, html_path = generate_narrative_report(
|
| 454 |
csv_path=csv_file.name if hasattr(csv_file, 'name') else csv_file,
|
| 455 |
summary_path=summary_path,
|
|
|
|
| 8 |
from llm import query_llm, extract_structured_data
|
| 9 |
from reporting import generate_enhanced_csv, generate_enhanced_pdf
|
| 10 |
from dashboard import generate_comprehensive_dashboard
|
| 11 |
+
from validation import validate_transcript_quality, check_data_completeness
|
| 12 |
+
from quote_extractor import extract_quotes_from_results
|
| 13 |
+
from production_logger import init_session, ProductionLogger, PerformanceMonitor
|
| 14 |
+
|
| 15 |
+
# Optional imports for enhanced validation (may not exist in older deployments)
|
| 16 |
+
try:
|
| 17 |
+
from validation import verify_consensus_claims, validate_summary_quality
|
| 18 |
+
HAS_ENHANCED_VALIDATION = True
|
| 19 |
+
except ImportError:
|
| 20 |
+
HAS_ENHANCED_VALIDATION = False
|
| 21 |
+
print("โ ๏ธ Enhanced validation functions not available - using basic validation only")
|
| 22 |
+
|
| 23 |
+
# Load environment configuration from .env file
|
| 24 |
+
def load_env_file(filepath='.env'):
|
| 25 |
+
"""Manually load environment variables from .env file"""
|
| 26 |
+
if os.path.exists(filepath):
|
| 27 |
+
with open(filepath, 'r') as f:
|
| 28 |
+
for line in f:
|
| 29 |
+
line = line.strip()
|
| 30 |
+
# Skip comments and empty lines
|
| 31 |
+
if line and not line.startswith('#'):
|
| 32 |
+
if '=' in line:
|
| 33 |
+
key, value = line.split('=', 1)
|
| 34 |
+
os.environ[key.strip()] = value.strip()
|
| 35 |
+
print(f"โ
Loaded configuration from {filepath}")
|
| 36 |
+
return True
|
| 37 |
+
return False
|
| 38 |
|
| 39 |
# HuggingFace Spaces Configuration
|
| 40 |
+
# Settings can be configured via Spaces Secrets/Variables
|
| 41 |
+
# Defaults to local model inference (no API calls)
|
| 42 |
+
|
| 43 |
+
# Try to load .env if it exists (for local development)
|
| 44 |
+
if os.path.exists('.env'):
|
| 45 |
+
load_env_file('.env')
|
| 46 |
+
print("โ
Loaded .env file (local development mode)")
|
| 47 |
+
else:
|
| 48 |
+
print("โน๏ธ No .env file found - using HuggingFace Spaces configuration")
|
| 49 |
+
|
| 50 |
+
# Set defaults for HuggingFace Spaces (can be overridden with Spaces Variables)
|
| 51 |
+
os.environ.setdefault("USE_HF_API", "False")
|
| 52 |
+
os.environ.setdefault("USE_LMSTUDIO", "False")
|
| 53 |
+
os.environ.setdefault("DEBUG_MODE", os.getenv("DEBUG_MODE", "False"))
|
| 54 |
+
os.environ.setdefault("LLM_BACKEND", "local")
|
| 55 |
+
os.environ.setdefault("LLM_TIMEOUT", "120")
|
| 56 |
+
os.environ.setdefault("MAX_TOKENS_PER_REQUEST", "1500")
|
| 57 |
+
os.environ.setdefault("LLM_TEMPERATURE", "0.7")
|
| 58 |
+
|
| 59 |
+
print("โ
Configuration loaded for HuggingFace Spaces")
|
| 60 |
+
print(f"๐ TranscriptorAI Enterprise - LLM Backend: {os.getenv('LLM_BACKEND')}")
|
| 61 |
+
print(f"๐ง USE_HF_API: {os.getenv('USE_HF_API')}")
|
| 62 |
+
print(f"๐ง USE_LMSTUDIO: {os.getenv('USE_LMSTUDIO')}")
|
| 63 |
+
print(f"๐ง DEBUG_MODE: {os.getenv('DEBUG_MODE')}")
|
| 64 |
|
| 65 |
def analyze(files, file_type, user_comments, role_hint, debug_mode, interviewee_type, progress=gr.Progress()):
|
| 66 |
"""
|
| 67 |
+
Enhanced analysis pipeline with robust error handling, validation, and production logging
|
| 68 |
"""
|
| 69 |
+
# Initialize production logging session
|
| 70 |
+
session_id = datetime.now().strftime("%Y%m%d_%H%M%S")
|
| 71 |
+
prod_logger = init_session(session_id)
|
| 72 |
+
perf_monitor = PerformanceMonitor(prod_logger)
|
| 73 |
+
|
| 74 |
+
prod_logger.logger.info(f"="*80)
|
| 75 |
+
prod_logger.logger.info(f"NEW ANALYSIS SESSION: {session_id}")
|
| 76 |
+
prod_logger.logger.info(f"Files: {len(files)} | Type: {file_type} | Interviewee: {interviewee_type}")
|
| 77 |
+
prod_logger.logger.info(f"="*80)
|
| 78 |
+
|
| 79 |
os.environ["DEBUG_MODE"] = str(debug_mode)
|
| 80 |
+
|
| 81 |
if not files:
|
| 82 |
+
prod_logger.log_warning("No files uploaded")
|
| 83 |
return "Error: No files uploaded", None, None, None
|
| 84 |
+
|
| 85 |
all_results = []
|
| 86 |
csv_rows = []
|
| 87 |
processing_errors = []
|
| 88 |
+
|
| 89 |
progress(0, desc="Initializing...")
|
| 90 |
print(f"[Start] Processing {len(files)} file(s) as {file_type}")
|
| 91 |
|
|
|
|
| 120 |
|
| 121 |
for i, file in enumerate(files):
|
| 122 |
file_name = os.path.basename(file.name)
|
| 123 |
+
prod_logger.log_transcript_start(file_name, file_type, interviewee_type)
|
| 124 |
+
perf_monitor.start_timer(f"transcript_{i+1}_processing")
|
| 125 |
+
|
| 126 |
try:
|
| 127 |
# Step 1: Extract text
|
| 128 |
progress((current_step / total_steps), desc=f"Extracting {file_name}...")
|
|
|
|
| 161 |
progress(chunk_progress, desc=f"Analyzing {file_name} ({j+1}/{len(chunks)})...")
|
| 162 |
|
| 163 |
result, chunk_data = query_llm(
|
| 164 |
+
chunk,
|
| 165 |
+
user_context,
|
| 166 |
interviewee_type,
|
| 167 |
extract_structured=True
|
| 168 |
)
|
| 169 |
+
|
| 170 |
+
# Ensure result is a string before appending
|
| 171 |
+
if not isinstance(result, str):
|
| 172 |
+
print(f"[Warning] LLM result is not a string (type: {type(result)}), converting...")
|
| 173 |
+
if isinstance(result, dict):
|
| 174 |
+
result = str(result.get('content', str(result)))
|
| 175 |
+
else:
|
| 176 |
+
result = str(result)
|
| 177 |
+
|
| 178 |
+
# Additional safety: Only append non-empty strings
|
| 179 |
+
if result and isinstance(result, str) and len(result.strip()) > 0:
|
| 180 |
+
transcript_result.append(result)
|
| 181 |
+
else:
|
| 182 |
+
print(f"[Warning] Skipping empty/invalid result for chunk {j+1}")
|
| 183 |
+
|
| 184 |
# Merge structured data
|
| 185 |
for key, value in chunk_data.items():
|
| 186 |
if key not in structured_data:
|
|
|
|
| 191 |
structured_data[key].append(value)
|
| 192 |
|
| 193 |
current_step += 1
|
| 194 |
+
|
| 195 |
# Combine and validate results
|
| 196 |
+
# Final safety check: ensure ALL items in transcript_result are strings
|
| 197 |
+
cleaned_results = []
|
| 198 |
+
for idx, item in enumerate(transcript_result):
|
| 199 |
+
if isinstance(item, str):
|
| 200 |
+
cleaned_results.append(item)
|
| 201 |
+
else:
|
| 202 |
+
print(f"[Warning] Removing non-string item at index {idx}: {type(item)}")
|
| 203 |
+
# Try to extract text from dict if possible
|
| 204 |
+
if isinstance(item, dict) and 'content' in item:
|
| 205 |
+
cleaned_results.append(str(item['content']))
|
| 206 |
+
# Otherwise skip it
|
| 207 |
+
|
| 208 |
+
full_text = "\n\n".join(cleaned_results)
|
| 209 |
|
| 210 |
# Quality check
|
| 211 |
quality_score, quality_issues = validate_transcript_quality(
|
|
|
|
| 235 |
"Word Count": len(raw_text.split()),
|
| 236 |
}
|
| 237 |
|
| 238 |
+
# Helper function to safely join structured data (convert dicts to strings if needed)
|
| 239 |
+
def safe_join(items):
|
| 240 |
+
"""Convert all items to strings before joining"""
|
| 241 |
+
str_items = []
|
| 242 |
+
for item in items:
|
| 243 |
+
if isinstance(item, str):
|
| 244 |
+
str_items.append(item)
|
| 245 |
+
elif isinstance(item, dict):
|
| 246 |
+
# Try to extract meaningful text from dict
|
| 247 |
+
# Common patterns: {"name": "X"}, {"condition": "Y", "severity": "Z"}
|
| 248 |
+
if "name" in item:
|
| 249 |
+
str_items.append(str(item["name"]))
|
| 250 |
+
elif "condition" in item:
|
| 251 |
+
# Format as "condition (severity)"
|
| 252 |
+
cond = item["condition"]
|
| 253 |
+
if "severity" in item:
|
| 254 |
+
str_items.append(f"{cond} ({item['severity']})")
|
| 255 |
+
else:
|
| 256 |
+
str_items.append(cond)
|
| 257 |
+
else:
|
| 258 |
+
# Fallback: just stringify the dict
|
| 259 |
+
str_items.append(str(item))
|
| 260 |
+
else:
|
| 261 |
+
str_items.append(str(item))
|
| 262 |
+
return "; ".join(str_items)
|
| 263 |
+
|
| 264 |
# Add interviewee-specific fields
|
| 265 |
if interviewee_type == "HCP":
|
| 266 |
csv_row.update({
|
| 267 |
+
"Diagnoses": safe_join(structured_data.get("diagnoses", [])),
|
| 268 |
+
"Prescriptions": safe_join(structured_data.get("prescriptions", [])),
|
| 269 |
+
"Treatment Strategies": safe_join(structured_data.get("treatment_rationale", [])),
|
| 270 |
+
"Guidelines Mentioned": safe_join(structured_data.get("guidelines_mentioned", []))
|
| 271 |
})
|
| 272 |
elif interviewee_type == "Patient":
|
| 273 |
csv_row.update({
|
| 274 |
+
"Primary Symptoms": safe_join(structured_data.get("symptoms", [])),
|
| 275 |
+
"Main Concerns": safe_join(structured_data.get("concerns", [])),
|
| 276 |
+
"Treatment Response": safe_join(structured_data.get("treatment_response", [])),
|
| 277 |
+
"Side Effects": safe_join(structured_data.get("side_effects", []))
|
| 278 |
})
|
| 279 |
else:
|
| 280 |
csv_row.update({
|
| 281 |
+
"Key Insights": safe_join(structured_data.get("key_insights", [])),
|
| 282 |
+
"Recommendations": safe_join(structured_data.get("recommendations", []))
|
| 283 |
})
|
| 284 |
|
| 285 |
csv_rows.append(csv_row)
|
| 286 |
+
|
| 287 |
+
# Log successful completion
|
| 288 |
+
processing_time = perf_monitor.end_timer(f"transcript_{i+1}_processing")
|
| 289 |
+
prod_logger.log_transcript_complete(file_name, quality_score, len(raw_text.split()), processing_time)
|
| 290 |
+
|
| 291 |
print(f"[File {i+1}] โ Processing complete")
|
| 292 |
+
|
| 293 |
except Exception as e:
|
| 294 |
# Enhanced error tracking with type and traceback
|
| 295 |
import traceback
|
|
|
|
| 300 |
error_msg = f"[{error_type}] {file_name}: {error_details}"
|
| 301 |
print(error_msg)
|
| 302 |
|
| 303 |
+
# Log error
|
| 304 |
+
perf_monitor.end_timer(f"transcript_{i+1}_processing") # End timer even on error
|
| 305 |
+
prod_logger.log_transcript_error(file_name, error_type, error_details[:200])
|
| 306 |
+
|
| 307 |
# Store comprehensive error information
|
| 308 |
processing_errors.append({
|
| 309 |
"transcript_id": f"Transcript {i+1}",
|
|
|
|
| 339 |
try:
|
| 340 |
progress(0.9, desc="Generating summary and reports...")
|
| 341 |
print("[Summary] Analyzing trends across transcripts")
|
| 342 |
+
|
| 343 |
# Combine successful results
|
| 344 |
valid_results = [r for r in all_results if r["quality_score"] > 0]
|
| 345 |
+
|
| 346 |
if not valid_results:
|
| 347 |
return "Error: No transcripts were successfully processed", None, None, None
|
| 348 |
+
|
| 349 |
+
# Extract quotes for storytelling
|
| 350 |
+
print("[Quotes] Extracting impactful quotes from transcripts...")
|
| 351 |
+
with perf_monitor.measure("quote_extraction"):
|
| 352 |
+
quotes_data = extract_quotes_from_results(valid_results, interviewee_type)
|
| 353 |
+
|
| 354 |
+
top_score = quotes_data['top_quotes'][0]['impact_score'] if quotes_data['top_quotes'] else 0
|
| 355 |
+
themes = list(quotes_data['by_theme'].keys())
|
| 356 |
+
prod_logger.log_quote_extraction(len(quotes_data['all_quotes']), top_score, themes)
|
| 357 |
+
|
| 358 |
+
print(f"[Quotes] Extracted {len(quotes_data['all_quotes'])} quotes, top impact score: {top_score:.2f}" if quotes_data['top_quotes'] else "[Quotes] No quotes extracted")
|
| 359 |
|
| 360 |
+
# Build comprehensive summary prompt with quotes
|
| 361 |
summary_prompt = f"""
|
| 362 |
CROSS-INTERVIEW SYNTHESIS TASK
|
| 363 |
+
|
| 364 |
SAMPLE: {len(valid_results)} {interviewee_type} transcripts
|
| 365 |
FOCUS AREAS: {interviewee_context.get('focus', 'general patterns')}
|
| 366 |
+
"""
|
| 367 |
+
|
| 368 |
+
# Add top quotes section for storytelling context
|
| 369 |
+
if quotes_data['top_quotes']:
|
| 370 |
+
summary_prompt += f"""
|
| 371 |
+
|
| 372 |
+
TOP PARTICIPANT QUOTES (use these to bring findings to life):
|
| 373 |
+
"""
|
| 374 |
+
for i, quote in enumerate(quotes_data['top_quotes'][:10], 1):
|
| 375 |
+
summary_prompt += f"\n{i}. [{quote['theme'].upper()}] (from {quote['transcript_id']})\n \"{quote['text']}\"\n"
|
| 376 |
+
|
| 377 |
+
summary_prompt += """
|
| 378 |
+
|
| 379 |
COMPLETE TRANSCRIPT DATA:
|
| 380 |
"""
|
| 381 |
+
|
| 382 |
for idx, result in enumerate(valid_results, 1):
|
| 383 |
summary_prompt += f"\n{'='*60}\nTRANSCRIPT {idx}/{len(valid_results)}: {result['file_name']}\n{'='*60}\n"
|
| 384 |
summary_prompt += f"{result['full_text'][:2000]}\n"
|
| 385 |
|
| 386 |
summary_prompt += f"""
|
| 387 |
+
|
| 388 |
ANALYSIS REQUIREMENTS:
|
| 389 |
+
|
| 390 |
1. QUANTIFY EVERYTHING:
|
| 391 |
- Count participants: "X out of {len(valid_results)} participants mentioned..."
|
| 392 |
- Never use vague terms (many/most/some)
|
| 393 |
- Calculate percentages where relevant
|
| 394 |
+
|
| 395 |
+
2. INTEGRATE PARTICIPANT VOICE:
|
| 396 |
+
- Weave in quotes from the "TOP PARTICIPANT QUOTES" section above
|
| 397 |
+
- Use quotes to bring data to life and prove points
|
| 398 |
+
- Format as: "X out of {len(valid_results)} mentioned [finding]. As one {interviewee_type.lower()} described, '[quote]'"
|
| 399 |
+
- Include 3-5 quotes in your narrative
|
| 400 |
+
|
| 401 |
+
3. IDENTIFY PATTERNS BY CONSENSUS LEVEL:
|
| 402 |
- STRONG CONSENSUS (80%+ = {int(len(valid_results)*0.8)}+ transcripts agree)
|
| 403 |
- MAJORITY VIEW (60-79% = {int(len(valid_results)*0.6)}-{int(len(valid_results)*0.79)} transcripts)
|
| 404 |
- SPLIT PERSPECTIVES (40-59% = mixed views)
|
| 405 |
- MINORITY/OUTLIER (<40% but notable)
|
| 406 |
+
|
| 407 |
+
4. CROSS-VALIDATE:
|
| 408 |
- Check for contradictions between transcripts
|
| 409 |
- Note where perspectives diverge and why
|
| 410 |
- Flag any quality issues in individual transcripts
|
| 411 |
+
|
| 412 |
+
5. CITE EVIDENCE:
|
| 413 |
- Reference specific transcript numbers
|
| 414 |
- Brief supporting details
|
| 415 |
+
- Use participant quotes as proof points
|
| 416 |
- Distinguish verified facts from interpretation
|
| 417 |
+
|
| 418 |
OUTPUT FORMAT:
|
| 419 |
+
Write 2-3 sentence executive overview WITH a compelling quote, then structure as:
|
| 420 |
+
|
| 421 |
**STRONG CONSENSUS FINDINGS:**
|
| 422 |
+
- [Finding with count, supporting quote if available, and business implication]
|
| 423 |
+
|
| 424 |
**MAJORITY FINDINGS:**
|
| 425 |
+
- [Finding with count and quote]
|
| 426 |
+
|
| 427 |
**DIVERGENT PERSPECTIVES:**
|
| 428 |
+
- [Where views split, with quotes showing both sides if possible]
|
| 429 |
+
|
| 430 |
**NOTABLE OUTLIERS:**
|
| 431 |
+
- [Unique but important points, use quote if impactful]
|
| 432 |
+
|
| 433 |
**DATA QUALITY NOTES:**
|
| 434 |
- [Any gaps or transcript issues]
|
| 435 |
+
|
| 436 |
+
CRITICAL: Integrate quotes naturally. Use participant voice to make findings memorable and credible.
|
| 437 |
Be specific. Use numbers. Cite transcript IDs. Flag weak evidence.
|
| 438 |
"""
|
| 439 |
|
|
|
|
| 459 |
from llm_robust import generate_emergency_summary
|
| 460 |
summary, summary_data = generate_emergency_summary(interviewee_type)
|
| 461 |
|
| 462 |
+
# Ensure summary is a string (defensive check for LLM response format issues)
|
| 463 |
+
if not isinstance(summary, str):
|
| 464 |
+
print(f"[Warning] Summary is not a string (type: {type(summary)}), converting...")
|
| 465 |
+
if isinstance(summary, dict):
|
| 466 |
+
summary = str(summary.get('content', str(summary)))
|
| 467 |
+
else:
|
| 468 |
+
summary = str(summary)
|
| 469 |
+
|
| 470 |
# Validate summary quality and retry if needed
|
| 471 |
+
if HAS_ENHANCED_VALIDATION:
|
| 472 |
+
summary_score, summary_issues = validate_summary_quality(
|
| 473 |
+
summary,
|
| 474 |
+
len(valid_results)
|
| 475 |
+
)
|
| 476 |
+
else:
|
| 477 |
+
summary_score = 1.0
|
| 478 |
+
summary_issues = []
|
| 479 |
|
| 480 |
+
if HAS_ENHANCED_VALIDATION and summary_score < 0.7: # Quality threshold
|
| 481 |
print(f"[Warning] Summary quality issues (score: {summary_score:.2f}): {summary_issues}")
|
| 482 |
print("[Summary] Retrying with stricter validation...")
|
| 483 |
|
|
|
|
| 509 |
print("[Summary] Using emergency fallback for retry...")
|
| 510 |
summary, summary_data = generate_emergency_summary(interviewee_type)
|
| 511 |
|
| 512 |
+
# Ensure summary is a string after retry
|
| 513 |
+
if not isinstance(summary, str):
|
| 514 |
+
print(f"[Warning] Retry summary is not a string (type: {type(summary)}), converting...")
|
| 515 |
+
if isinstance(summary, dict):
|
| 516 |
+
summary = str(summary.get('content', str(summary)))
|
| 517 |
+
else:
|
| 518 |
+
summary = str(summary)
|
| 519 |
+
|
| 520 |
# Re-validate
|
| 521 |
summary_score, summary_issues = validate_summary_quality(summary, len(valid_results))
|
| 522 |
|
|
|
|
| 537 |
print(f"[Summary] โ Validation passed (score: {summary_score:.2f})")
|
| 538 |
|
| 539 |
# Verify consensus claims against actual data
|
| 540 |
+
if HAS_ENHANCED_VALIDATION:
|
| 541 |
+
consensus_warnings = verify_consensus_claims(summary, valid_results)
|
| 542 |
+
if consensus_warnings:
|
| 543 |
+
print(f"[Warning] Consensus verification issues: {len(consensus_warnings)} found")
|
| 544 |
+
consensus_note = "\n\n[CONSENSUS VERIFICATION NOTES]:\n" + "\n".join(f"- {w}" for w in consensus_warnings) + "\n\n"
|
| 545 |
+
summary = summary + consensus_note
|
| 546 |
+
else:
|
| 547 |
+
print("[Summary] โ Consensus claims verified")
|
| 548 |
else:
|
| 549 |
+
print("[Summary] โ ๏ธ Consensus verification skipped (enhanced validation not available)")
|
| 550 |
|
| 551 |
# Generate enhanced reports
|
| 552 |
csv_path = generate_enhanced_csv(csv_rows, interviewee_type)
|
|
|
|
| 578 |
"""
|
| 579 |
|
| 580 |
if processing_errors:
|
| 581 |
+
# Convert error dicts to readable strings
|
| 582 |
+
error_messages = []
|
| 583 |
+
for err in processing_errors:
|
| 584 |
+
if isinstance(err, dict):
|
| 585 |
+
# Format: "Transcript X (filename.docx): ErrorType - message"
|
| 586 |
+
error_msg = f"{err.get('transcript_id', 'Unknown')} ({err.get('file_name', 'unknown')}): {err.get('error_type', 'Error')} - {err.get('error_message', 'Unknown error')}"
|
| 587 |
+
error_messages.append(error_msg)
|
| 588 |
+
else:
|
| 589 |
+
error_messages.append(str(err))
|
| 590 |
+
output_text += f"\n## Processing Errors\n" + "\n".join(f"- {msg}" for msg in error_messages)
|
| 591 |
|
| 592 |
output_text += "\n\n---\n\n## Individual Transcript Results\n\n"
|
| 593 |
|
|
|
|
| 597 |
output_text += result['full_text'] + "\n\n---\n\n"
|
| 598 |
|
| 599 |
progress(1.0, desc="Complete!")
|
| 600 |
+
|
| 601 |
+
# Finalize production logging session
|
| 602 |
+
session_summary = prod_logger.finalize_session()
|
| 603 |
+
prod_logger.logger.info(f"Session logs saved to: logs/session_{session_id}.*")
|
| 604 |
+
|
| 605 |
return output_text, csv_path, pdf_path, dashboard
|
| 606 |
+
|
| 607 |
except Exception as e:
|
| 608 |
error_msg = f"[Fatal Error] Summary or report generation failed: {str(e)}"
|
| 609 |
print(error_msg)
|
| 610 |
import traceback
|
| 611 |
traceback.print_exc()
|
| 612 |
+
|
| 613 |
+
prod_logger.log_transcript_error("SUMMARY_GENERATION", type(e).__name__, str(e))
|
| 614 |
+
prod_logger.finalize_session()
|
| 615 |
+
|
| 616 |
return error_msg, None, None, None
|
| 617 |
|
| 618 |
def generate_narrative_report_ui(csv_file, summary_text, interviewee_type, report_style):
|
|
|
|
| 623 |
from narrative_report_generator import generate_narrative_report
|
| 624 |
import tempfile
|
| 625 |
import os
|
| 626 |
+
|
| 627 |
# Check if CSV file exists
|
| 628 |
if csv_file is None:
|
| 629 |
return "Error: No CSV file provided. Please run analysis first.", None, None, None
|
| 630 |
+
|
| 631 |
# Save summary text to temp file if provided
|
| 632 |
summary_path = None
|
| 633 |
if summary_text and summary_text.strip():
|
| 634 |
with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
|
| 635 |
f.write(summary_text)
|
| 636 |
summary_path = f.name
|
| 637 |
+
|
| 638 |
# Determine LLM backend
|
| 639 |
llm_backend = "lmstudio" if os.getenv("USE_LMSTUDIO", "False").lower() == "true" else "hf_api"
|
| 640 |
+
|
| 641 |
+
# Generate narrative report (quotes will be extracted inside the function)
|
| 642 |
pdf_path, word_path, html_path = generate_narrative_report(
|
| 643 |
csv_path=csv_file.name if hasattr(csv_file, 'name') else csv_file,
|
| 644 |
summary_path=summary_path,
|
llm.py
CHANGED
|
@@ -362,39 +362,70 @@ def query_llm_lmstudio(prompt: str, max_tokens: int = 1500) -> str:
|
|
| 362 |
|
| 363 |
|
| 364 |
def query_llm_local(prompt: str, max_tokens: int = 1500) -> str:
|
| 365 |
-
"""
|
|
|
|
|
|
|
|
|
|
| 366 |
try:
|
| 367 |
-
from transformers import
|
| 368 |
import torch
|
| 369 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 370 |
if not hasattr(query_llm_local, 'model'):
|
| 371 |
-
|
| 372 |
-
query_llm_local.tokenizer = AutoTokenizer.from_pretrained(
|
| 373 |
-
|
| 374 |
-
|
| 375 |
-
torch_dtype=torch.float16,
|
| 376 |
-
device_map="auto"
|
| 377 |
)
|
| 378 |
-
|
| 379 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 380 |
inputs = query_llm_local.tokenizer(
|
| 381 |
-
prompt,
|
| 382 |
-
return_tensors="pt",
|
| 383 |
-
truncation=True,
|
| 384 |
-
max_length=
|
| 385 |
-
)
|
| 386 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 387 |
outputs = query_llm_local.model.generate(
|
| 388 |
**inputs,
|
| 389 |
max_new_tokens=max_tokens,
|
| 390 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 391 |
)
|
| 392 |
-
|
| 393 |
-
|
| 394 |
return response.strip()
|
| 395 |
-
|
| 396 |
except Exception as e:
|
| 397 |
-
|
|
|
|
|
|
|
| 398 |
return f"[Error] Local model failed: {e}"
|
| 399 |
|
| 400 |
|
|
|
|
| 362 |
|
| 363 |
|
| 364 |
def query_llm_local(prompt: str, max_tokens: int = 1500) -> str:
|
| 365 |
+
"""
|
| 366 |
+
Local model inference optimized for HuggingFace Spaces
|
| 367 |
+
Uses Phi-3-mini for better instruction following and JSON generation
|
| 368 |
+
"""
|
| 369 |
try:
|
| 370 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 371 |
import torch
|
| 372 |
+
|
| 373 |
+
# Get model name from environment (can be set in Spaces Variables)
|
| 374 |
+
model_name = os.getenv("LOCAL_MODEL", "microsoft/Phi-3-mini-4k-instruct")
|
| 375 |
+
|
| 376 |
+
# Load model once and cache it
|
| 377 |
if not hasattr(query_llm_local, 'model'):
|
| 378 |
+
print(f"[Local Model] Loading {model_name}...")
|
| 379 |
+
query_llm_local.tokenizer = AutoTokenizer.from_pretrained(
|
| 380 |
+
model_name,
|
| 381 |
+
trust_remote_code=True
|
|
|
|
|
|
|
| 382 |
)
|
| 383 |
+
query_llm_local.model = AutoModelForCausalLM.from_pretrained(
|
| 384 |
+
model_name,
|
| 385 |
+
torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
|
| 386 |
+
device_map="auto",
|
| 387 |
+
trust_remote_code=True
|
| 388 |
+
)
|
| 389 |
+
print(f"[Local Model] โ
Model loaded on {query_llm_local.model.device}")
|
| 390 |
+
|
| 391 |
+
# Get temperature from environment
|
| 392 |
+
temperature = float(os.getenv("LLM_TEMPERATURE", "0.7"))
|
| 393 |
+
|
| 394 |
+
# Tokenize with proper truncation for 4k context
|
| 395 |
inputs = query_llm_local.tokenizer(
|
| 396 |
+
prompt,
|
| 397 |
+
return_tensors="pt",
|
| 398 |
+
truncation=True,
|
| 399 |
+
max_length=3500 # Leave room for response
|
| 400 |
+
)
|
| 401 |
+
|
| 402 |
+
# Move to device
|
| 403 |
+
device = query_llm_local.model.device
|
| 404 |
+
inputs = {k: v.to(device) for k, v in inputs.items()}
|
| 405 |
+
|
| 406 |
+
# Generate with proper parameters
|
| 407 |
+
print(f"[Local Model] Generating ({max_tokens} max tokens, temp={temperature})...")
|
| 408 |
outputs = query_llm_local.model.generate(
|
| 409 |
**inputs,
|
| 410 |
max_new_tokens=max_tokens,
|
| 411 |
+
temperature=temperature,
|
| 412 |
+
do_sample=temperature > 0,
|
| 413 |
+
pad_token_id=query_llm_local.tokenizer.eos_token_id
|
| 414 |
+
)
|
| 415 |
+
|
| 416 |
+
# Decode only the new tokens (not the prompt)
|
| 417 |
+
response = query_llm_local.tokenizer.decode(
|
| 418 |
+
outputs[0][inputs['input_ids'].shape[1]:],
|
| 419 |
+
skip_special_tokens=True
|
| 420 |
)
|
| 421 |
+
|
| 422 |
+
print(f"[Local Model] โ
Generated {len(response)} characters")
|
| 423 |
return response.strip()
|
| 424 |
+
|
| 425 |
except Exception as e:
|
| 426 |
+
import traceback
|
| 427 |
+
error_details = traceback.format_exc()
|
| 428 |
+
log(f"Local model error:\n{error_details}")
|
| 429 |
return f"[Error] Local model failed: {e}"
|
| 430 |
|
| 431 |
|
requirements.txt
CHANGED
|
@@ -1,16 +1,50 @@
|
|
| 1 |
-
# TranscriptorAI -
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2 |
gradio>=4.0.0
|
|
|
|
|
|
|
| 3 |
huggingface_hub>=0.19.0
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
#
|
| 14 |
-
#
|
| 15 |
-
#
|
| 16 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# TranscriptorAI - Enterprise Market Research Edition
|
| 2 |
+
# Updated: October 20, 2025
|
| 3 |
+
# Install via Windows PowerShell: pip install -r requirements.txt
|
| 4 |
+
|
| 5 |
+
# ============================================================================
|
| 6 |
+
# CRITICAL DEPENDENCIES (Required for core functionality)
|
| 7 |
+
# ============================================================================
|
| 8 |
+
|
| 9 |
+
# Web UI Framework
|
| 10 |
gradio>=4.0.0
|
| 11 |
+
|
| 12 |
+
# HuggingFace API (CRITICAL - without this, LLM calls fail and Quality Score = 0.00)
|
| 13 |
huggingface_hub>=0.19.0
|
| 14 |
+
|
| 15 |
+
# Document Processing
|
| 16 |
+
python-docx>=1.0.0 # For DOCX file extraction
|
| 17 |
+
pdfplumber>=0.10.0 # For PDF file extraction
|
| 18 |
+
|
| 19 |
+
# Data Processing & Analysis
|
| 20 |
+
pandas>=2.0.0 # CSV handling and data manipulation
|
| 21 |
+
numpy>=1.24.0 # Numerical operations (required by pandas)
|
| 22 |
+
|
| 23 |
+
# Visualization & Reporting
|
| 24 |
+
matplotlib>=3.7.0 # Charts and graphs for dashboard
|
| 25 |
+
reportlab>=4.0.0 # PDF report generation
|
| 26 |
+
|
| 27 |
+
# NLP & Text Processing
|
| 28 |
+
tiktoken>=0.5.0 # Token counting for LLM context management
|
| 29 |
+
nltk>=3.8.0 # Natural language processing utilities
|
| 30 |
+
scikit-learn>=1.3.0 # Text vectorization and similarity
|
| 31 |
+
|
| 32 |
+
# ============================================================================
|
| 33 |
+
# STANDARD LIBRARY DEPENDENCIES (Usually pre-installed, but listed for clarity)
|
| 34 |
+
# ============================================================================
|
| 35 |
+
requests>=2.31.0 # HTTP requests for API calls
|
| 36 |
+
python-dateutil>=2.8.0 # Date/time utilities
|
| 37 |
+
|
| 38 |
+
# ============================================================================
|
| 39 |
+
# OPTIONAL: For Enhanced Error Handling
|
| 40 |
+
# ============================================================================
|
| 41 |
+
python-dotenv>=1.0.0 # .env file loading (optional - we have manual loader)
|
| 42 |
+
|
| 43 |
+
# ============================================================================
|
| 44 |
+
# LOCAL MODEL INFERENCE (For HuggingFace Spaces deployment)
|
| 45 |
+
# ============================================================================
|
| 46 |
+
transformers>=4.36.0 # For local model loading (Phi-3, etc.)
|
| 47 |
+
torch>=2.1.0 # PyTorch for model inference
|
| 48 |
+
accelerate>=0.25.0 # For device_map="auto" and efficient loading
|
| 49 |
+
sentencepiece>=0.1.99 # Tokenizer support for some models
|
| 50 |
+
protobuf>=3.20.0 # Required by some tokenizers
|
test_local_model.py
ADDED
|
@@ -0,0 +1,138 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Test script for local model inference
|
| 3 |
+
Run this to verify your setup before deploying to HuggingFace Spaces
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import os
|
| 7 |
+
import sys
|
| 8 |
+
|
| 9 |
+
# Set environment for local model
|
| 10 |
+
os.environ["USE_HF_API"] = "False"
|
| 11 |
+
os.environ["USE_LMSTUDIO"] = "False"
|
| 12 |
+
os.environ["DEBUG_MODE"] = "True"
|
| 13 |
+
os.environ["LLM_BACKEND"] = "local"
|
| 14 |
+
os.environ["LLM_TEMPERATURE"] = "0.7"
|
| 15 |
+
|
| 16 |
+
print("="*80)
|
| 17 |
+
print("๐งช Testing Local Model Inference")
|
| 18 |
+
print("="*80)
|
| 19 |
+
|
| 20 |
+
# Test imports
|
| 21 |
+
print("\n1๏ธโฃ Testing imports...")
|
| 22 |
+
try:
|
| 23 |
+
import torch
|
| 24 |
+
print(f" โ
PyTorch {torch.__version__}")
|
| 25 |
+
print(f" ๐ง CUDA available: {torch.cuda.is_available()}")
|
| 26 |
+
if torch.cuda.is_available():
|
| 27 |
+
print(f" ๐ฎ GPU: {torch.cuda.get_device_name(0)}")
|
| 28 |
+
except ImportError as e:
|
| 29 |
+
print(f" โ PyTorch not installed: {e}")
|
| 30 |
+
print(" ๐ฆ Install: pip install torch")
|
| 31 |
+
sys.exit(1)
|
| 32 |
+
|
| 33 |
+
try:
|
| 34 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 35 |
+
print(f" โ
Transformers installed")
|
| 36 |
+
except ImportError as e:
|
| 37 |
+
print(f" โ Transformers not installed: {e}")
|
| 38 |
+
print(" ๐ฆ Install: pip install transformers accelerate")
|
| 39 |
+
sys.exit(1)
|
| 40 |
+
|
| 41 |
+
# Test LLM function
|
| 42 |
+
print("\n2๏ธโฃ Testing LLM function...")
|
| 43 |
+
try:
|
| 44 |
+
from llm import query_llm
|
| 45 |
+
print(" โ
LLM module imported")
|
| 46 |
+
except ImportError as e:
|
| 47 |
+
print(f" โ Failed to import llm module: {e}")
|
| 48 |
+
sys.exit(1)
|
| 49 |
+
|
| 50 |
+
# Test simple query
|
| 51 |
+
print("\n3๏ธโฃ Testing simple query (this will download the model on first run)...")
|
| 52 |
+
print(" โณ This may take 2-5 minutes for first-time model download...\n")
|
| 53 |
+
|
| 54 |
+
test_prompt = """You are a medical transcript analyzer.
|
| 55 |
+
|
| 56 |
+
Analyze this brief interview segment:
|
| 57 |
+
|
| 58 |
+
Interviewer: How do you treat moderate acne?
|
| 59 |
+
Doctor: I typically start with topical retinoids and benzoyl peroxide. For more severe cases, I prescribe oral antibiotics like doxycycline 100mg daily.
|
| 60 |
+
|
| 61 |
+
Provide a brief summary and extract structured data in JSON format:
|
| 62 |
+
{
|
| 63 |
+
"diagnoses": ["list of conditions mentioned"],
|
| 64 |
+
"prescriptions": ["list of medications with dosages"],
|
| 65 |
+
"treatment_rationale": ["list of treatment approaches"]
|
| 66 |
+
}
|
| 67 |
+
"""
|
| 68 |
+
|
| 69 |
+
try:
|
| 70 |
+
response, structured_data = query_llm(
|
| 71 |
+
chunk=test_prompt,
|
| 72 |
+
user_context="Extract medical information from this dermatology interview",
|
| 73 |
+
interviewee_type="HCP",
|
| 74 |
+
extract_structured=True,
|
| 75 |
+
timeout=180
|
| 76 |
+
)
|
| 77 |
+
|
| 78 |
+
print("\n" + "="*80)
|
| 79 |
+
print("๐ RESULTS")
|
| 80 |
+
print("="*80)
|
| 81 |
+
|
| 82 |
+
print(f"\n๐ Response Text ({len(response)} chars):")
|
| 83 |
+
print("-" * 80)
|
| 84 |
+
print(response)
|
| 85 |
+
|
| 86 |
+
print(f"\n๐ Structured Data ({len(structured_data)} fields):")
|
| 87 |
+
print("-" * 80)
|
| 88 |
+
import json
|
| 89 |
+
print(json.dumps(structured_data, indent=2))
|
| 90 |
+
|
| 91 |
+
# Validate results
|
| 92 |
+
print("\n" + "="*80)
|
| 93 |
+
print("โ
VALIDATION")
|
| 94 |
+
print("="*80)
|
| 95 |
+
|
| 96 |
+
if len(response) < 50:
|
| 97 |
+
print("โ ๏ธ Warning: Response is very short")
|
| 98 |
+
else:
|
| 99 |
+
print(f"โ
Response length OK ({len(response)} chars)")
|
| 100 |
+
|
| 101 |
+
if not structured_data:
|
| 102 |
+
print("โ No structured data extracted - check JSON parsing!")
|
| 103 |
+
elif len(structured_data) == 0:
|
| 104 |
+
print("โ ๏ธ Structured data is empty")
|
| 105 |
+
else:
|
| 106 |
+
print(f"โ
Structured data extracted ({len(structured_data)} fields)")
|
| 107 |
+
for key, values in structured_data.items():
|
| 108 |
+
if values:
|
| 109 |
+
print(f" โข {key}: {len(values)} items")
|
| 110 |
+
|
| 111 |
+
if "[Error]" in response:
|
| 112 |
+
print("โ Response contains error message!")
|
| 113 |
+
else:
|
| 114 |
+
print("โ
No error messages in response")
|
| 115 |
+
|
| 116 |
+
print("\n" + "="*80)
|
| 117 |
+
print("๐ TEST COMPLETE!")
|
| 118 |
+
print("="*80)
|
| 119 |
+
print("\nYour system is ready for HuggingFace Spaces deployment.")
|
| 120 |
+
print("\n๐ See HUGGINGFACE_SPACES_SETUP.md for deployment instructions.")
|
| 121 |
+
|
| 122 |
+
except Exception as e:
|
| 123 |
+
print("\n" + "="*80)
|
| 124 |
+
print("โ TEST FAILED")
|
| 125 |
+
print("="*80)
|
| 126 |
+
print(f"\nError: {e}")
|
| 127 |
+
|
| 128 |
+
import traceback
|
| 129 |
+
print("\nFull traceback:")
|
| 130 |
+
print(traceback.format_exc())
|
| 131 |
+
|
| 132 |
+
print("\n๐ง Troubleshooting:")
|
| 133 |
+
print("1. Make sure GPU is available (or set device_map='cpu')")
|
| 134 |
+
print("2. Check if you have enough RAM/VRAM (~8GB needed)")
|
| 135 |
+
print("3. Try a smaller model: LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0")
|
| 136 |
+
print("4. Check internet connection for model download")
|
| 137 |
+
|
| 138 |
+
sys.exit(1)
|