# Enterprise Deployment Guide

**TranscriptorAI v3.0 - Market Research Edition**
**Updated:** October 20, 2025

---

## Pre-Deployment Checklist

### Required Changes (Completed ✅)

- [x] **Token Limits Increased**
  - From: 100 tokens → To: 1500-2500 tokens
  - Files: `app.py`, `llm.py`, `story_writer.py`
  - Impact: Enables comprehensive market research narratives
- [x] **Production Logging Implemented**
  - New file: `production_logger.py`
  - Integrated into: `app.py`
  - Features: Session tracking, performance metrics, error logging, export to JSON/TXT
- [x] **Dependencies Documented**
  - File: `requirements.txt`
  - Key requirement: `python-docx>=1.0.0` for DOCX support

### Installation Steps

#### 1. Install Dependencies

```bash
cd /home/john/TranscriptorEnhanced

# Install all required packages
pip3 install -r requirements.txt

# Or install individually (quote the version specifiers so the shell
# does not treat ">=" as a redirection):
pip3 install "gradio>=4.0.0"
pip3 install "huggingface_hub>=0.19.0"
pip3 install "python-docx>=1.0.0"
pip3 install "pdfplumber>=0.10.0"
pip3 install "pandas>=2.0.0"
pip3 install "matplotlib>=3.7.0"
pip3 install "reportlab>=4.0.0"
pip3 install "tiktoken>=0.5.0"
pip3 install "nltk>=3.8.0"
pip3 install "scikit-learn>=1.3.0"
```

#### 2. Set Environment Variables

**Required:**

```bash
export HUGGINGFACE_TOKEN="your_hf_token_here"
```

**Optional (for LM Studio):**

```bash
export USE_LMSTUDIO=True
export LM_STUDIO_URL="http://localhost:1234"
```

#### 3. Create Logs Directory

```bash
mkdir -p /home/john/TranscriptorEnhanced/logs
chmod 755 /home/john/TranscriptorEnhanced/logs
```
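The steps above can be verified before go-live with a small preflight script. This is a minimal sketch, not part of the shipped app: the env var name, logs path, and module names come from this guide, while the `preflight` helper itself is illustrative.

```python
import importlib.util
import os
from pathlib import Path

def preflight(env, logs_dir, modules=("docx", "gradio")):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not env.get("HUGGINGFACE_TOKEN"):
        problems.append("HUGGINGFACE_TOKEN is not set")
    path = Path(logs_dir)
    if not path.is_dir():
        problems.append(f"logs directory missing: {logs_dir}")
    elif not os.access(path, os.W_OK):
        problems.append(f"logs directory not writable: {logs_dir}")
    for mod in modules:
        # find_spec checks importability without actually importing the package
        if importlib.util.find_spec(mod) is None:
            problems.append(f"missing dependency: {mod}")
    return problems

if __name__ == "__main__":
    issues = preflight(dict(os.environ), "/home/john/TranscriptorEnhanced/logs")
    for issue in issues:
        print("WARNING:", issue)
    print("Preflight OK" if not issues else f"{len(issues)} issue(s) found")
```

Note that `python-docx` installs under the import name `docx`, which is why the module list checks `docx` rather than the package name.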
#### 4. Test Installation

```bash
# Test quote extraction
python3 test_quotes_simple.py

# Should output:
# ✓ Quote extraction working
# ✓ 39 quotes extracted from 2 transcripts
```

---

## Production Configuration

### Current Settings (Enterprise-Ready)

| Setting | Value | Purpose |
|---------|-------|---------|
| LLM_BACKEND | `hf_api` | HuggingFace Inference API |
| LLM_TIMEOUT | `60s` | Increased for longer generation |
| MAX_TOKENS_PER_REQUEST | `1500` | Enterprise narrative length |
| Temperature (Analysis) | `0.5` | Balanced creativity/accuracy |
| Temperature (Narrative) | `0.7` | More creative storytelling |
| Max Tokens (LM Studio) | `2500` | Full-length reports |
| Max Tokens (HF API) | `1500` | API limits |

### Model Selection

**Current Models:**
- **Analysis:** `microsoft/Phi-3-mini-4k-instruct` (HF API)
- **Narrative:** `mistralai/Mixtral-8x7B-Instruct-v0.1` (HF API)

**⚠️ Known Limitation:** Phi-3-mini has only a 4K context window. For transcripts >3000 words, consider:
- Switching to Mixtral-8x7B for analysis (32K context)
- Using LM Studio with larger local models
- Implementing a better chunking strategy

---

## Monitoring & Logging

### Log Files Generated

Each analysis session creates:

1. **Session Log:** `logs/session_YYYYMMDD_HHMMSS.log`
   - Detailed timestamped events
   - All processing steps
   - Warnings and errors
2. **JSON Summary:** `logs/summary_YYYYMMDD_HHMMSS.json`
   - Structured metrics
   - Machine-readable
   - For integration with monitoring tools
3. **Text Summary:** `logs/summary_YYYYMMDD_HHMMSS.txt`
   - Human-readable summary
   - Success rates
   - Error details
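The JSON summary naming convention above can be sketched as follows. The real schema is defined by `production_logger.py` (not shown here); the metric names in this example are hypothetical placeholders.

```python
import json
import tempfile
import time
from pathlib import Path

def write_session_summary(metrics, logs_dir, session_id=None):
    """Write a machine-readable summary like logs/summary_YYYYMMDD_HHMMSS.json."""
    session_id = session_id or time.strftime("%Y%m%d_%H%M%S")
    out = Path(logs_dir) / f"summary_{session_id}.json"
    out.write_text(json.dumps({"session_id": session_id, **metrics}, indent=2))
    return out

# Hypothetical metric names -- the actual fields come from production_logger.py
path = write_session_summary(
    {"processed": 3, "failed": 0, "success_rate": 100.0, "quotes_extracted": 39},
    logs_dir=tempfile.mkdtemp(),
)
```

Because the output is plain JSON keyed by session ID, monitoring tools can ingest it without any custom parsing.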
### Metrics Tracked

**Per Session:**
- Transcripts processed / failed
- Success rate (%)
- Average processing time
- Quotes extracted
- Total session duration
- Error types and frequencies

**Per Transcript:**
- File name and type
- Quality score (0-1)
- Word count
- Processing time (seconds)
- Error details (if failed)

### Example Log Output

```
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Session started: 20251020_153045
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Processing started: HCP_Oncologist.txt | Type: HCP | Format: TXT
2025-10-20 15:31:12 | INFO | TranscriptorAI_20251020_153045 | Processing complete: HCP_Oncologist.txt | Quality: 0.95 | Words: 1847 | Time: 27.3s
2025-10-20 15:31:15 | INFO | TranscriptorAI_20251020_153045 | Quote extraction complete: 21 quotes | Top score: 1.00 | Themes: patient_management, prescribing, barriers, safety, diagnosis
2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%
```

---

## Performance Benchmarks

Based on testing with sample data:

| Operation | Time | Notes |
|-----------|------|-------|
| Single transcript processing | 25-35s | Depends on length |
| Quote extraction | 2-5s | Per transcript |
| Cross-transcript summary | 30-60s | For 3-10 transcripts |
| **Total for 3 transcripts** | **~2-3 minutes** | End-to-end |

**Bottlenecks:**
1. HuggingFace API latency (network dependent)
2. LLM generation time (model dependent)
3. Quote extraction (scales linearly)
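The pipe-delimited log format shown in the example output is easy to parse for monitoring integrations. A sketch (the field layout is taken from the example lines above, not from `production_logger.py` itself):

```python
import re

# Matches the SESSION COMPLETE line from the example log output
SESSION_RE = re.compile(
    r"SESSION COMPLETE \| Duration: (?P<duration>[\d.]+)s"
    r" \| Processed: (?P<processed>\d+)"
    r" \| Failed: (?P<failed>\d+)"
    r" \| Success Rate: (?P<rate>[\d.]+)%"
)

def parse_session_complete(line):
    """Extract session metrics from a SESSION COMPLETE log line, or None."""
    m = SESSION_RE.search(line)
    if not m:
        return None
    return {
        "duration_s": float(m.group("duration")),
        "processed": int(m.group("processed")),
        "failed": int(m.group("failed")),
        "success_rate": float(m.group("rate")),
    }

line = ("2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | "
        "SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%")
print(parse_session_complete(line))
# → {'duration_s': 60.2, 'processed': 3, 'failed': 0, 'success_rate': 100.0}
```

In practice the JSON summaries are the more robust integration point; parsing like this is a fallback when only the `.log` files are available.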
**Optimizations:**
- Use LM Studio for faster local processing (if GPU available)
- Process transcripts in parallel (not yet implemented)
- Cache common analyses (not yet implemented)

---

## Error Handling

### Automatic Recovery

The system includes:
- **Retry logic:** 3 attempts with exponential backoff
- **Fallback:** HF API ↔ LM Studio switching
- **Graceful degradation:** Continue processing other transcripts if one fails
- **Emergency summaries:** Generated if LLM fails

### Common Errors & Solutions

**Error:** `ModuleNotFoundError: No module named 'docx'`
**Solution:** Install python-docx: `pip3 install python-docx`

**Error:** `HF API timeout`
**Solution:** Increase timeout in `app.py` line 25 or use LM Studio

**Error:** `No quotes extracted`
**Solution:** Check transcript formatting (needs speaker labels or quotation marks)

**Error:** `Token limit exceeded`
**Solution:** Already fixed - now using 1500-2500 tokens

---

## Security Considerations

### API Keys

- Store HuggingFace token in environment variables (NOT in code)
- Use secrets management for production (AWS Secrets Manager, HashiCorp Vault)
- Rotate tokens regularly

### Data Privacy

- Transcript data is **not** sent to external services except HF API for LLM calls
- Logs contain file names but **not** transcript content
- Consider HIPAA compliance if processing patient interviews
- Implement data retention policies for logs

### Access Control

- Restrict access to `/logs` directory
- Implement user authentication for Gradio UI (not currently included)
- Use HTTPS in production deployments

---

## Scaling Recommendations

### For 10-50 Transcripts/Day

**Current setup is sufficient:**
- Single server deployment
- HuggingFace API with rate limiting
- Local log storage

### For 50-200 Transcripts/Day

**Recommended upgrades:**
- Deploy with multiple workers (Gunicorn)
- Implement Redis queue for job management
- Use dedicated LM Studio instance on GPU server
- Centralized logging (ELK stack, Datadog)
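The retry behavior described under Error Handling (3 attempts with exponential backoff) can be sketched as below. This is illustrative, not the actual implementation in `app.py`; the delay schedule and the `call_with_fallback` helper are assumptions.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying up to `attempts` times with exponential backoff.

    Delays between attempts are base_delay, 2*base_delay, 4*base_delay, ...
    The last exception is re-raised if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

# Illustrative HF API ↔ LM Studio fallback: try the API with retries,
# then switch to the local backend if it keeps failing.
def call_with_fallback(call_hf_api, call_lm_studio):
    try:
        return with_retries(call_hf_api)
    except Exception:
        return call_lm_studio()
```

Injecting `sleep` as a parameter keeps the backoff testable without real delays.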
### For 200+ Transcripts/Day

**Enterprise infrastructure:**
- Kubernetes deployment with auto-scaling
- Separate microservices (extraction, analysis, reporting)
- Dedicated GPU cluster for LLM calls
- Cloud object storage (S3) for transcripts/reports
- Real-time monitoring dashboard

---

## Deployment Checklist

### Before Go-Live

- [ ] All dependencies installed (`pip3 install -r requirements.txt`)
- [ ] HuggingFace token configured
- [ ] Logs directory created with proper permissions
- [ ] Test with 3-5 real client transcripts
- [ ] Review generated reports for quality
- [ ] Verify quote extraction working (check console output)
- [ ] Set up log monitoring/alerts
- [ ] Document any client-specific customizations

### Day 1 Production

- [ ] Start with 1-2 small client projects
- [ ] Monitor logs actively (`tail -f logs/session_*.log`)
- [ ] Verify session summaries being generated
- [ ] Track processing times vs. benchmarks
- [ ] Gather client feedback on report quality

### Week 1 Production

- [ ] Review all session logs
- [ ] Calculate average success rate (target: >95%)
- [ ] Identify common errors
- [ ] Optimize based on bottlenecks
- [ ] Update documentation with learnings

---

## Support & Maintenance

### Daily Monitoring

Check these metrics daily:
- Success rate (should be >95%)
- Average processing time (should be <3 minutes for 3 transcripts)
- Error frequency (should be <5%)
- Quote extraction quality (top scores should be >0.75)

### Weekly Maintenance

- Review session summary logs
- Clean up old logs (keep last 30 days)
- Update dependencies if security patches available
- Review client feedback

### Monthly Review

- Analyze performance trends
- Plan optimization improvements
- Update models if better ones available
- Review and update documentation

---

## Troubleshooting

### Low Success Rate (<90%)

**Possible Causes:**
- HuggingFace API rate limiting
- Network connectivity issues
- Malformed transcript files

**Actions:**
1. Check `logs/` for error patterns
2. Verify HF token is valid
3. Test with sample data
4. Consider switching to LM Studio

### Slow Processing (>5 minutes for 3 transcripts)

**Possible Causes:**
- Network latency to HF API
- Large transcript files
- Token limits causing retries

**Actions:**
1. Check network latency: `ping api-inference.huggingface.co`
2. Review performance logs for bottlenecks
3. Consider local LM Studio deployment
4. Implement caching (future enhancement)

### Poor Quote Quality (scores <0.50)

**Possible Causes:**
- Transcripts lack specific details
- No quotation marks or speaker labels
- Very technical/clinical language

**Actions:**
1. Run `test_quotes_simple.py` with the problematic transcript
2. Adjust scoring weights in `quote_extractor.py`
3. Add custom patterns for your transcript format
4. Accept that some transcripts naturally have fewer good quotes

---

## Future Enhancements

**High Priority (Next 3 Months):**
1. Upgrade to larger-context model (Mixtral-8x7B for all operations)
2. Parallel transcript processing
3. User authentication for Gradio UI
4. Real-time monitoring dashboard

**Medium Priority (3-6 Months):**
5. Caching layer for common analyses
6. Batch processing API
7. Client-specific customization templates
8. Enhanced error recovery

**Low Priority (6-12 Months):**
9. Multi-language support
10. Audio timestamp integration
11. Interactive HTML reports
12. A/B testing framework
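`quote_extractor.py` is not reproduced in this guide, so as an illustration of the kind of weight adjustment the quote-quality troubleshooting step refers to, here is a hypothetical scorer with tunable weights. The features and weight values are invented for this sketch and will differ from the real implementation.

```python
import re

# Hypothetical weights -- quote_extractor.py's real features and weights
# may differ; this only illustrates making them tunable in one place.
DEFAULT_WEIGHTS = {"has_speaker": 0.3, "has_numbers": 0.3, "length": 0.4}

def score_quote(quote, weights=DEFAULT_WEIGHTS):
    """Score a candidate quote between 0 and 1 from simple surface features."""
    features = {
        "has_speaker": 1.0 if re.match(r"^[A-Z][\w .]*:", quote) else 0.0,
        "has_numbers": 1.0 if re.search(r"\d", quote) else 0.0,
        "length": min(len(quote.split()) / 20.0, 1.0),  # saturates at 20 words
    }
    return sum(weights[name] * value for name, value in features.items())

print(round(score_quote("Dr. Smith: I prescribe it to 40% of my patients."), 2))
# → 0.8
```

Keeping the weights in a single dict makes "adjust scoring weights" a one-line change per client or transcript format.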
---

## Contact & Support

**Documentation:**
- Technical: `MARKET_RESEARCH_ENHANCEMENTS.md`
- User Guide: `STORYTELLING_QUICK_START.md`
- This Guide: `ENTERPRISE_DEPLOYMENT_GUIDE.md`

**Key Files:**
- Logging: `production_logger.py`
- Main App: `app.py`
- Quote Extraction: `quote_extractor.py`
- Narrative Generation: `story_writer.py`

**Logs Location:** `/home/john/TranscriptorEnhanced/logs/`

---

## Summary

✅ **Token Limits:** Increased to 1500-2500 (enterprise-ready)
✅ **Logging:** Full production monitoring implemented
✅ **Dependencies:** Documented in requirements.txt

⚠️ **Still Todo (requires production environment):**
- Install python-docx (needs pip in environment)
- Test with 20+ real transcripts
- Set up centralized log monitoring
- Implement user authentication

**Status:** Ready for controlled production pilot with close monitoring

---

**Last Updated:** October 20, 2025
**Version:** 3.0-Enterprise