# Enterprise Deployment Guide
**TranscriptorAI v3.0 - Market Research Edition**
**Updated:** October 20, 2025
---
## Pre-Deployment Checklist
### Required Changes (Completed ✅)
- [x] **Token Limits Increased**
  - From 100 tokens → 1500-2500 tokens
  - Files: `app.py`, `llm.py`, `story_writer.py`
  - Impact: Enables comprehensive market research narratives
- [x] **Production Logging Implemented**
  - New file: `production_logger.py`
  - Integrated into: `app.py`
  - Features: Session tracking, performance metrics, error logging, export to JSON/TXT
- [x] **Dependencies Documented**
  - File: `requirements.txt`
  - Key requirement: `python-docx>=1.0.0` for DOCX support
### Installation Steps
#### 1. Install Dependencies
```bash
cd /home/john/TranscriptorEnhanced
# Install all required packages
pip3 install -r requirements.txt
# Or install individually. Quote each specifier: unquoted, the shell parses
# ">" as an output redirect and the version pin is silently lost.
pip3 install "gradio>=4.0.0"
pip3 install "huggingface_hub>=0.19.0"
pip3 install "python-docx>=1.0.0"
pip3 install "pdfplumber>=0.10.0"
pip3 install "pandas>=2.0.0"
pip3 install "matplotlib>=3.7.0"
pip3 install "reportlab>=4.0.0"
pip3 install "tiktoken>=0.5.0"
pip3 install "nltk>=3.8.0"
pip3 install "scikit-learn>=1.3.0"
```
#### 2. Set Environment Variables
**Required:**
```bash
export HUGGINGFACE_TOKEN="your_hf_token_here"
```
**Optional (for LM Studio):**
```bash
export USE_LMSTUDIO=True
export LM_STUDIO_URL="http://localhost:1234"
```
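A minimal sketch of how these variables might be read and validated at startup. The helper name `load_llm_env`, the accepted truthy spellings for `USE_LMSTUDIO`, and the default URL fallback are assumptions; the actual parsing in `app.py` is not shown in this guide and may differ.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMEnv:
    hf_token: str
    use_lmstudio: bool
    lmstudio_url: str


def load_llm_env(environ: Mapping[str, str] | None = None) -> LLMEnv:
    """Read the LLM settings described above (hypothetical helper, not from app.py)."""
    env = os.environ if environ is None else environ
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    if not token:
        # The guide lists the token as required, so fail fast at startup.
        raise RuntimeError("HUGGINGFACE_TOKEN is not set")
    # Accept common truthy spellings such as "True", "true", "1", "yes".
    use_lmstudio = env.get("USE_LMSTUDIO", "false").strip().lower() in {"1", "true", "yes"}
    return LLMEnv(
        hf_token=token,
        use_lmstudio=use_lmstudio,
        # Fall back to the LM Studio default shown above; drop any trailing slash.
        lmstudio_url=env.get("LM_STUDIO_URL", "http://localhost:1234").rstrip("/"),
    )
```

Failing at startup, rather than on the first LLM call, surfaces a missing token before any transcript is processed.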
#### 3. Create Logs Directory
```bash
mkdir -p /home/john/TranscriptorEnhanced/logs
chmod 755 /home/john/TranscriptorEnhanced/logs
```
#### 4. Test Installation
```bash
# Test quote extraction
python3 test_quotes_simple.py
# Should output:
# ✓ Quote extraction working
# ✓ 39 quotes extracted from 2 transcripts
```
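Before running the quote test, a quick import check can confirm that every package is visible to the interpreter that will run the app. This is a hypothetical helper, not part of the repository. The module list mirrors the `pip3 install` commands above, using import names rather than package names (`python-docx` imports as `docx`, `scikit-learn` as `sklearn`).

```python
import importlib.util

# Import names for the packages in requirements.txt.
REQUIRED_MODULES = [
    "gradio", "huggingface_hub", "docx", "pdfplumber", "pandas",
    "matplotlib", "reportlab", "tiktoken", "nltk", "sklearn",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that the current interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

`print(missing_modules())` should print `[]` once installation is complete; any name it lists points at the package to reinstall.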
---
## Production Configuration
### Current Settings (Enterprise-Ready)
| Setting | Value | Purpose |
|---------|-------|---------|
| LLM_BACKEND | `hf_api` | HuggingFace Inference API |
| LLM_TIMEOUT | `60s` | Increased for longer generation |
| MAX_TOKENS_PER_REQUEST | `1500` | Enterprise narrative length |
| Temperature (Analysis) | `0.5` | Balanced creativity/accuracy |
| Temperature (Narrative) | `0.7` | More creative storytelling |
| Max Tokens (LM Studio) | `2500` | Full-length reports |
| Max Tokens (HF API) | `1500` | API limits |
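The table can be mirrored in code as a single frozen settings object, so the backend-specific token limits and task-specific temperatures are looked up in one place. This is a sketch: the class name and the backend/task labels (`"hf_api"`, `"lmstudio"`, `"analysis"`, `"narrative"`) are assumptions, not identifiers taken from `app.py` or `llm.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Values from the configuration table above (hypothetical container)."""

    timeout_s: int = 60
    temperature_analysis: float = 0.5
    temperature_narrative: float = 0.7
    max_tokens_hf_api: int = 1500
    max_tokens_lmstudio: int = 2500

    def max_tokens(self, backend: str) -> int:
        # Unknown backend labels raise KeyError instead of silently using a default.
        limits = {"hf_api": self.max_tokens_hf_api, "lmstudio": self.max_tokens_lmstudio}
        return limits[backend]

    def temperature(self, task: str) -> float:
        temps = {"analysis": self.temperature_analysis, "narrative": self.temperature_narrative}
        return temps[task]
```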
### Model Selection
**Current Models:**
- **Analysis:** `microsoft/Phi-3-mini-4k-instruct` (HF API)
- **Narrative:** `mistralai/Mixtral-8x7B-Instruct-v0.1` (HF API)
**⚠️ Known Limitation:** Phi-3-mini has only a 4K-token context window. For transcripts longer than ~3000 words, consider:
- Switching to Mixtral-8x7B for analysis (32K-token context)
- Using LM Studio with larger local models
- Implementing better chunking strategy
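One simple chunking strategy is fixed-size word windows with overlap. Any passage of up to `overlap` words that crosses a chunk boundary then appears intact in the following chunk. This is a sketch, not the current implementation: word count is only a rough proxy for tokens, and the default of 1500 words is an assumption chosen to leave room for the prompt and the 1500-token output inside a 4K window.

```python
def chunk_words(text: str, max_words: int = 1500, overlap: int = 100) -> list[str]:
    """Split text into overlapping windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    chunks: list[str] = []
    step = max_words - overlap  # each window starts this many words after the last
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the transcript
    return chunks
```

Each chunk would be analysed separately and the partial results merged. An exact token count (for example via `tiktoken`) would be tighter, but its encodings do not match Phi-3's tokenizer.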
---
## Monitoring & Logging
### Log Files Generated
Each analysis session creates:
1. **Session Log:** `logs/session_YYYYMMDD_HHMMSS.log`
   - Detailed timestamped events
   - All processing steps
   - Warnings and errors
2. **JSON Summary:** `logs/summary_YYYYMMDD_HHMMSS.json`
   - Structured metrics
   - Machine-readable
   - For integration with monitoring tools
3. **Text Summary:** `logs/summary_YYYYMMDD_HHMMSS.txt`
   - Human-readable summary
   - Success rates
   - Error details
### Metrics Tracked
**Per Session:**
- Transcripts processed / failed
- Success rate (%)
- Average processing time
- Quotes extracted
- Total session duration
- Error types and frequencies
**Per Transcript:**
- File name and type
- Quality score (0-1)
- Word count
- Processing time (seconds)
- Error details (if failed)
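The per-session figures can be derived from the per-transcript rows. The sketch below shows that roll-up under assumed field and key names. `production_logger.py` is not reproduced in this guide, so its actual JSON schema may differ.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import asdict, dataclass
from statistics import mean


@dataclass
class TranscriptResult:
    """One row of the per-transcript metrics listed above (field names are assumptions)."""

    file_name: str
    file_type: str
    quality: float | None  # 0-1, None if processing failed
    word_count: int
    seconds: float
    error: str | None = None


def summarize_session(results: list[TranscriptResult]) -> dict:
    """Roll per-transcript rows up into the per-session metrics."""
    ok = [r for r in results if r.error is None]
    failed = [r for r in results if r.error is not None]
    return {
        "processed": len(ok),
        "failed": len(failed),
        "success_rate": round(100 * len(ok) / len(results), 1) if results else 0.0,
        "avg_processing_s": round(mean(r.seconds for r in ok), 1) if ok else None,
        "error_counts": dict(Counter(r.error for r in failed)),
        "transcripts": [asdict(r) for r in results],
    }
```

The returned dict contains only JSON-compatible values, so `json.dumps(summarize_session(rows), indent=2)` can be written directly to a `summary_*.json` file.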
### Example Log Output
```
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Session started: 20251020_153045
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Processing started: HCP_Oncologist.txt | Type: HCP | Format: TXT
2025-10-20 15:31:12 | INFO | TranscriptorAI_20251020_153045 | Processing complete: HCP_Oncologist.txt | Quality: 0.95 | Words: 1847 | Time: 27.3s
2025-10-20 15:31:15 | INFO | TranscriptorAI_20251020_153045 | Quote extraction complete: 21 quotes | Top score: 1.00 | Themes: patient_management, prescribing, barriers, safety, diagnosis
2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%
```
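The format above can be reproduced with the standard `logging` module. This is a reconstruction from the sample output, not the code in `production_logger.py`. The function name is hypothetical, and the session id doubles as the logger name, as the `TranscriptorAI_20251020_153045` column suggests.

```python
from __future__ import annotations

import logging
from datetime import datetime
from pathlib import Path

LOG_FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"


def make_session_logger(log_dir: str = "logs", started: datetime | None = None) -> logging.Logger:
    """Create a per-session logger writing logs/session_YYYYMMDD_HHMMSS.log."""
    started = started or datetime.now()
    session_id = started.strftime("%Y%m%d_%H%M%S")
    Path(log_dir).mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(f"TranscriptorAI_{session_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep session lines out of the root logger's handlers
    if not logger.handlers:  # avoid duplicate lines if called twice for one session
        handler = logging.FileHandler(Path(log_dir) / f"session_{session_id}.log", encoding="utf-8")
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt="%Y-%m-%d %H:%M:%S"))
        logger.addHandler(handler)
    logger.info("Session started: %s", session_id)
    return logger
```

Passing arguments to `logger.info("... %s", value)`, rather than pre-formatting the string, defers formatting until a record is actually emitted.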
---
## Performance Benchmarks
Based on testing with sample data:
| Operation | Time | Notes |
|-----------|------|-------|
| Single transcript processing | 25-35s | Depends on length |
| Quote extraction | 2-5s | Per transcript |
| Cross-transcript summary | 30-60s | For 3-10 transcripts |
| **Total for 3 transcripts** | **~2-3 minutes** | End-to-end |
**Bottlenecks:**
1. HuggingFace API latency (network dependent)
2. LLM generation time (model dependent)
3. Quote extraction (scales linearly)
**Optimizations:**
- Use LM Studio for faster local processing (if GPU available)
- Process transcripts in parallel (not yet implemented)
- Cache common analyses (not yet implemented)
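Parallel processing is listed as not yet implemented. One way it could look is sketched below. Because most of the time per transcript is spent waiting on LLM HTTP calls, a small thread pool is a reasonable fit. `process_one` stands in for whatever per-transcript function `app.py` uses (a hypothetical name), and `max_workers` is kept low to stay under HF API rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def process_all(paths: list[str], process_one: Callable[[str], dict], max_workers: int = 3) -> dict[str, dict]:
    """Run process_one over paths concurrently; one failure does not stop the rest."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:  # record the failure and keep processing the others
                results[path] = {"error": f"{type(exc).__name__}: {exc}"}
    return results
```

This keeps the existing graceful-degradation behaviour: a failed transcript is recorded rather than aborting the batch.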
---
## Error Handling
### Automatic Recovery
The system includes:
- **Retry logic:** 3 attempts with exponential backoff
- **Fallback:** HF API ↔ LM Studio switching
- **Graceful degradation:** Continue processing other transcripts if one fails
- **Emergency summaries:** Generated if LLM fails
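The retry-then-fallback behaviour can be sketched as follows. Each backend gets 3 attempts with exponential backoff before the next one is tried. The backend callables, the 1 s base delay, the jitter, and the HF API → LM Studio ordering are assumptions for illustration; the real logic lives in the project's LLM modules and is not shown here.

```python
from __future__ import annotations

import random
import time
from typing import Callable, Sequence


def call_with_retry(
    backends: Sequence[Callable[[str], str]],
    prompt: str,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each backend in order, retrying with exponential backoff before falling back."""
    last_error: Exception | None = None
    for backend in backends:
        for attempt in range(attempts):
            try:
                return backend(prompt)
            except Exception as exc:  # in production, catch only transient errors
                last_error = exc
                if attempt < attempts - 1:
                    # 1 s, 2 s, 4 s, ... plus a little jitter to avoid synchronized retries
                    sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise RuntimeError("all LLM backends failed") from last_error
```

If every backend fails, the `RuntimeError` is the point where an emergency summary would be generated instead.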
### Common Errors & Solutions
**Error:** `ModuleNotFoundError: No module named 'docx'`
**Solution:** Install python-docx: `pip3 install python-docx`
**Error:** `HF API timeout`
**Solution:** Increase timeout in `app.py` line 25 or use LM Studio
**Error:** `No quotes extracted`
**Solution:** Check transcript formatting (needs speaker labels or quotation marks)
**Error:** `Token limit exceeded`
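**Solution:** Output limits were raised in v3.0 to 1500 tokens (HF API) and 2500 tokens (LM Studio). If the error persists, the input transcript is likely exceeding the model's context window; see the Phi-3-mini limitation under Model Selection.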
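For the "No quotes extracted" case, a quick pre-check can show whether a transcript has the structural cues quote extraction needs. The regular expressions below are illustrative heuristics (capitalised name followed by a colon; 10+ characters inside straight or curly double quotes). They are assumptions, not the rules `quote_extractor.py` applies.

```python
import re

# Heuristic: a line starting with a capitalised name followed by ":" is a speaker label.
SPEAKER_LABEL = re.compile(r"^[ \t]*[A-Z][\w .'-]{0,40}:[ \t]+\S", re.MULTILINE)
# Heuristic: 10+ characters (no line break) inside straight or curly double quotes.
QUOTATION = re.compile(r'["\u201c][^"\u201d\n]{10,}["\u201d]')


def quote_readiness(text: str) -> dict[str, int]:
    """Count the structural cues that quote extraction relies on."""
    return {
        "speaker_labels": len(SPEAKER_LABEL.findall(text)),
        "quotations": len(QUOTATION.findall(text)),
    }
```

A transcript that scores zero on both counts is a candidate for reformatting before it is re-run.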
---
## Security Considerations
### API Keys
- Store HuggingFace token in environment variables (NOT in code)
- Use secrets management for production (AWS Secrets Manager, HashiCorp Vault)
- Rotate tokens regularly
### Data Privacy
- Transcript data is **not** sent to any external service other than the HuggingFace API (for LLM calls); with LM Studio, LLM calls go only to the configured LM Studio server
- Logs contain file names but **not** transcript content
- Consider HIPAA compliance if processing patient interviews
- Implement data retention policies for logs
### Access Control
- Restrict access to `/logs` directory
- Implement user authentication for Gradio UI (not currently included)
- Use HTTPS in production deployments
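Gradio's `launch()` accepts an `auth` argument: a `(username, password)` tuple, a list of such tuples, or a callable that returns `True` for valid credentials. The sketch below uses the callable form with credentials read from environment variables. The variable names `TRANSCRIPTOR_USER` / `TRANSCRIPTOR_PASSWORD` and the object name `demo` are hypothetical. This covers only basic login; HTTPS still has to be provided, for example by a reverse proxy in front of the app.

```python
import hmac
import os


def check_credentials(username: str, password: str) -> bool:
    """Validate a login against credentials held in environment variables."""
    expected_user = os.environ.get("TRANSCRIPTOR_USER", "")
    expected_pass = os.environ.get("TRANSCRIPTOR_PASSWORD", "")
    if not expected_user or not expected_pass:
        return False  # refuse every login if credentials are not configured
    # compare_digest avoids leaking how many leading characters matched
    user_ok = hmac.compare_digest(username.encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(password.encode(), expected_pass.encode())
    return user_ok and pass_ok


# Usage in app.py, assuming the Gradio interface object is named `demo`:
# demo.launch(auth=check_credentials)
```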
---
## Scaling Recommendations
### For 10-50 Transcripts/Day
**Current setup is sufficient**
- Single server deployment
- HuggingFace API with rate limiting
- Local log storage
### For 50-200 Transcripts/Day
**Recommended upgrades:**
- Deploy with multiple workers (Gunicorn)
- Implement Redis queue for job management
- Use dedicated LM Studio instance on GPU server
- Centralized logging (ELK stack, Datadog)
### For 200+ Transcripts/Day
**Enterprise infrastructure:**
- Kubernetes deployment with auto-scaling
- Separate microservices (extraction, analysis, reporting)
- Dedicated GPU cluster for LLM calls
- Cloud object storage (S3) for transcripts/reports
- Real-time monitoring dashboard
---
## Deployment Checklist
### Before Go-Live
- [ ] All dependencies installed (`pip3 install -r requirements.txt`)
- [ ] HuggingFace token configured
- [ ] Logs directory created with proper permissions
- [ ] Test with 3-5 real client transcripts
- [ ] Review generated reports for quality
- [ ] Verify quote extraction working (check console output)
- [ ] Set up log monitoring/alerts
- [ ] Document any client-specific customizations
### Day 1 Production
- [ ] Start with 1-2 small client projects
- [ ] Monitor logs actively (`tail -f logs/session_*.log`)
- [ ] Verify session summaries being generated
- [ ] Track processing times vs. benchmarks
- [ ] Gather client feedback on report quality
### Week 1 Production
- [ ] Review all session logs
- [ ] Calculate average success rate (target: >95%)
- [ ] Identify common errors
- [ ] Optimize based on bottlenecks
- [ ] Update documentation with learnings
---
## Support & Maintenance
### Daily Monitoring
Check these metrics daily:
- Success rate (should be >95%)
- Average processing time (should be <3 minutes for 3 transcripts)
- Error frequency (should be <5%)
- Quote extraction quality (top scores should be >0.75)
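The daily targets can be checked automatically against each session summary. In this sketch the summary keys are assumptions, to be mapped onto whatever `production_logger.py` writes. "<3 minutes for 3 transcripts" is interpreted as a 60 s per-transcript budget.

```python
def daily_alerts(summary: dict) -> list[str]:
    """Flag metrics outside the daily targets listed above (summary keys are assumed)."""
    alerts = []
    if summary["success_rate"] <= 95.0:
        alerts.append(f"success rate {summary['success_rate']:.1f}% (target >95%)")
    if summary["error_rate"] >= 5.0:
        alerts.append(f"error rate {summary['error_rate']:.1f}% (target <5%)")
    # "<3 minutes for 3 transcripts" read as a 60 s per-transcript budget.
    budget_s = 60.0 * summary["transcripts"]
    if summary["session_seconds"] >= budget_s:
        alerts.append(f"session took {summary['session_seconds']:.0f}s (budget {budget_s:.0f}s)")
    if summary["top_quote_score"] <= 0.75:
        alerts.append(f"top quote score {summary['top_quote_score']:.2f} (target >0.75)")
    return alerts
```

Using the figures from the example log (100% success, 60.2 s for 3 transcripts, top score 1.00), this returns an empty list.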
### Weekly Maintenance
- Review session summary logs
- Clean up old logs (keep last 30 days)
- Update dependencies if security patches available
- Review client feedback
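The 30-day log retention can be scripted and run from a weekly scheduled job. This sketch deletes session and summary files by modification time. The file patterns come from the Log Files Generated section, and the function name is hypothetical. Other files in the directory are left untouched.

```python
from __future__ import annotations

import time
from pathlib import Path


def purge_old_logs(log_dir: str = "logs", keep_days: int = 30, now: float | None = None) -> list[Path]:
    """Delete session/summary files older than keep_days; return the paths removed."""
    cutoff = (time.time() if now is None else now) - keep_days * 86400
    removed = []
    for pattern in ("session_*.log", "summary_*.json", "summary_*.txt"):
        for path in Path(log_dir).glob(pattern):
            if path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path)
    return removed
```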
### Monthly Review
- Analyze performance trends
- Plan optimization improvements
- Update models if better ones available
- Review and update documentation
---
## Troubleshooting
### Low Success Rate (<90%)
**Possible Causes:**
- HuggingFace API rate limiting
- Network connectivity issues
- Malformed transcript files
**Actions:**
1. Check `logs/` for error patterns
2. Verify HF token is valid
3. Test with sample data
4. Consider switching to LM Studio
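For step 1, error lines can be grouped across all session logs to spot recurring patterns. The line pattern matches the format in the Example Log Output. Grouping by the text before the first `:` is an assumption that keeps file names from splitting one error type into many buckets.

```python
import re
from collections import Counter
from pathlib import Path

# Matches lines like "2025-10-20 15:31:12 | ERROR | TranscriptorAI_... | message"
LINE = re.compile(r"^\S+ \S+ \| (?P<level>[A-Z]+) \| \S+ \| (?P<msg>.*)$")


def error_patterns(log_dir: str = "logs", top: int = 5) -> list[tuple[str, int]]:
    """Return the most frequent ERROR/CRITICAL message prefixes across session logs."""
    counts: Counter = Counter()
    for path in Path(log_dir).glob("session_*.log"):
        for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
            m = LINE.match(line)
            if m and m.group("level") in ("ERROR", "CRITICAL"):
                counts[m.group("msg").split(":", 1)[0].strip()] += 1
    return counts.most_common(top)
```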
### Slow Processing (>5 minutes for 3 transcripts)
**Possible Causes:**
- Network latency to HF API
- Large transcript files
- Token limits causing retries
**Actions:**
1. Check network latency: `ping api.huggingface.co`
2. Review performance logs for bottlenecks
3. Consider local LM Studio deployment
4. Implement caching (future enhancement)
### Poor Quote Quality (scores <0.50)
**Possible Causes:**
- Transcripts lack specific details
- No quotation marks or speaker labels
- Very technical/clinical language
**Actions:**
1. Run `test_quotes_simple.py` with problematic transcript
2. Adjust scoring weights in `quote_extractor.py`
3. Add custom patterns for your transcript format
4. Accept that some transcripts naturally have fewer good quotes
---
## Future Enhancements
**High Priority (Next 3 Months):**
1. Upgrade to larger context model (Mixtral-8x7B for all operations)
2. Parallel transcript processing
3. User authentication for Gradio UI
4. Real-time monitoring dashboard
**Medium Priority (3-6 Months):**
5. Caching layer for common analyses
6. Batch processing API
7. Client-specific customization templates
8. Enhanced error recovery
**Low Priority (6-12 Months):**
9. Multi-language support
10. Audio timestamp integration
11. Interactive HTML reports
12. A/B testing framework
---
## Contact & Support
**Documentation:**
- Technical: `MARKET_RESEARCH_ENHANCEMENTS.md`
- User Guide: `STORYTELLING_QUICK_START.md`
- This Guide: `ENTERPRISE_DEPLOYMENT_GUIDE.md`
**Key Files:**
- Logging: `production_logger.py`
- Main App: `app.py`
- Quote Extraction: `quote_extractor.py`
- Narrative Generation: `story_writer.py`
**Logs Location:** `/home/john/TranscriptorEnhanced/logs/`
---
## Summary
✅ **Token Limits:** Increased to 1500-2500 (enterprise-ready)
✅ **Logging:** Full production monitoring implemented
✅ **Dependencies:** Documented in requirements.txt
⚠️ **Still Todo (requires production environment):**
- Install python-docx (needs pip in environment)
- Test with 20+ real transcripts
- Set up centralized log monitoring
- Implement user authentication
**Status:** Ready for controlled production pilot with close monitoring
---
**Last Updated:** October 20, 2025
**Version:** 3.0-Enterprise