# Enterprise Deployment Guide

**TranscriptorAI v3.0 - Market Research Edition**
**Updated:** October 20, 2025

---

## Pre-Deployment Checklist

### Required Changes (Completed ✅)

- [x] **Token Limits Increased**
  - From: 100 tokens → To: 1500-2500 tokens
  - Files: `app.py`, `llm.py`, `story_writer.py`
  - Impact: Enables comprehensive market research narratives
- [x] **Production Logging Implemented**
  - New file: `production_logger.py`
  - Integrated into: `app.py`
  - Features: Session tracking, performance metrics, error logging, export to JSON/TXT
- [x] **Dependencies Documented**
  - File: `requirements.txt`
  - Key requirement: `python-docx>=1.0.0` for DOCX support

### Installation Steps

#### 1. Install Dependencies

```bash
cd /home/john/TranscriptorEnhanced

# Install all required packages
pip3 install -r requirements.txt

# Or install individually (quote the version specifiers so the shell
# does not treat ">=" as a redirection):
pip3 install "gradio>=4.0.0"
pip3 install "huggingface_hub>=0.19.0"
pip3 install "python-docx>=1.0.0"
pip3 install "pdfplumber>=0.10.0"
pip3 install "pandas>=2.0.0"
pip3 install "matplotlib>=3.7.0"
pip3 install "reportlab>=4.0.0"
pip3 install "tiktoken>=0.5.0"
pip3 install "nltk>=3.8.0"
pip3 install "scikit-learn>=1.3.0"
```

#### 2. Set Environment Variables

**Required:**

```bash
export HUGGINGFACE_TOKEN="your_hf_token_here"
```

**Optional (for LM Studio):**

```bash
export USE_LMSTUDIO=True
export LM_STUDIO_URL="http://localhost:1234"
```

#### 3. Create Logs Directory

```bash
mkdir -p /home/john/TranscriptorEnhanced/logs
chmod 755 /home/john/TranscriptorEnhanced/logs
```
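The steps above can be verified before go-live with a small preflight script. This is a minimal sketch, not part of the shipped app: the env var name, logs path, and module names come from this guide, while the `preflight` helper itself is illustrative.

```python
import importlib.util
import os
from pathlib import Path

def preflight(env, logs_dir, modules=("docx", "gradio")):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not env.get("HUGGINGFACE_TOKEN"):
        problems.append("HUGGINGFACE_TOKEN is not set")
    path = Path(logs_dir)
    if not path.is_dir():
        problems.append(f"logs directory missing: {logs_dir}")
    elif not os.access(path, os.W_OK):
        problems.append(f"logs directory not writable: {logs_dir}")
    for mod in modules:
        # find_spec checks importability without actually importing the package
        if importlib.util.find_spec(mod) is None:
            problems.append(f"missing dependency: {mod}")
    return problems

if __name__ == "__main__":
    issues = preflight(dict(os.environ), "/home/john/TranscriptorEnhanced/logs")
    for issue in issues:
        print("WARNING:", issue)
    print("Preflight OK" if not issues else f"{len(issues)} issue(s) found")
```

Note that `python-docx` installs under the import name `docx`, which is why the module list checks `docx` rather than the package name.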
#### 4. Test Installation

```bash
# Test quote extraction
python3 test_quotes_simple.py

# Should output:
# ✓ Quote extraction working
# ✓ 39 quotes extracted from 2 transcripts
```

---

## Production Configuration

### Current Settings (Enterprise-Ready)

| Setting | Value | Purpose |
|---------|-------|---------|
| LLM_BACKEND | `hf_api` | HuggingFace Inference API |
| LLM_TIMEOUT | `60s` | Increased for longer generation |
| MAX_TOKENS_PER_REQUEST | `1500` | Enterprise narrative length |
| Temperature (Analysis) | `0.5` | Balanced creativity/accuracy |
| Temperature (Narrative) | `0.7` | More creative storytelling |
| Max Tokens (LM Studio) | `2500` | Full-length reports |
| Max Tokens (HF API) | `1500` | API limits |

### Model Selection

**Current Models:**
- **Analysis:** `microsoft/Phi-3-mini-4k-instruct` (HF API)
- **Narrative:** `mistralai/Mixtral-8x7B-Instruct-v0.1` (HF API)

**⚠️ Known Limitation:** Phi-3-mini has only a 4K context window. For transcripts >3000 words, consider:
- Switching to Mixtral-8x7B for analysis (32K context)
- Using LM Studio with larger local models
- Implementing a better chunking strategy

---

## Monitoring & Logging

### Log Files Generated

Each analysis session creates:

1. **Session Log:** `logs/session_YYYYMMDD_HHMMSS.log`
   - Detailed timestamped events
   - All processing steps
   - Warnings and errors
2. **JSON Summary:** `logs/summary_YYYYMMDD_HHMMSS.json`
   - Structured metrics
   - Machine-readable
   - For integration with monitoring tools
3. **Text Summary:** `logs/summary_YYYYMMDD_HHMMSS.txt`
   - Human-readable summary
   - Success rates
   - Error details
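The JSON summary naming convention above can be sketched as follows. The real schema is defined by `production_logger.py` (not shown here); the metric names in this example are hypothetical placeholders.

```python
import json
import tempfile
import time
from pathlib import Path

def write_session_summary(metrics, logs_dir, session_id=None):
    """Write a machine-readable summary like logs/summary_YYYYMMDD_HHMMSS.json."""
    session_id = session_id or time.strftime("%Y%m%d_%H%M%S")
    out = Path(logs_dir) / f"summary_{session_id}.json"
    out.write_text(json.dumps({"session_id": session_id, **metrics}, indent=2))
    return out

# Hypothetical metric names -- the actual fields come from production_logger.py
path = write_session_summary(
    {"processed": 3, "failed": 0, "success_rate": 100.0, "quotes_extracted": 39},
    logs_dir=tempfile.mkdtemp(),
)
```

Because the output is plain JSON keyed by session ID, monitoring tools can ingest it without any custom parsing.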
### Metrics Tracked

**Per Session:**
- Transcripts processed / failed
- Success rate (%)
- Average processing time
- Quotes extracted
- Total session duration
- Error types and frequencies

**Per Transcript:**
- File name and type
- Quality score (0-1)
- Word count
- Processing time (seconds)
- Error details (if failed)

### Example Log Output

```
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Session started: 20251020_153045
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Processing started: HCP_Oncologist.txt | Type: HCP | Format: TXT
2025-10-20 15:31:12 | INFO | TranscriptorAI_20251020_153045 | Processing complete: HCP_Oncologist.txt | Quality: 0.95 | Words: 1847 | Time: 27.3s
2025-10-20 15:31:15 | INFO | TranscriptorAI_20251020_153045 | Quote extraction complete: 21 quotes | Top score: 1.00 | Themes: patient_management, prescribing, barriers, safety, diagnosis
2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%
```

---

## Performance Benchmarks

Based on testing with sample data:

| Operation | Time | Notes |
|-----------|------|-------|
| Single transcript processing | 25-35s | Depends on length |
| Quote extraction | 2-5s | Per transcript |
| Cross-transcript summary | 30-60s | For 3-10 transcripts |
| **Total for 3 transcripts** | **~2-3 minutes** | End-to-end |

**Bottlenecks:**
1. HuggingFace API latency (network dependent)
2. LLM generation time (model dependent)
3. Quote extraction (scales linearly)
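The pipe-delimited log format shown in the example output is easy to parse for monitoring integrations. A sketch (the field layout is taken from the example lines above, not from `production_logger.py` itself):

```python
import re

# Matches the SESSION COMPLETE line from the example log output
SESSION_RE = re.compile(
    r"SESSION COMPLETE \| Duration: (?P<duration>[\d.]+)s"
    r" \| Processed: (?P<processed>\d+)"
    r" \| Failed: (?P<failed>\d+)"
    r" \| Success Rate: (?P<rate>[\d.]+)%"
)

def parse_session_complete(line):
    """Extract session metrics from a SESSION COMPLETE log line, or None."""
    m = SESSION_RE.search(line)
    if not m:
        return None
    return {
        "duration_s": float(m.group("duration")),
        "processed": int(m.group("processed")),
        "failed": int(m.group("failed")),
        "success_rate": float(m.group("rate")),
    }

line = ("2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | "
        "SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%")
print(parse_session_complete(line))
# → {'duration_s': 60.2, 'processed': 3, 'failed': 0, 'success_rate': 100.0}
```

In practice the JSON summaries are the more robust integration point; parsing like this is a fallback when only the `.log` files are available.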
**Optimizations:**
- Use LM Studio for faster local processing (if GPU available)
- Process transcripts in parallel (not yet implemented)
- Cache common analyses (not yet implemented)

---

## Error Handling

### Automatic Recovery

The system includes:
- **Retry logic:** 3 attempts with exponential backoff
- **Fallback:** HF API ↔ LM Studio switching
- **Graceful degradation:** Continue processing other transcripts if one fails
- **Emergency summaries:** Generated if LLM fails

### Common Errors & Solutions

**Error:** `ModuleNotFoundError: No module named 'docx'`
**Solution:** Install python-docx: `pip3 install python-docx`

**Error:** `HF API timeout`
**Solution:** Increase timeout in `app.py` line 25 or use LM Studio

**Error:** `No quotes extracted`
**Solution:** Check transcript formatting (needs speaker labels or quotation marks)

**Error:** `Token limit exceeded`
**Solution:** Already fixed - now using 1500-2500 tokens

---

## Security Considerations

### API Keys

- Store HuggingFace token in environment variables (NOT in code)
- Use secrets management for production (AWS Secrets Manager, HashiCorp Vault)
- Rotate tokens regularly

### Data Privacy

- Transcript data is **not** sent to external services except HF API for LLM calls
- Logs contain file names but **not** transcript content
- Consider HIPAA compliance if processing patient interviews
- Implement data retention policies for logs

### Access Control

- Restrict access to `/logs` directory
- Implement user authentication for Gradio UI (not currently included)
- Use HTTPS in production deployments

---

## Scaling Recommendations

### For 10-50 Transcripts/Day

**Current setup is sufficient:**
- Single server deployment
- HuggingFace API with rate limiting
- Local log storage

### For 50-200 Transcripts/Day

**Recommended upgrades:**
- Deploy with multiple workers (Gunicorn)
- Implement Redis queue for job management
- Use dedicated LM Studio instance on GPU server
- Centralized logging (ELK stack, Datadog)
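The retry behavior described under Error Handling (3 attempts with exponential backoff) can be sketched as below. This is illustrative, not the actual implementation in `app.py`; the delay schedule and the `call_with_fallback` helper are assumptions.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying up to `attempts` times with exponential backoff.

    Delays between attempts are base_delay, 2*base_delay, 4*base_delay, ...
    The last exception is re-raised if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

# Illustrative HF API ↔ LM Studio fallback: try the API with retries,
# then switch to the local backend if it keeps failing.
def call_with_fallback(call_hf_api, call_lm_studio):
    try:
        return with_retries(call_hf_api)
    except Exception:
        return call_lm_studio()
```

Injecting `sleep` as a parameter keeps the backoff testable without real delays.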
### For 200+ Transcripts/Day

**Enterprise infrastructure:**
- Kubernetes deployment with auto-scaling
- Separate microservices (extraction, analysis, reporting)
- Dedicated GPU cluster for LLM calls
- Cloud object storage (S3) for transcripts/reports
- Real-time monitoring dashboard

---

## Deployment Checklist

### Before Go-Live

- [ ] All dependencies installed (`pip3 install -r requirements.txt`)
- [ ] HuggingFace token configured
- [ ] Logs directory created with proper permissions
- [ ] Test with 3-5 real client transcripts
- [ ] Review generated reports for quality
- [ ] Verify quote extraction working (check console output)
- [ ] Set up log monitoring/alerts
- [ ] Document any client-specific customizations

### Day 1 Production

- [ ] Start with 1-2 small client projects
- [ ] Monitor logs actively (`tail -f logs/session_*.log`)
- [ ] Verify session summaries being generated
- [ ] Track processing times vs. benchmarks
- [ ] Gather client feedback on report quality

### Week 1 Production

- [ ] Review all session logs
- [ ] Calculate average success rate (target: >95%)
- [ ] Identify common errors
- [ ] Optimize based on bottlenecks
- [ ] Update documentation with learnings

---

## Support & Maintenance

### Daily Monitoring

Check these metrics daily:
- Success rate (should be >95%)
- Average processing time (should be <3 minutes for 3 transcripts)
- Error frequency (should be <5%)
- Quote extraction quality (top scores should be >0.75)

### Weekly Maintenance

- Review session summary logs
- Clean up old logs (keep last 30 days)
- Update dependencies if security patches available
- Review client feedback

### Monthly Review

- Analyze performance trends
- Plan optimization improvements
- Update models if better ones available
- Review and update documentation

---

## Troubleshooting

### Low Success Rate (<90%)

**Possible Causes:**
- HuggingFace API rate limiting
- Network connectivity issues
- Malformed transcript files

**Actions:**
1. Check `logs/` for error patterns
2. Verify HF token is valid
3. Test with sample data
4. Consider switching to LM Studio

### Slow Processing (>5 minutes for 3 transcripts)

**Possible Causes:**
- Network latency to HF API
- Large transcript files
- Token limits causing retries

**Actions:**
1. Check network latency: `ping api-inference.huggingface.co`
2. Review performance logs for bottlenecks
3. Consider local LM Studio deployment
4. Implement caching (future enhancement)

### Poor Quote Quality (scores <0.50)

**Possible Causes:**
- Transcripts lack specific details
- No quotation marks or speaker labels
- Very technical/clinical language

**Actions:**
1. Run `test_quotes_simple.py` with the problematic transcript
2. Adjust scoring weights in `quote_extractor.py`
3. Add custom patterns for your transcript format
4. Accept that some transcripts naturally have fewer good quotes

---

## Future Enhancements

**High Priority (Next 3 Months):**
1. Upgrade to larger-context model (Mixtral-8x7B for all operations)
2. Parallel transcript processing
3. User authentication for Gradio UI
4. Real-time monitoring dashboard

**Medium Priority (3-6 Months):**
5. Caching layer for common analyses
6. Batch processing API
7. Client-specific customization templates
8. Enhanced error recovery

**Low Priority (6-12 Months):**
9. Multi-language support
10. Audio timestamp integration
11. Interactive HTML reports
12. A/B testing framework
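`quote_extractor.py` is not reproduced in this guide, so as an illustration of the kind of weight adjustment the quote-quality troubleshooting step refers to, here is a hypothetical scorer with tunable weights. The features and weight values are invented for this sketch and will differ from the real implementation.

```python
import re

# Hypothetical weights -- quote_extractor.py's real features and weights
# may differ; this only illustrates making them tunable in one place.
DEFAULT_WEIGHTS = {"has_speaker": 0.3, "has_numbers": 0.3, "length": 0.4}

def score_quote(quote, weights=DEFAULT_WEIGHTS):
    """Score a candidate quote between 0 and 1 from simple surface features."""
    features = {
        "has_speaker": 1.0 if re.match(r"^[A-Z][\w .]*:", quote) else 0.0,
        "has_numbers": 1.0 if re.search(r"\d", quote) else 0.0,
        "length": min(len(quote.split()) / 20.0, 1.0),  # saturates at 20 words
    }
    return sum(weights[name] * value for name, value in features.items())

print(round(score_quote("Dr. Smith: I prescribe it to 40% of my patients."), 2))
# → 0.8
```

Keeping the weights in a single dict makes "adjust scoring weights" a one-line change per client or transcript format.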
---

## Contact & Support

**Documentation:**
- Technical: `MARKET_RESEARCH_ENHANCEMENTS.md`
- User Guide: `STORYTELLING_QUICK_START.md`
- This Guide: `ENTERPRISE_DEPLOYMENT_GUIDE.md`

**Key Files:**
- Logging: `production_logger.py`
- Main App: `app.py`
- Quote Extraction: `quote_extractor.py`
- Narrative Generation: `story_writer.py`

**Logs Location:** `/home/john/TranscriptorEnhanced/logs/`

---

## Summary

✅ **Token Limits:** Increased to 1500-2500 (enterprise-ready)
✅ **Logging:** Full production monitoring implemented
✅ **Dependencies:** Documented in requirements.txt

⚠️ **Still Todo (requires production environment):**
- Install python-docx (needs pip in environment)
- Test with 20+ real transcripts
- Set up centralized log monitoring
- Implement user authentication

**Status:** Ready for controlled production pilot with close monitoring

---

**Last Updated:** October 20, 2025
**Version:** 3.0-Enterprise