# Enterprise Deployment Guide
**TranscriptorAI v3.0 - Market Research Edition**
**Updated:** October 20, 2025

---

## Pre-Deployment Checklist

### Required Changes (Completed ✅)
- [x] **Token Limits Increased**
  - From: 100 tokens → To: 1500-2500 tokens
  - Files: `app.py`, `llm.py`, `story_writer.py`
  - Impact: Enables comprehensive market research narratives
- [x] **Production Logging Implemented**
  - New file: `production_logger.py`
  - Integrated into: `app.py`
  - Features: Session tracking, performance metrics, error logging, export to JSON/TXT
- [x] **Dependencies Documented**
  - File: `requirements.txt`
  - Key requirement: `python-docx>=1.0.0` for DOCX support
### Installation Steps

#### 1. Install Dependencies
```bash
cd /home/john/TranscriptorEnhanced

# Install all required packages
pip3 install -r requirements.txt

# Or install individually (quote the specifiers so the shell
# does not treat ">=" as output redirection):
pip3 install "gradio>=4.0.0"
pip3 install "huggingface_hub>=0.19.0"
pip3 install "python-docx>=1.0.0"
pip3 install "pdfplumber>=0.10.0"
pip3 install "pandas>=2.0.0"
pip3 install "matplotlib>=3.7.0"
pip3 install "reportlab>=4.0.0"
pip3 install "tiktoken>=0.5.0"
pip3 install "nltk>=3.8.0"
pip3 install "scikit-learn>=1.3.0"
```
#### 2. Set Environment Variables
**Required:**
```bash
export HUGGINGFACE_TOKEN="your_hf_token_here"
```
**Optional (for LM Studio):**
```bash
export USE_LMSTUDIO=True
export LM_STUDIO_URL="http://localhost:1234"
```
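At startup these variables can be collected into a single config dict, which makes missing tokens fail fast. The helper below is an illustrative sketch, not code from `app.py` (`load_llm_config` is a name invented here):

```python
import os

def load_llm_config() -> dict:
    """Read backend settings from the environment variables documented above."""
    token = os.environ.get("HUGGINGFACE_TOKEN")
    if not token:
        raise RuntimeError("HUGGINGFACE_TOKEN is not set")
    return {
        "hf_token": token,
        # USE_LMSTUDIO is an opt-in string flag ("True"/"False")
        "use_lmstudio": os.environ.get("USE_LMSTUDIO", "False").lower() == "true",
        # Default matches LM Studio's standard local port
        "lm_studio_url": os.environ.get("LM_STUDIO_URL", "http://localhost:1234"),
    }
```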
#### 3. Create Logs Directory
```bash
mkdir -p /home/john/TranscriptorEnhanced/logs
chmod 755 /home/john/TranscriptorEnhanced/logs
```

#### 4. Test Installation
```bash
# Test quote extraction
python3 test_quotes_simple.py

# Should output:
# ✓ Quote extraction working
# ✓ 39 quotes extracted from 2 transcripts
```
---

## Production Configuration

### Current Settings (Enterprise-Ready)

| Setting | Value | Purpose |
|---------|-------|---------|
| LLM_BACKEND | `hf_api` | HuggingFace Inference API |
| LLM_TIMEOUT | `60s` | Increased for longer generation |
| MAX_TOKENS_PER_REQUEST | `1500` | Enterprise narrative length |
| Temperature (Analysis) | `0.5` | Balanced creativity/accuracy |
| Temperature (Narrative) | `0.7` | More creative storytelling |
| Max Tokens (LM Studio) | `2500` | Full-length reports |
| Max Tokens (HF API) | `1500` | API limits |

### Model Selection
**Current Models:**
- **Analysis:** `microsoft/Phi-3-mini-4k-instruct` (HF API)
- **Narrative:** `mistralai/Mixtral-8x7B-Instruct-v0.1` (HF API)
**⚠️ Known Limitation:** Phi-3-mini has only a 4K-token context window. For transcripts longer than ~3000 words, consider:
- Switching to Mixtral-8x7B for analysis (8K context)
- Using LM Studio with larger local models
- Implementing a better chunking strategy
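If chunking is implemented before a larger-context model is available, a word-count splitter along these lines keeps each piece under Phi-3-mini's window while overlapping chunks to preserve context across boundaries. This is a minimal sketch, not part of the current codebase; the function name and parameters are illustrative:

```python
def chunk_words(text: str, max_words: int = 2500, overlap: int = 200) -> list:
    """Split a transcript into overlapping chunks of at most max_words words."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks, start = [], 0
    step = max_words - overlap  # advance less than a full chunk to overlap
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        start += step
    return chunks
```

Word counts are a rough proxy for tokens; for tighter control, count tokens with `tiktoken` (already in `requirements.txt`) instead of splitting on whitespace.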
---

## Monitoring & Logging

### Log Files Generated
Each analysis session creates:
1. **Session Log:** `logs/session_YYYYMMDD_HHMMSS.log`
   - Detailed timestamped events
   - All processing steps
   - Warnings and errors
2. **JSON Summary:** `logs/summary_YYYYMMDD_HHMMSS.json`
   - Structured metrics
   - Machine-readable
   - For integration with monitoring tools
3. **Text Summary:** `logs/summary_YYYYMMDD_HHMMSS.txt`
   - Human-readable summary
   - Success rates
   - Error details
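The JSON summary is the natural integration point for monitoring tools. Because the timestamped filenames sort lexicographically, the most recent summary can be picked up without parsing dates; this is an illustrative sketch, not code from `production_logger.py`:

```python
import json
from pathlib import Path

def latest_summary(log_dir: str = "logs") -> dict:
    """Load the most recent summary_YYYYMMDD_HHMMSS.json as a dict."""
    # Timestamped names sort chronologically, so the last glob hit is newest
    files = sorted(Path(log_dir).glob("summary_*.json"))
    if not files:
        raise FileNotFoundError(f"no summary files found in {log_dir}")
    return json.loads(files[-1].read_text())
```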
### Metrics Tracked
**Per Session:**
- Transcripts processed / failed
- Success rate (%)
- Average processing time
- Quotes extracted
- Total session duration
- Error types and frequencies

**Per Transcript:**
- File name and type
- Quality score (0-1)
- Word count
- Processing time (seconds)
- Error details (if failed)
### Example Log Output
```
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Session started: 20251020_153045
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Processing started: HCP_Oncologist.txt | Type: HCP | Format: TXT
2025-10-20 15:31:12 | INFO | TranscriptorAI_20251020_153045 | Processing complete: HCP_Oncologist.txt | Quality: 0.95 | Words: 1847 | Time: 27.3s
2025-10-20 15:31:15 | INFO | TranscriptorAI_20251020_153045 | Quote extraction complete: 21 quotes | Top score: 1.00 | Themes: patient_management, prescribing, barriers, safety, diagnosis
2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%
```
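Since every line shares the same ` | `-delimited layout (timestamp, level, logger, message), downstream tooling can split it with one regular expression. A small illustrative parser, not part of `production_logger.py`:

```python
import re

# timestamp | LEVEL | logger_name | free-form message
LOG_LINE = re.compile(
    r"^(?P<ts>[\d-]+ [\d:]+) \| (?P<level>\w+) \| (?P<logger>\S+) \| (?P<message>.*)$"
)

def parse_log_line(line: str):
    """Return a dict of the four fields, or None if the line doesn't match."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None
```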
---

## Performance Benchmarks
Based on testing with sample data:

| Operation | Time | Notes |
|-----------|------|-------|
| Single transcript processing | 25-35s | Depends on length |
| Quote extraction | 2-5s | Per transcript |
| Cross-transcript summary | 30-60s | For 3-10 transcripts |
| **Total for 3 transcripts** | **~2-3 minutes** | End-to-end |

**Bottlenecks:**
1. HuggingFace API latency (network dependent)
2. LLM generation time (model dependent)
3. Quote extraction (scales linearly)

**Optimizations:**
- Use LM Studio for faster local processing (if GPU available)
- Process transcripts in parallel (not yet implemented)
- Cache common analyses (not yet implemented)
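Parallel processing is listed above as not yet implemented. Since the per-transcript work is dominated by I/O-bound API calls, a thread pool is the natural shape when it is added. An illustrative sketch, where `process_fn` stands in for whatever per-transcript function `app.py` actually uses:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_all(transcripts, process_fn, max_workers: int = 3):
    """Run process_fn over each transcript concurrently.

    One failure does not stop the batch: errors are collected per file,
    matching the graceful-degradation behavior described below.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_fn, t): t for t in transcripts}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:
                errors[name] = str(exc)
    return results, errors
```

Keep `max_workers` modest: the HF API rate-limits, so more workers can mean more retries rather than more throughput.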
---

## Error Handling

### Automatic Recovery
The system includes:
- **Retry logic:** 3 attempts with exponential backoff
- **Fallback:** HF API → LM Studio switching
- **Graceful degradation:** Continue processing other transcripts if one fails
- **Emergency summaries:** Generated if LLM fails
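The retry behavior described above (3 attempts with exponential backoff) can be expressed as a small wrapper; this is a generic sketch of the pattern, not the exact code in the repository:

```python
import time

def call_with_retry(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying on any exception with exponentially growing delays.

    Delays are base_delay, 2*base_delay, 4*base_delay, ...; the last
    failure is re-raised so the caller can fall back (e.g. to LM Studio).
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```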
### Common Errors & Solutions

**Error:** `ModuleNotFoundError: No module named 'docx'`
**Solution:** Install python-docx: `pip3 install python-docx`

**Error:** `HF API timeout`
**Solution:** Increase the timeout in `app.py` line 25, or use LM Studio

**Error:** `No quotes extracted`
**Solution:** Check the transcript formatting (it needs speaker labels or quotation marks)

**Error:** `Token limit exceeded`
**Solution:** Already fixed; the limits are now 1500-2500 tokens
---

## Security Considerations

### API Keys
- Store the HuggingFace token in environment variables (NOT in code)
- Use secrets management for production (AWS Secrets Manager, HashiCorp Vault)
- Rotate tokens regularly

### Data Privacy
- Transcript data is **not** sent to external services except the HF API for LLM calls
- Logs contain file names but **not** transcript content
- Consider HIPAA compliance if processing patient interviews
- Implement data retention policies for logs

### Access Control
- Restrict access to the `/logs` directory
- Implement user authentication for the Gradio UI (not currently included)
- Use HTTPS in production deployments
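Until full authentication is built in, Gradio's `launch()` accepts an `auth=` argument: either a `(username, password)` tuple or a callable taking `(username, password)` and returning a bool. The callable form keeps credentials out of code; the environment variable names `APP_USER`/`APP_PASS` below are illustrative:

```python
import hmac
import os

def check_credentials(username: str, password: str) -> bool:
    """Auth callback suitable for Gradio's auth= launch parameter.

    compare_digest gives constant-time comparison, avoiding timing leaks.
    """
    expected_user = os.environ.get("APP_USER", "")
    expected_pass = os.environ.get("APP_PASS", "")
    return (hmac.compare_digest(username, expected_user)
            and hmac.compare_digest(password, expected_pass))

# In app.py this could be wired up as:
#   demo.launch(auth=check_credentials)
```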
---

## Scaling Recommendations

### For 10-50 Transcripts/Day
**Current setup is sufficient:**
- Single server deployment
- HuggingFace API with rate limiting
- Local log storage

### For 50-200 Transcripts/Day
**Recommended upgrades:**
- Deploy with multiple workers (Gunicorn)
- Implement a Redis queue for job management
- Use a dedicated LM Studio instance on a GPU server
- Centralized logging (ELK stack, Datadog)

### For 200+ Transcripts/Day
**Enterprise infrastructure:**
- Kubernetes deployment with auto-scaling
- Separate microservices (extraction, analysis, reporting)
- Dedicated GPU cluster for LLM calls
- Cloud object storage (S3) for transcripts/reports
- Real-time monitoring dashboard
| ## Deployment Checklist | |
| ### Before Go-Live | |
| - [ ] All dependencies installed (`pip3 install -r requirements.txt`) | |
| - [ ] HuggingFace token configured | |
| - [ ] Logs directory created with proper permissions | |
| - [ ] Test with 3-5 real client transcripts | |
| - [ ] Review generated reports for quality | |
| - [ ] Verify quote extraction working (check console output) | |
| - [ ] Set up log monitoring/alerts | |
| - [ ] Document any client-specific customizations | |
| ### Day 1 Production | |
| - [ ] Start with 1-2 small client projects | |
| - [ ] Monitor logs actively (`tail -f logs/session_*.log`) | |
| - [ ] Verify session summaries being generated | |
| - [ ] Track processing times vs. benchmarks | |
| - [ ] Gather client feedback on report quality | |
| ### Week 1 Production | |
| - [ ] Review all session logs | |
| - [ ] Calculate average success rate (target: >95%) | |
| - [ ] Identify common errors | |
| - [ ] Optimize based on bottlenecks | |
| - [ ] Update documentation with learnings | |
---

## Support & Maintenance

### Daily Monitoring
Check these metrics daily:
- Success rate (should be >95%)
- Average processing time (should be <3 minutes for 3 transcripts)
- Error frequency (should be <5%)
- Quote extraction quality (top scores should be >0.75)
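These daily targets are straightforward to automate against the JSON session summaries. The sketch below is illustrative only; the field names (`success_rate`, `avg_processing_time_s`, `top_quote_score`) are assumptions and must be aligned with whatever keys your summaries actually use:

```python
def check_daily_metrics(summary: dict) -> list:
    """Return alert strings for any metric outside the daily targets above."""
    alerts = []
    if summary.get("success_rate", 0.0) < 95.0:
        alerts.append(f"success rate low: {summary.get('success_rate')}%")
    # 3 transcripts in <3 minutes ~= 180s total
    if summary.get("avg_processing_time_s", 0.0) > 180.0:
        alerts.append(f"slow processing: {summary.get('avg_processing_time_s')}s")
    if summary.get("top_quote_score", 1.0) < 0.75:
        alerts.append(f"weak quotes: top score {summary.get('top_quote_score')}")
    return alerts
```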
### Weekly Maintenance
- Review session summary logs
- Clean up old logs (keep last 30 days)
- Update dependencies if security patches are available
- Review client feedback
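The 30-day log cleanup above can run as a small cron job. This sketch assumes log files live flat in the `logs/` directory as described earlier; the function name is illustrative:

```python
import time
from pathlib import Path

def prune_logs(log_dir: str = "logs", keep_days: int = 30) -> int:
    """Delete files in log_dir older than keep_days; return how many were removed."""
    cutoff = time.time() - keep_days * 86400  # seconds in a day
    removed = 0
    for path in Path(log_dir).glob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed
```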
### Monthly Review
- Analyze performance trends
- Plan optimization improvements
- Update models if better ones are available
- Review and update documentation

---

## Troubleshooting

### Low Success Rate (<90%)
**Possible Causes:**
- HuggingFace API rate limiting
- Network connectivity issues
- Malformed transcript files

**Actions:**
1. Check `logs/` for error patterns
2. Verify the HF token is valid
3. Test with sample data
4. Consider switching to LM Studio

### Slow Processing (>5 minutes for 3 transcripts)
**Possible Causes:**
- Network latency to the HF API
- Large transcript files
- Token limits causing retries

**Actions:**
1. Check network latency: `ping api.huggingface.co`
2. Review performance logs for bottlenecks
3. Consider a local LM Studio deployment
4. Implement caching (future enhancement)

### Poor Quote Quality (scores <0.50)
**Possible Causes:**
- Transcripts lack specific details
- No quotation marks or speaker labels
- Very technical/clinical language

**Actions:**
1. Run `test_quotes_simple.py` with the problematic transcript
2. Adjust scoring weights in `quote_extractor.py`
3. Add custom patterns for your transcript format
4. Accept that some transcripts naturally have fewer good quotes
| ## Future Enhancements | |
| **High Priority (Next 3 Months):** | |
| 1. Upgrade to larger context model (Mixtral-8x7B for all operations) | |
| 2. Parallel transcript processing | |
| 3. User authentication for Gradio UI | |
| 4. Real-time monitoring dashboard | |
| **Medium Priority (3-6 Months):** | |
| 5. Caching layer for common analyses | |
| 6. Batch processing API | |
| 7. Client-specific customization templates | |
| 8. Enhanced error recovery | |
| **Low Priority (6-12 Months):** | |
| 9. Multi-language support | |
| 10. Audio timestamp integration | |
| 11. Interactive HTML reports | |
| 12. A/B testing framework | |
| --- | |
| ## Contact & Support | |
| **Documentation:** | |
| - Technical: `MARKET_RESEARCH_ENHANCEMENTS.md` | |
| - User Guide: `STORYTELLING_QUICK_START.md` | |
| - This Guide: `ENTERPRISE_DEPLOYMENT_GUIDE.md` | |
| **Key Files:** | |
| - Logging: `production_logger.py` | |
| - Main App: `app.py` | |
| - Quote Extraction: `quote_extractor.py` | |
| - Narrative Generation: `story_writer.py` | |
| **Logs Location:** `/home/john/TranscriptorEnhanced/logs/` | |
| --- | |
## Summary

✅ **Token Limits:** Increased to 1500-2500 (enterprise-ready)
✅ **Logging:** Full production monitoring implemented
✅ **Dependencies:** Documented in requirements.txt

⚠️ **Still Todo (requires production environment):**
- Install python-docx (needs pip in environment)
- Test with 20+ real transcripts
- Set up centralized log monitoring
- Implement user authentication

**Status:** Ready for a controlled production pilot with close monitoring

---

**Last Updated:** October 20, 2025
**Version:** 3.0-Enterprise