
Enterprise Deployment Guide

TranscriptorAI v3.0 - Market Research Edition
Updated: October 20, 2025


Pre-Deployment Checklist

Required Changes (Completed ✅)

  • Token Limits Increased

    • From: 100 tokens → To: 1500-2500 tokens
    • Files: app.py, llm.py, story_writer.py
    • Impact: Enables comprehensive market research narratives
  • Production Logging Implemented

    • New file: production_logger.py
    • Integrated into: app.py
    • Features: Session tracking, performance metrics, error logging, export to JSON/TXT
  • Dependencies Documented

    • File: requirements.txt
    • Key requirement: python-docx>=1.0.0 for DOCX support

Installation Steps

1. Install Dependencies

```bash
cd /home/john/TranscriptorEnhanced

# Install all required packages
pip3 install -r requirements.txt

# Or install individually (quote the specifiers so the shell
# does not treat ">=" as a redirection):
pip3 install "gradio>=4.0.0"
pip3 install "huggingface_hub>=0.19.0"
pip3 install "python-docx>=1.0.0"
pip3 install "pdfplumber>=0.10.0"
pip3 install "pandas>=2.0.0"
pip3 install "matplotlib>=3.7.0"
pip3 install "reportlab>=4.0.0"
pip3 install "tiktoken>=0.5.0"
pip3 install "nltk>=3.8.0"
pip3 install "scikit-learn>=1.3.0"
```

2. Set Environment Variables

Required:

```bash
export HUGGINGFACE_TOKEN="your_hf_token_here"
```

Optional (for LM Studio):

```bash
export USE_LMSTUDIO=True
export LM_STUDIO_URL="http://localhost:1234"
```
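At startup, app.py can resolve these variables into one configuration object and fail fast when neither backend is usable. This is a minimal sketch; the function name and defaults are assumptions, not the actual code in app.py:

```python
import os

def load_llm_config(env=os.environ):
    """Resolve backend settings from the environment variables named in this guide."""
    use_lmstudio = env.get("USE_LMSTUDIO", "False").lower() in ("true", "1", "yes")
    cfg = {
        "hf_token": env.get("HUGGINGFACE_TOKEN"),
        "use_lmstudio": use_lmstudio,
        "lm_studio_url": env.get("LM_STUDIO_URL", "http://localhost:1234"),
    }
    # Fail fast: with no HF token and LM Studio disabled, no LLM backend is reachable.
    if cfg["hf_token"] is None and not use_lmstudio:
        raise RuntimeError("Set HUGGINGFACE_TOKEN or enable USE_LMSTUDIO")
    return cfg
```

Failing at import time is preferable to discovering a missing token mid-session, after transcripts have already been uploaded.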

3. Create Logs Directory

```bash
mkdir -p /home/john/TranscriptorEnhanced/logs
chmod 755 /home/john/TranscriptorEnhanced/logs
```

4. Test Installation

```bash
# Test quote extraction
python3 test_quotes_simple.py

# Should output:
# ✓ Quote extraction working
# ✓ 39 quotes extracted from 2 transcripts
```

Production Configuration

Current Settings (Enterprise-Ready)

| Setting | Value | Purpose |
|---|---|---|
| LLM_BACKEND | hf_api | HuggingFace Inference API |
| LLM_TIMEOUT | 60s | Increased for longer generation |
| MAX_TOKENS_PER_REQUEST | 1500 | Enterprise narrative length |
| Temperature (Analysis) | 0.5 | Balanced creativity/accuracy |
| Temperature (Narrative) | 0.7 | More creative storytelling |
| Max Tokens (LM Studio) | 2500 | Full-length reports |
| Max Tokens (HF API) | 1500 | API limits |
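The settings above can be mirrored in a small configuration map that picks the right token budget per backend. The names below are illustrative, not the actual variables in app.py:

```python
# Mirror of the settings table; values come from this guide.
SETTINGS = {
    "LLM_BACKEND": "hf_api",
    "LLM_TIMEOUT_S": 60,
    "MAX_TOKENS": {"hf_api": 1500, "lmstudio": 2500},
    "TEMPERATURE": {"analysis": 0.5, "narrative": 0.7},
}

def max_tokens_for(backend: str) -> int:
    """Return the per-request token budget for the active backend."""
    return SETTINGS["MAX_TOKENS"][backend]
```

Keeping the two budgets in one map avoids the earlier failure mode where one file was updated to 1500 tokens while another still capped requests at 100.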

Model Selection

Current Models:

  • Analysis: microsoft/Phi-3-mini-4k-instruct (HF API)
  • Narrative: mistralai/Mixtral-8x7B-Instruct-v0.1 (HF API)

⚠️ Known Limitation: Phi-3-mini has only a 4K-token context window. For transcripts >3000 words, consider:

  • Switching to Mixtral-8x7B for analysis (32K context)
  • Using LM Studio with larger local models
  • Implementing better chunking strategy

Monitoring & Logging

Log Files Generated

Each analysis session creates:

  1. Session Log: logs/session_YYYYMMDD_HHMMSS.log

    • Detailed timestamped events
    • All processing steps
    • Warnings and errors
  2. JSON Summary: logs/summary_YYYYMMDD_HHMMSS.json

    • Structured metrics
    • Machine-readable
    • For integration with monitoring tools
  3. Text Summary: logs/summary_YYYYMMDD_HHMMSS.txt

    • Human-readable summary
    • Success rates
    • Error details

Metrics Tracked

Per Session:

  • Transcripts processed / failed
  • Success rate (%)
  • Average processing time
  • Quotes extracted
  • Total session duration
  • Error types and frequencies

Per Transcript:

  • File name and type
  • Quality score (0-1)
  • Word count
  • Processing time (seconds)
  • Error details (if failed)
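The JSON summary can aggregate the per-transcript records into the session metrics listed above. This is a hypothetical shape; the real schema lives in production_logger.py and may differ in field names:

```python
def build_summary(records):
    """Aggregate per-transcript records (dicts with 'processing_time_s',
    'quotes', and 'error') into session-level metrics."""
    processed = [r for r in records if r.get("error") is None]
    failed = [r for r in records if r.get("error") is not None]
    total = len(records)
    return {
        "transcripts_processed": len(processed),
        "transcripts_failed": len(failed),
        "success_rate": round(100.0 * len(processed) / total, 1) if total else 0.0,
        "avg_processing_time_s": (
            round(sum(r["processing_time_s"] for r in processed) / len(processed), 1)
            if processed else 0.0
        ),
        "quotes_extracted": sum(r.get("quotes", 0) for r in processed),
    }
```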

Example Log Output

```
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Session started: 20251020_153045
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Processing started: HCP_Oncologist.txt | Type: HCP | Format: TXT
2025-10-20 15:31:12 | INFO | TranscriptorAI_20251020_153045 | Processing complete: HCP_Oncologist.txt | Quality: 0.95 | Words: 1847 | Time: 27.3s
2025-10-20 15:31:15 | INFO | TranscriptorAI_20251020_153045 | Quote extraction complete: 21 quotes | Top score: 1.00 | Themes: patient_management, prescribing, barriers, safety, diagnosis
2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%
```
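Because the log format is pipe-delimited and regular, monitoring tools can parse it with a single regex. A minimal sketch, assuming the `timestamp | LEVEL | logger | message` layout shown above:

```python
import re

LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \| "
    r"(?P<level>\w+) \| (?P<logger>\S+) \| (?P<msg>.*)$"
)

def parse_log_line(line: str):
    """Return the timestamp, level, logger name, and message of one
    session-log line, or None if the line does not match the format."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None
```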

Performance Benchmarks

Based on testing with sample data:

| Operation | Time | Notes |
|---|---|---|
| Single transcript processing | 25-35s | Depends on length |
| Quote extraction | 2-5s | Per transcript |
| Cross-transcript summary | 30-60s | For 3-10 transcripts |
| Total for 3 transcripts | ~2-3 minutes | End-to-end |

Bottlenecks:

  1. HuggingFace API latency (network dependent)
  2. LLM generation time (model dependent)
  3. Quote extraction (scales linearly)

Optimizations:

  • Use LM Studio for faster local processing (if GPU available)
  • Process transcripts in parallel (not yet implemented)
  • Cache common analyses (not yet implemented)
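Since per-transcript time is dominated by API latency rather than CPU, the parallel-processing optimization could be a thin `ThreadPoolExecutor` wrapper around the existing single-transcript pipeline. A sketch of that not-yet-implemented idea, where `process_one` stands in for the current per-transcript function:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_all(transcripts, process_one, max_workers=3):
    """Process transcripts concurrently; failures are collected rather
    than aborting the batch (matching the graceful-degradation policy)."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, t): t for t in transcripts}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:
                errors[name] = str(exc)  # keep going; log the failure
    return results, errors
```

A small `max_workers` (2-3) is enough here, since more concurrency mostly trips HuggingFace API rate limits rather than saving time.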

Error Handling

Automatic Recovery

The system includes:

  • Retry logic: 3 attempts with exponential backoff
  • Fallback: HF API ↔ LM Studio switching
  • Graceful degradation: Continue processing other transcripts if one fails
  • Emergency summaries: Generated if LLM fails
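The retry behaviour described above (3 attempts, exponential backoff) can be sketched as a small helper; this is an illustration of the policy, not the code in app.py. The injectable `sleep` parameter exists only to make the backoff testable:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn up to `attempts` times, doubling the delay after each
    failure (1s, 2s, ...); re-raise the last error if all attempts fail."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            sleep(base_delay * (2 ** i))
```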

Common Errors & Solutions

Error: `ModuleNotFoundError: No module named 'docx'`
Solution: Install python-docx: `pip3 install python-docx`

Error: HF API timeout
Solution: Increase the timeout in app.py (line 25) or switch to LM Studio

Error: No quotes extracted
Solution: Check transcript formatting (extraction needs speaker labels or quotation marks)

Error: Token limit exceeded
Solution: Already fixed - requests now use 1500-2500 tokens


Security Considerations

API Keys

  • Store HuggingFace token in environment variables (NOT in code)
  • Use secrets management for production (AWS Secrets Manager, HashiCorp Vault)
  • Rotate tokens regularly
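As an added guard alongside environment-variable storage, a small helper can redact the token if it ever has to appear in diagnostics or error messages. This helper is a suggestion, not part of the shipped code:

```python
def mask_secret(value: str, show: int = 4) -> str:
    """Return a log-safe form of a secret: first `show` characters
    followed by asterisks, so a leaked log cannot expose the token."""
    if not value:
        return ""
    if len(value) <= show:
        return "*" * len(value)
    return value[:show] + "*" * (len(value) - show)
```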

Data Privacy

  • Transcript data is not sent to external services except HF API for LLM calls
  • Logs contain file names but not transcript content
  • Consider HIPAA compliance if processing patient interviews
  • Implement data retention policies for logs

Access Control

  • Restrict access to /logs directory
  • Implement user authentication for Gradio UI (not currently included)
  • Use HTTPS in production deployments

Scaling Recommendations

For 10-50 Transcripts/Day

Current setup is sufficient

  • Single server deployment
  • HuggingFace API with rate limiting
  • Local log storage

For 50-200 Transcripts/Day

Recommended upgrades:

  • Deploy with multiple workers (Gunicorn)
  • Implement Redis queue for job management
  • Use dedicated LM Studio instance on GPU server
  • Centralized logging (ELK stack, Datadog)

For 200+ Transcripts/Day

Enterprise infrastructure:

  • Kubernetes deployment with auto-scaling
  • Separate microservices (extraction, analysis, reporting)
  • Dedicated GPU cluster for LLM calls
  • Cloud object storage (S3) for transcripts/reports
  • Real-time monitoring dashboard

Deployment Checklist

Before Go-Live

  • All dependencies installed (pip3 install -r requirements.txt)
  • HuggingFace token configured
  • Logs directory created with proper permissions
  • Test with 3-5 real client transcripts
  • Review generated reports for quality
  • Verify quote extraction working (check console output)
  • Set up log monitoring/alerts
  • Document any client-specific customizations

Day 1 Production

  • Start with 1-2 small client projects
  • Monitor logs actively (tail -f logs/session_*.log)
  • Verify session summaries being generated
  • Track processing times vs. benchmarks
  • Gather client feedback on report quality

Week 1 Production

  • Review all session logs
  • Calculate average success rate (target: >95%)
  • Identify common errors
  • Optimize based on bottlenecks
  • Update documentation with learnings

Support & Maintenance

Daily Monitoring

Check these metrics daily:

  • Success rate (should be >95%)
  • Average processing time (should be <3 minutes for 3 transcripts)
  • Error frequency (should be <5%)
  • Quote extraction quality (top scores should be >0.75)
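The daily check above can be automated against the JSON session summaries. A sketch under the assumption that the summary exposes fields like `success_rate` and `avg_processing_time_s` (the real keys are defined in production_logger.py):

```python
def check_daily_thresholds(summary):
    """Return a list of alert strings for any metric outside the
    daily-monitoring targets; an empty list means all clear."""
    alerts = []
    if summary.get("success_rate", 100.0) < 95.0:
        alerts.append("success rate below 95%")
    if summary.get("avg_processing_time_s", 0.0) > 180.0:
        alerts.append("average batch time above 3 minutes")
    if summary.get("error_rate", 0.0) > 5.0:
        alerts.append("error frequency above 5%")
    if summary.get("top_quote_score", 1.0) < 0.75:
        alerts.append("top quote score below 0.75")
    return alerts
```

Wiring this into a daily cron job that emails non-empty alert lists gives basic monitoring without a dashboard.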

Weekly Maintenance

  • Review session summary logs
  • Clean up old logs (keep last 30 days)
  • Update dependencies if security patches available
  • Review client feedback
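The weekly log clean-up ("keep last 30 days") can be scripted; this is a sketch of that maintenance step, not part of the shipped code:

```python
import time
from pathlib import Path

def prune_old_logs(log_dir: str, keep_days: int = 30) -> list:
    """Delete files in the logs directory whose modification time is
    older than `keep_days`; return the names removed."""
    cutoff = time.time() - keep_days * 86400
    removed = []
    for path in Path(log_dir).glob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

If logs must be retained for compliance, archive to cold storage in place of `path.unlink()`.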

Monthly Review

  • Analyze performance trends
  • Plan optimization improvements
  • Update models if better ones available
  • Review and update documentation

Troubleshooting

Low Success Rate (<90%)

Possible Causes:

  • HuggingFace API rate limiting
  • Network connectivity issues
  • Malformed transcript files

Actions:

  1. Check logs/ for error patterns
  2. Verify HF token is valid
  3. Test with sample data
  4. Consider switching to LM Studio

Slow Processing (>5 minutes for 3 transcripts)

Possible Causes:

  • Network latency to HF API
  • Large transcript files
  • Token limits causing retries

Actions:

  1. Check network latency: ping api.huggingface.co
  2. Review performance logs for bottlenecks
  3. Consider local LM Studio deployment
  4. Implement caching (future enhancement)

Poor Quote Quality (scores <0.50)

Possible Causes:

  • Transcripts lack specific details
  • No quotation marks or speaker labels
  • Very technical/clinical language

Actions:

  1. Run test_quotes_simple.py with problematic transcript
  2. Adjust scoring weights in quote_extractor.py
  3. Add custom patterns for your transcript format
  4. Accept that some transcripts naturally have fewer good quotes

Future Enhancements

High Priority (Next 3 Months):

  1. Upgrade to larger context model (Mixtral-8x7B for all operations)
  2. Parallel transcript processing
  3. User authentication for Gradio UI
  4. Real-time monitoring dashboard

Medium Priority (3-6 Months):

  5. Caching layer for common analyses
  6. Batch processing API
  7. Client-specific customization templates
  8. Enhanced error recovery

Low Priority (6-12 Months):

  9. Multi-language support
  10. Audio timestamp integration
  11. Interactive HTML reports
  12. A/B testing framework


Contact & Support

Documentation:

  • Technical: MARKET_RESEARCH_ENHANCEMENTS.md
  • User Guide: STORYTELLING_QUICK_START.md
  • This Guide: ENTERPRISE_DEPLOYMENT_GUIDE.md

Key Files:

  • Logging: production_logger.py
  • Main App: app.py
  • Quote Extraction: quote_extractor.py
  • Narrative Generation: story_writer.py

Logs Location: /home/john/TranscriptorEnhanced/logs/


Summary

✅ Token Limits: Increased to 1500-2500 (enterprise-ready)
✅ Logging: Full production monitoring implemented
✅ Dependencies: Documented in requirements.txt

⚠️ Still Todo (requires production environment):

  • Install python-docx (needs pip in environment)
  • Test with 20+ real transcripts
  • Set up centralized log monitoring
  • Implement user authentication

Status: Ready for controlled production pilot with close monitoring


Last Updated: October 20, 2025
Version: 3.0-Enterprise