
Enterprise Deployment Guide

TranscriptorAI v3.0 - Market Research Edition
Updated: October 20, 2025


Pre-Deployment Checklist

Required Changes (Completed ✅)

  • Token Limits Increased

    • From: 100 tokens → To: 1500-2500 tokens
    • Files: app.py, llm.py, story_writer.py
    • Impact: Enables comprehensive market research narratives
  • Production Logging Implemented

    • New file: production_logger.py
    • Integrated into: app.py
    • Features: Session tracking, performance metrics, error logging, export to JSON/TXT
  • Dependencies Documented

    • File: requirements.txt
    • Key requirement: python-docx>=1.0.0 for DOCX support

Installation Steps

1. Install Dependencies

```bash
cd /home/john/TranscriptorEnhanced

# Install all required packages
pip3 install -r requirements.txt

# Or install individually (quote the specifiers so the shell
# does not treat ">=" as a redirection):
pip3 install "gradio>=4.0.0"
pip3 install "huggingface_hub>=0.19.0"
pip3 install "python-docx>=1.0.0"
pip3 install "pdfplumber>=0.10.0"
pip3 install "pandas>=2.0.0"
pip3 install "matplotlib>=3.7.0"
pip3 install "reportlab>=4.0.0"
pip3 install "tiktoken>=0.5.0"
pip3 install "nltk>=3.8.0"
pip3 install "scikit-learn>=1.3.0"
```

2. Set Environment Variables

Required:

```bash
export HUGGINGFACE_TOKEN="your_hf_token_here"
```

Optional (for LM Studio):

```bash
export USE_LMSTUDIO=True
export LM_STUDIO_URL="http://localhost:1234"
```
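At startup, app.py can resolve these variables into one configuration object and fail fast when neither backend is usable. This is a minimal sketch; the function name and defaults are assumptions, not the actual code in app.py:

```python
import os

def load_llm_config(env=os.environ):
    """Resolve backend settings from the environment variables named in this guide."""
    use_lmstudio = env.get("USE_LMSTUDIO", "False").lower() in ("true", "1", "yes")
    cfg = {
        "hf_token": env.get("HUGGINGFACE_TOKEN"),
        "use_lmstudio": use_lmstudio,
        "lm_studio_url": env.get("LM_STUDIO_URL", "http://localhost:1234"),
    }
    # Fail fast: with no HF token and LM Studio disabled, no LLM backend is reachable.
    if cfg["hf_token"] is None and not use_lmstudio:
        raise RuntimeError("Set HUGGINGFACE_TOKEN or enable USE_LMSTUDIO")
    return cfg
```

Failing at import time is preferable to discovering a missing token mid-session, after transcripts have already been uploaded.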

3. Create Logs Directory

```bash
mkdir -p /home/john/TranscriptorEnhanced/logs
chmod 755 /home/john/TranscriptorEnhanced/logs
```

4. Test Installation

```bash
# Test quote extraction
python3 test_quotes_simple.py

# Should output:
# ✓ Quote extraction working
# ✓ 39 quotes extracted from 2 transcripts
```

Production Configuration

Current Settings (Enterprise-Ready)

| Setting | Value | Purpose |
|---|---|---|
| LLM_BACKEND | hf_api | HuggingFace Inference API |
| LLM_TIMEOUT | 60s | Increased for longer generation |
| MAX_TOKENS_PER_REQUEST | 1500 | Enterprise narrative length |
| Temperature (Analysis) | 0.5 | Balanced creativity/accuracy |
| Temperature (Narrative) | 0.7 | More creative storytelling |
| Max Tokens (LM Studio) | 2500 | Full-length reports |
| Max Tokens (HF API) | 1500 | API limits |
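The settings above can be mirrored in a small configuration map that picks the right token budget per backend. The names below are illustrative, not the actual variables in app.py:

```python
# Mirror of the settings table; values come from this guide.
SETTINGS = {
    "LLM_BACKEND": "hf_api",
    "LLM_TIMEOUT_S": 60,
    "MAX_TOKENS": {"hf_api": 1500, "lmstudio": 2500},
    "TEMPERATURE": {"analysis": 0.5, "narrative": 0.7},
}

def max_tokens_for(backend: str) -> int:
    """Return the per-request token budget for the active backend."""
    return SETTINGS["MAX_TOKENS"][backend]
```

Keeping the two budgets in one map avoids the earlier failure mode where one file was updated to 1500 tokens while another still capped requests at 100.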

Model Selection

Current Models:

  • Analysis: microsoft/Phi-3-mini-4k-instruct (HF API)
  • Narrative: mistralai/Mixtral-8x7B-Instruct-v0.1 (HF API)

⚠️ Known Limitation: Phi-3-mini has only a 4K-token context window. For transcripts >3000 words, consider:

  • Switching to Mixtral-8x7B for analysis (32K context)
  • Using LM Studio with larger local models
  • Implementing better chunking strategy

Monitoring & Logging

Log Files Generated

Each analysis session creates:

  1. Session Log: logs/session_YYYYMMDD_HHMMSS.log

    • Detailed timestamped events
    • All processing steps
    • Warnings and errors
  2. JSON Summary: logs/summary_YYYYMMDD_HHMMSS.json

    • Structured metrics
    • Machine-readable
    • For integration with monitoring tools
  3. Text Summary: logs/summary_YYYYMMDD_HHMMSS.txt

    • Human-readable summary
    • Success rates
    • Error details

Metrics Tracked

Per Session:

  • Transcripts processed / failed
  • Success rate (%)
  • Average processing time
  • Quotes extracted
  • Total session duration
  • Error types and frequencies

Per Transcript:

  • File name and type
  • Quality score (0-1)
  • Word count
  • Processing time (seconds)
  • Error details (if failed)
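The JSON summary can aggregate the per-transcript records into the session metrics listed above. This is a hypothetical shape; the real schema lives in production_logger.py and may differ in field names:

```python
def build_summary(records):
    """Aggregate per-transcript records (dicts with 'processing_time_s',
    'quotes', and 'error') into session-level metrics."""
    processed = [r for r in records if r.get("error") is None]
    failed = [r for r in records if r.get("error") is not None]
    total = len(records)
    return {
        "transcripts_processed": len(processed),
        "transcripts_failed": len(failed),
        "success_rate": round(100.0 * len(processed) / total, 1) if total else 0.0,
        "avg_processing_time_s": (
            round(sum(r["processing_time_s"] for r in processed) / len(processed), 1)
            if processed else 0.0
        ),
        "quotes_extracted": sum(r.get("quotes", 0) for r in processed),
    }
```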

Example Log Output

```
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Session started: 20251020_153045
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Processing started: HCP_Oncologist.txt | Type: HCP | Format: TXT
2025-10-20 15:31:12 | INFO | TranscriptorAI_20251020_153045 | Processing complete: HCP_Oncologist.txt | Quality: 0.95 | Words: 1847 | Time: 27.3s
2025-10-20 15:31:15 | INFO | TranscriptorAI_20251020_153045 | Quote extraction complete: 21 quotes | Top score: 1.00 | Themes: patient_management, prescribing, barriers, safety, diagnosis
2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%
```
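Because the log format is pipe-delimited and regular, monitoring tools can parse it with a single regex. A minimal sketch, assuming the `timestamp | LEVEL | logger | message` layout shown above:

```python
import re

LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \| "
    r"(?P<level>\w+) \| (?P<logger>\S+) \| (?P<msg>.*)$"
)

def parse_log_line(line: str):
    """Return the timestamp, level, logger name, and message of one
    session-log line, or None if the line does not match the format."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None
```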

Performance Benchmarks

Based on testing with sample data:

| Operation | Time | Notes |
|---|---|---|
| Single transcript processing | 25-35s | Depends on length |
| Quote extraction | 2-5s | Per transcript |
| Cross-transcript summary | 30-60s | For 3-10 transcripts |
| Total for 3 transcripts | ~2-3 minutes | End-to-end |

Bottlenecks:

  1. HuggingFace API latency (network dependent)
  2. LLM generation time (model dependent)
  3. Quote extraction (scales linearly)

Optimizations:

  • Use LM Studio for faster local processing (if GPU available)
  • Process transcripts in parallel (not yet implemented)
  • Cache common analyses (not yet implemented)
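Since per-transcript time is dominated by API latency rather than CPU, the parallel-processing optimization could be a thin `ThreadPoolExecutor` wrapper around the existing single-transcript pipeline. A sketch of that not-yet-implemented idea, where `process_one` stands in for the current per-transcript function:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_all(transcripts, process_one, max_workers=3):
    """Process transcripts concurrently; failures are collected rather
    than aborting the batch (matching the graceful-degradation policy)."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, t): t for t in transcripts}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:
                errors[name] = str(exc)  # keep going; log the failure
    return results, errors
```

A small `max_workers` (2-3) is enough here, since more concurrency mostly trips HuggingFace API rate limits rather than saving time.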

Error Handling

Automatic Recovery

The system includes:

  • Retry logic: 3 attempts with exponential backoff
  • Fallback: HF API ↔ LM Studio switching
  • Graceful degradation: Continue processing other transcripts if one fails
  • Emergency summaries: Generated if LLM fails
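The retry behaviour described above (3 attempts, exponential backoff) can be sketched as a small helper; this is an illustration of the policy, not the code in app.py. The injectable `sleep` parameter exists only to make the backoff testable:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn up to `attempts` times, doubling the delay after each
    failure (1s, 2s, ...); re-raise the last error if all attempts fail."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            sleep(base_delay * (2 ** i))
```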

Common Errors & Solutions

Error: `ModuleNotFoundError: No module named 'docx'`
Solution: Install python-docx: `pip3 install python-docx`

Error: HF API timeout
Solution: Increase the timeout in app.py (line 25) or switch to LM Studio

Error: No quotes extracted
Solution: Check transcript formatting (extraction needs speaker labels or quotation marks)

Error: Token limit exceeded
Solution: Already fixed - requests now use 1500-2500 tokens


Security Considerations

API Keys

  • Store HuggingFace token in environment variables (NOT in code)
  • Use secrets management for production (AWS Secrets Manager, HashiCorp Vault)
  • Rotate tokens regularly
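As an added guard alongside environment-variable storage, a small helper can redact the token if it ever has to appear in diagnostics or error messages. This helper is a suggestion, not part of the shipped code:

```python
def mask_secret(value: str, show: int = 4) -> str:
    """Return a log-safe form of a secret: first `show` characters
    followed by asterisks, so a leaked log cannot expose the token."""
    if not value:
        return ""
    if len(value) <= show:
        return "*" * len(value)
    return value[:show] + "*" * (len(value) - show)
```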

Data Privacy

  • Transcript data is not sent to external services except HF API for LLM calls
  • Logs contain file names but not transcript content
  • Consider HIPAA compliance if processing patient interviews
  • Implement data retention policies for logs

Access Control

  • Restrict access to /logs directory
  • Implement user authentication for Gradio UI (not currently included)
  • Use HTTPS in production deployments

Scaling Recommendations

For 10-50 Transcripts/Day

Current setup is sufficient

  • Single server deployment
  • HuggingFace API with rate limiting
  • Local log storage

For 50-200 Transcripts/Day

Recommended upgrades:

  • Deploy with multiple workers (Gunicorn)
  • Implement Redis queue for job management
  • Use dedicated LM Studio instance on GPU server
  • Centralized logging (ELK stack, Datadog)

For 200+ Transcripts/Day

Enterprise infrastructure:

  • Kubernetes deployment with auto-scaling
  • Separate microservices (extraction, analysis, reporting)
  • Dedicated GPU cluster for LLM calls
  • Cloud object storage (S3) for transcripts/reports
  • Real-time monitoring dashboard

Deployment Checklist

Before Go-Live

  • All dependencies installed (pip3 install -r requirements.txt)
  • HuggingFace token configured
  • Logs directory created with proper permissions
  • Test with 3-5 real client transcripts
  • Review generated reports for quality
  • Verify quote extraction working (check console output)
  • Set up log monitoring/alerts
  • Document any client-specific customizations

Day 1 Production

  • Start with 1-2 small client projects
  • Monitor logs actively (tail -f logs/session_*.log)
  • Verify session summaries being generated
  • Track processing times vs. benchmarks
  • Gather client feedback on report quality

Week 1 Production

  • Review all session logs
  • Calculate average success rate (target: >95%)
  • Identify common errors
  • Optimize based on bottlenecks
  • Update documentation with learnings

Support & Maintenance

Daily Monitoring

Check these metrics daily:

  • Success rate (should be >95%)
  • Average processing time (should be <3 minutes for 3 transcripts)
  • Error frequency (should be <5%)
  • Quote extraction quality (top scores should be >0.75)
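The daily check above can be automated against the JSON session summaries. A sketch under the assumption that the summary exposes fields like `success_rate` and `avg_processing_time_s` (the real keys are defined in production_logger.py):

```python
def check_daily_thresholds(summary):
    """Return a list of alert strings for any metric outside the
    daily-monitoring targets; an empty list means all clear."""
    alerts = []
    if summary.get("success_rate", 100.0) < 95.0:
        alerts.append("success rate below 95%")
    if summary.get("avg_processing_time_s", 0.0) > 180.0:
        alerts.append("average batch time above 3 minutes")
    if summary.get("error_rate", 0.0) > 5.0:
        alerts.append("error frequency above 5%")
    if summary.get("top_quote_score", 1.0) < 0.75:
        alerts.append("top quote score below 0.75")
    return alerts
```

Wiring this into a daily cron job that emails non-empty alert lists gives basic monitoring without a dashboard.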

Weekly Maintenance

  • Review session summary logs
  • Clean up old logs (keep last 30 days)
  • Update dependencies if security patches available
  • Review client feedback
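The weekly log clean-up ("keep last 30 days") can be scripted; this is a sketch of that maintenance step, not part of the shipped code:

```python
import time
from pathlib import Path

def prune_old_logs(log_dir: str, keep_days: int = 30) -> list:
    """Delete files in the logs directory whose modification time is
    older than `keep_days`; return the names removed."""
    cutoff = time.time() - keep_days * 86400
    removed = []
    for path in Path(log_dir).glob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

If logs must be retained for compliance, archive to cold storage in place of `path.unlink()`.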

Monthly Review

  • Analyze performance trends
  • Plan optimization improvements
  • Update models if better ones available
  • Review and update documentation

Troubleshooting

Low Success Rate (<90%)

Possible Causes:

  • HuggingFace API rate limiting
  • Network connectivity issues
  • Malformed transcript files

Actions:

  1. Check logs/ for error patterns
  2. Verify HF token is valid
  3. Test with sample data
  4. Consider switching to LM Studio

Slow Processing (>5 minutes for 3 transcripts)

Possible Causes:

  • Network latency to HF API
  • Large transcript files
  • Token limits causing retries

Actions:

  1. Check network latency: ping api.huggingface.co
  2. Review performance logs for bottlenecks
  3. Consider local LM Studio deployment
  4. Implement caching (future enhancement)

Poor Quote Quality (scores <0.50)

Possible Causes:

  • Transcripts lack specific details
  • No quotation marks or speaker labels
  • Very technical/clinical language

Actions:

  1. Run test_quotes_simple.py with problematic transcript
  2. Adjust scoring weights in quote_extractor.py
  3. Add custom patterns for your transcript format
  4. Accept that some transcripts naturally have fewer good quotes

Future Enhancements

High Priority (Next 3 Months):

  1. Upgrade to larger context model (Mixtral-8x7B for all operations)
  2. Parallel transcript processing
  3. User authentication for Gradio UI
  4. Real-time monitoring dashboard

Medium Priority (3-6 Months):

  5. Caching layer for common analyses
  6. Batch processing API
  7. Client-specific customization templates
  8. Enhanced error recovery

Low Priority (6-12 Months):

  9. Multi-language support
  10. Audio timestamp integration
  11. Interactive HTML reports
  12. A/B testing framework


Contact & Support

Documentation:

  • Technical: MARKET_RESEARCH_ENHANCEMENTS.md
  • User Guide: STORYTELLING_QUICK_START.md
  • This Guide: ENTERPRISE_DEPLOYMENT_GUIDE.md

Key Files:

  • Logging: production_logger.py
  • Main App: app.py
  • Quote Extraction: quote_extractor.py
  • Narrative Generation: story_writer.py

Logs Location: /home/john/TranscriptorEnhanced/logs/


Summary

✅ Token Limits: Increased to 1500-2500 (enterprise-ready)
✅ Logging: Full production monitoring implemented
✅ Dependencies: Documented in requirements.txt

⚠️ Still Todo (requires production environment):

  • Install python-docx (needs pip in environment)
  • Test with 20+ real transcripts
  • Set up centralized log monitoring
  • Implement user authentication

Status: Ready for controlled production pilot with close monitoring


Last Updated: October 20, 2025
Version: 3.0-Enterprise