# Enterprise Deployment Guide

TranscriptorAI v3.0 - Market Research Edition. Updated: October 20, 2025

## Pre-Deployment Checklist
### Required Changes (Completed ✅)

**Token Limits Increased**
- From: 100 tokens → To: 1500-2500 tokens
- Files: `app.py`, `llm.py`, `story_writer.py`
- Impact: Enables comprehensive market research narratives

**Production Logging Implemented**
- New file: `production_logger.py`
- Integrated into: `app.py`
- Features: Session tracking, performance metrics, error logging, export to JSON/TXT

**Dependencies Documented**
- File: `requirements.txt`
- Key requirement: `python-docx>=1.0.0` for DOCX support
## Installation Steps

### 1. Install Dependencies

```bash
cd /home/john/TranscriptorEnhanced

# Install all required packages
pip3 install -r requirements.txt

# Or install individually (quote the specifiers so the shell
# does not treat ">=" as a redirect):
pip3 install "gradio>=4.0.0"
pip3 install "huggingface_hub>=0.19.0"
pip3 install "python-docx>=1.0.0"
pip3 install "pdfplumber>=0.10.0"
pip3 install "pandas>=2.0.0"
pip3 install "matplotlib>=3.7.0"
pip3 install "reportlab>=4.0.0"
pip3 install "tiktoken>=0.5.0"
pip3 install "nltk>=3.8.0"
pip3 install "scikit-learn>=1.3.0"
```
### 2. Set Environment Variables

Required:

```bash
export HUGGINGFACE_TOKEN="your_hf_token_here"
```

Optional (for LM Studio):

```bash
export USE_LMSTUDIO=True
export LM_STUDIO_URL="http://localhost:1234"
```
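As a rough sketch of how the app might consume these variables (the actual lookup in `app.py`/`llm.py` may differ — the function name and defaults here are assumptions):

```python
import os

def load_config() -> dict:
    """Hypothetical config loader mirroring this guide's environment variables."""
    return {
        "hf_token": os.environ.get("HUGGINGFACE_TOKEN", ""),
        # Env vars arrive as strings, so the boolean flag needs explicit parsing
        "use_lmstudio": os.environ.get("USE_LMSTUDIO", "False").lower() in ("1", "true", "yes"),
        "lm_studio_url": os.environ.get("LM_STUDIO_URL", "http://localhost:1234"),
    }

os.environ["USE_LMSTUDIO"] = "True"
cfg = load_config()
print(cfg["use_lmstudio"], cfg["lm_studio_url"])
```

Note that `export USE_LMSTUDIO=True` sets the literal string `"True"`, which is why the loader normalizes it before comparing.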
### 3. Create Logs Directory

```bash
mkdir -p /home/john/TranscriptorEnhanced/logs
chmod 755 /home/john/TranscriptorEnhanced/logs
```
### 4. Test Installation

```bash
# Test quote extraction
python3 test_quotes_simple.py

# Should output:
# ✅ Quote extraction working
# ✅ 39 quotes extracted from 2 transcripts
```
## Production Configuration

### Current Settings (Enterprise-Ready)

| Setting | Value | Purpose |
|---|---|---|
| LLM_BACKEND | hf_api | HuggingFace Inference API |
| LLM_TIMEOUT | 60s | Increased for longer generation |
| MAX_TOKENS_PER_REQUEST | 1500 | Enterprise narrative length |
| Temperature (Analysis) | 0.5 | Balanced creativity/accuracy |
| Temperature (Narrative) | 0.7 | More creative storytelling |
| Max Tokens (LM Studio) | 2500 | Full-length reports |
| Max Tokens (HF API) | 1500 | API limits |
### Model Selection

Current models:
- Analysis: `microsoft/Phi-3-mini-4k-instruct` (HF API)
- Narrative: `mistralai/Mixtral-8x7B-Instruct-v0.1` (HF API)

⚠️ **Known Limitation:** Phi-3-mini has only a 4K-token context window. For transcripts longer than ~3000 words, consider:
- Switching to Mixtral-8x7B for analysis (32K context)
- Using LM Studio with larger local models
- Implementing a better chunking strategy
## Monitoring & Logging

### Log Files Generated

Each analysis session creates:

**Session Log:** `logs/session_YYYYMMDD_HHMMSS.log`
- Detailed timestamped events
- All processing steps
- Warnings and errors

**JSON Summary:** `logs/summary_YYYYMMDD_HHMMSS.json`
- Structured metrics
- Machine-readable
- For integration with monitoring tools

**Text Summary:** `logs/summary_YYYYMMDD_HHMMSS.txt`
- Human-readable summary
- Success rates
- Error details
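A monitoring job could consume the JSON summary along these lines. The field names used here (`processed`, `failed`, `success_rate`) are assumptions — check an actual `summary_*.json` for the real schema before wiring this up:

```python
import json

def needs_alert(summary: dict, threshold: float = 95.0) -> bool:
    """Flag a session whose success rate fell below the target threshold.

    A missing field is treated as a failure so broken summaries also alert.
    """
    return summary.get("success_rate", 0.0) < threshold

# Stand-in for json.load(open("logs/summary_20251020_153045.json"))
sample = json.loads('{"processed": 3, "failed": 0, "success_rate": 100.0}')
print(needs_alert(sample))  # False -- this session met the 95% target
```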
### Metrics Tracked

**Per Session:**
- Transcripts processed / failed
- Success rate (%)
- Average processing time
- Quotes extracted
- Total session duration
- Error types and frequencies

**Per Transcript:**
- File name and type
- Quality score (0-1)
- Word count
- Processing time (seconds)
- Error details (if failed)
### Example Log Output

```text
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Session started: 20251020_153045
2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Processing started: HCP_Oncologist.txt | Type: HCP | Format: TXT
2025-10-20 15:31:12 | INFO | TranscriptorAI_20251020_153045 | Processing complete: HCP_Oncologist.txt | Quality: 0.95 | Words: 1847 | Time: 27.3s
2025-10-20 15:31:15 | INFO | TranscriptorAI_20251020_153045 | Quote extraction complete: 21 quotes | Top score: 1.00 | Themes: patient_management, prescribing, barriers, safety, diagnosis
2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%
```
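For ad-hoc analysis or alerting, the pipe-delimited log lines above can be parsed with a small regex. The field layout is inferred from the example output; adjust the pattern if `production_logger.py` emits a different format:

```python
import re

# timestamp | level | session id | free-form message (which may itself contain pipes)
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+) \| (?P<level>\w+) \| (?P<session>\S+) \| (?P<message>.*)$"
)

def parse_log_line(line: str) -> dict:
    """Return the log line's fields, or an empty dict if it doesn't match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else {}

sample = ("2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | "
          "SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%")
fields = parse_log_line(sample)
print(fields["level"], "->", fields["message"].split(" | ")[0])
```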
## Performance Benchmarks
Based on testing with sample data:
| Operation | Time | Notes |
|---|---|---|
| Single transcript processing | 25-35s | Depends on length |
| Quote extraction | 2-5s | Per transcript |
| Cross-transcript summary | 30-60s | For 3-10 transcripts |
| Total for 3 transcripts | ~2-3 minutes | End-to-end |
**Bottlenecks:**
- HuggingFace API latency (network dependent)
- LLM generation time (model dependent)
- Quote extraction (scales linearly)

**Optimizations:**
- Use LM Studio for faster local processing (if a GPU is available)
- Process transcripts in parallel (not yet implemented)
- Cache common analyses (not yet implemented)
## Error Handling

### Automatic Recovery

The system includes:
- **Retry logic:** 3 attempts with exponential backoff
- **Fallback:** HF API → LM Studio switching
- **Graceful degradation:** processing of other transcripts continues if one fails
- **Emergency summaries:** generated if the LLM fails
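The "3 attempts with exponential backoff" behaviour can be sketched as a generic helper; the real logic lives in the LLM-calling code and may differ in delays and exception handling:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying on any exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts -> surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo: a call that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # prints "ok" on the third attempt
```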
### Common Errors & Solutions

**Error:** `ModuleNotFoundError: No module named 'docx'`
**Solution:** Install python-docx: `pip3 install python-docx`

**Error:** HF API timeout
**Solution:** Increase the timeout in `app.py` (line 25) or use LM Studio

**Error:** No quotes extracted
**Solution:** Check transcript formatting (needs speaker labels or quotation marks)

**Error:** Token limit exceeded
**Solution:** Already fixed - now using 1500-2500 tokens
## Security Considerations

### API Keys
- Store the HuggingFace token in environment variables (NOT in code)
- Use secrets management for production (AWS Secrets Manager, HashiCorp Vault)
- Rotate tokens regularly

### Data Privacy
- Transcript data is not sent to external services, except to the HF API for LLM calls
- Logs contain file names but not transcript content
- Consider HIPAA compliance if processing patient interviews
- Implement data retention policies for logs

### Access Control
- Restrict access to the `logs/` directory
- Implement user authentication for the Gradio UI (not currently included)
- Use HTTPS in production deployments
## Scaling Recommendations

### For 10-50 Transcripts/Day
The current setup is sufficient:
- Single server deployment
- HuggingFace API with rate limiting
- Local log storage

### For 50-200 Transcripts/Day
Recommended upgrades:
- Deploy with multiple workers (Gunicorn)
- Implement a Redis queue for job management
- Use a dedicated LM Studio instance on a GPU server
- Centralized logging (ELK stack, Datadog)

### For 200+ Transcripts/Day
Enterprise infrastructure:
- Kubernetes deployment with auto-scaling
- Separate microservices (extraction, analysis, reporting)
- Dedicated GPU cluster for LLM calls
- Cloud object storage (S3) for transcripts/reports
- Real-time monitoring dashboard
## Deployment Checklist

### Before Go-Live
- All dependencies installed (`pip3 install -r requirements.txt`)
- HuggingFace token configured
- Logs directory created with proper permissions
- Test with 3-5 real client transcripts
- Review generated reports for quality
- Verify quote extraction is working (check console output)
- Set up log monitoring/alerts
- Document any client-specific customizations

### Day 1 Production
- Start with 1-2 small client projects
- Monitor logs actively (`tail -f logs/session_*.log`)
- Verify session summaries are being generated
- Track processing times vs. benchmarks
- Gather client feedback on report quality

### Week 1 Production
- Review all session logs
- Calculate average success rate (target: >95%)
- Identify common errors
- Optimize based on bottlenecks
- Update documentation with learnings
## Support & Maintenance

### Daily Monitoring
Check these metrics daily:
- Success rate (should be >95%)
- Average processing time (should be <3 minutes for 3 transcripts)
- Error frequency (should be <5%)
- Quote extraction quality (top scores should be >0.75)

### Weekly Maintenance
- Review session summary logs
- Clean up old logs (keep the last 30 days)
- Update dependencies when security patches are available
- Review client feedback
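The weekly log-cleanup step could be scripted along these lines. The 30-day retention and the `session_*.log` glob are taken from this guide; the demo runs against a temporary directory rather than the real `logs/` path:

```python
import os
import tempfile
import time
from pathlib import Path

def purge_old_logs(log_dir: str, max_age_days: int = 30) -> list:
    """Delete session logs older than max_age_days; return the deleted names."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in Path(log_dir).glob("session_*.log"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)

# Demo in a scratch directory: one backdated log, one fresh log.
demo = tempfile.mkdtemp()
old = Path(demo) / "session_20250101_000000.log"
new = Path(demo) / "session_20251020_153045.log"
old.write_text("old session")
new.write_text("new session")
os.utime(old, (time.time() - 40 * 86400,) * 2)  # backdate mtime by 40 days

print(purge_old_logs(demo))  # only the 40-day-old log is removed
```

Schedule something like this (via cron or a weekly job) only after verifying the retention policy against your compliance requirements.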
### Monthly Review
- Analyze performance trends
- Plan optimization improvements
- Update models if better ones become available
- Review and update documentation
## Troubleshooting

### Low Success Rate (<90%)

Possible causes:
- HuggingFace API rate limiting
- Network connectivity issues
- Malformed transcript files

Actions:
- Check `logs/` for error patterns
- Verify the HF token is valid
- Test with sample data
- Consider switching to LM Studio

### Slow Processing (>5 minutes for 3 transcripts)

Possible causes:
- Network latency to the HF API
- Large transcript files
- Token limits causing retries

Actions:
- Check network latency: `ping api.huggingface.co`
- Review performance logs for bottlenecks
- Consider a local LM Studio deployment
- Implement caching (future enhancement)

### Poor Quote Quality (scores <0.50)

Possible causes:
- Transcripts lack specific details
- No quotation marks or speaker labels
- Very technical/clinical language

Actions:
- Run `test_quotes_simple.py` with the problematic transcript
- Adjust scoring weights in `quote_extractor.py`
- Add custom patterns for your transcript format
- Accept that some transcripts naturally have fewer good quotes
## Future Enhancements

**High Priority (Next 3 Months):**
1. Upgrade to a larger-context model (Mixtral-8x7B for all operations)
2. Parallel transcript processing
3. User authentication for the Gradio UI
4. Real-time monitoring dashboard

**Medium Priority (3-6 Months):**
5. Caching layer for common analyses
6. Batch processing API
7. Client-specific customization templates
8. Enhanced error recovery

**Low Priority (6-12 Months):**
9. Multi-language support
10. Audio timestamp integration
11. Interactive HTML reports
12. A/B testing framework
## Contact & Support

**Documentation:**
- Technical: `MARKET_RESEARCH_ENHANCEMENTS.md`
- User Guide: `STORYTELLING_QUICK_START.md`
- This Guide: `ENTERPRISE_DEPLOYMENT_GUIDE.md`

**Key Files:**
- Logging: `production_logger.py`
- Main App: `app.py`
- Quote Extraction: `quote_extractor.py`
- Narrative Generation: `story_writer.py`

**Logs Location:** `/home/john/TranscriptorEnhanced/logs/`
## Summary

✅ **Token Limits:** Increased to 1500-2500 (enterprise-ready)
✅ **Logging:** Full production monitoring implemented
✅ **Dependencies:** Documented in `requirements.txt`

⚠️ **Still To Do (requires production environment):**
- Install python-docx (needs pip in the environment)
- Test with 20+ real transcripts
- Set up centralized log monitoring
- Implement user authentication

**Status:** Ready for a controlled production pilot with close monitoring

Last Updated: October 20, 2025 · Version: 3.0-Enterprise