	# Enterprise Deployment Guide

	TranscriptorAI v3.0 - Market Research Edition
	Updated: October 20, 2025

	---

	## Pre-Deployment Checklist

	### Required Changes (Completed ✅)

	- [x] Token Limits Increased
	  - From: 100 tokens → To: 1500-2500 tokens
	  - Files: `app.py`, `llm.py`, `story_writer.py`
	  - Impact: Enables comprehensive market research narratives

	- [x] Production Logging Implemented
	  - New file: `production_logger.py`
	  - Integrated into: `app.py`
	  - Features: Session tracking, performance metrics, error logging, export to JSON/TXT

	- [x] Dependencies Documented
	  - File: `requirements.txt`
	  - Key requirement: `python-docx>=1.0.0` for DOCX support

	### Installation Steps

	#### 1. Install Dependencies

	```bash
	cd /home/john/TranscriptorEnhanced

	# Install all required packages
	pip3 install -r requirements.txt

	# Or install individually (quote each specifier so the shell
	# does not treat ">" as an output redirect):
	pip3 install "gradio>=4.0.0"
	pip3 install "huggingface_hub>=0.19.0"
	pip3 install "python-docx>=1.0.0"
	pip3 install "pdfplumber>=0.10.0"
	pip3 install "pandas>=2.0.0"
	pip3 install "matplotlib>=3.7.0"
	pip3 install "reportlab>=4.0.0"
	pip3 install "tiktoken>=0.5.0"
	pip3 install "nltk>=3.8.0"
	pip3 install "scikit-learn>=1.3.0"
	```

	#### 2. Set Environment Variables

	Required:
	```bash
	export HUGGINGFACE_TOKEN="your_hf_token_here"
	```

	Optional (for LM Studio):
	```bash
	export USE_LMSTUDIO=True
	export LM_STUDIO_URL="http://localhost:1234"
	```
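
For illustration, the variables above could be read at startup roughly as follows. This is a minimal sketch, not the code in `app.py`: `BackendConfig` and `load_backend_config` are hypothetical names, and treating `"1"`, `"true"` and `"yes"` as truthy values for `USE_LMSTUDIO` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BackendConfig:
    """Hypothetical container for the settings read from the environment."""
    use_lmstudio: bool
    lm_studio_url: Optional[str]
    hf_token: str


def load_backend_config(env: Mapping[str, str] = os.environ) -> BackendConfig:
    """Read the variables named in this guide and fail fast if the token is missing."""
    hf_token = env.get("HUGGINGFACE_TOKEN", "")
    if not hf_token:
        # The guide lists the token as required, so stop before any API call is made.
        raise RuntimeError("HUGGINGFACE_TOKEN is not set")

    # Assumption: any of these spellings switches LM Studio on.
    use_lmstudio = env.get("USE_LMSTUDIO", "False").strip().lower() in {"1", "true", "yes"}
    lm_url = env.get("LM_STUDIO_URL", "http://localhost:1234") if use_lmstudio else None
    return BackendConfig(use_lmstudio=use_lmstudio, lm_studio_url=lm_url, hf_token=hf_token)
```

Failing at startup when the token is absent surfaces a configuration mistake immediately instead of on the first LLM call.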

	#### 3. Create Logs Directory

	```bash
	mkdir -p /home/john/TranscriptorEnhanced/logs
	chmod 755 /home/john/TranscriptorEnhanced/logs
	```

	#### 4. Test Installation

	```bash
	# Test quote extraction
	python3 test_quotes_simple.py

	# Should output:
	# ✓ Quote extraction working
	# ✓ 39 quotes extracted from 2 transcripts
	```

	---

	## Production Configuration

	### Current Settings (Enterprise-Ready)

	| Setting | Value | Purpose |
	|---------|-------|---------|
	| LLM_BACKEND | `hf_api` | HuggingFace Inference API |
	| LLM_TIMEOUT | `60s` | Increased for longer generation |
	| MAX_TOKENS_PER_REQUEST | `1500` | Enterprise narrative length |
	| Temperature (Analysis) | `0.5` | Balanced creativity/accuracy |
	| Temperature (Narrative) | `0.7` | More creative storytelling |
	| Max Tokens (LM Studio) | `2500` | Full-length reports |
	| Max Tokens (HF API) | `1500` | API limits |

	### Model Selection

	Current Models:
	- Analysis: `microsoft/Phi-3-mini-4k-instruct` (HF API)
	- Narrative: `mistralai/Mixtral-8x7B-Instruct-v0.1` (HF API)

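	⚠️ Known Limitation: Phi-3-mini has only a 4K-token context window, which the prompt and the generated output share. For transcripts longer than about 3000 words, consider:
	- Switching to Mixtral-8x7B for analysis (32K-token context window, although a hosted endpoint may cap request size below that)
	- Using LM Studio with larger local models
	- Implementing a better chunking strategy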
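
One possible shape for the chunking option is sketched below. It is an assumption-laden illustration, not the project's chunker: `chunk_words` is a hypothetical helper, it counts whitespace-separated words as a rough proxy for tokens, and the default sizes are placeholders to tune against the model's real limit.

```python
from typing import List


def chunk_words(text: str, max_words: int = 1500, overlap: int = 100) -> List[str]:
    """Split text into word-based chunks, repeating `overlap` words between neighbours."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the transcript
    return chunks
```

The overlap keeps a statement that straddles a boundary visible in both chunks, at the cost of analysing those words twice.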

	---

	## Monitoring & Logging

	### Log Files Generated

	Each analysis session creates:

	1. Session Log: `logs/session_YYYYMMDD_HHMMSS.log`
	   - Detailed timestamped events
	   - All processing steps
	   - Warnings and errors

	2. JSON Summary: `logs/summary_YYYYMMDD_HHMMSS.json`
	   - Structured metrics
	   - Machine-readable
	   - For integration with monitoring tools

	3. Text Summary: `logs/summary_YYYYMMDD_HHMMSS.txt`
	   - Human-readable summary
	   - Success rates
	   - Error details
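
The naming scheme above can be sketched as follows. This only illustrates the file-name pattern; `session_paths` and `write_json_summary` are hypothetical helpers, not functions from `production_logger.py`, and the summary keys passed in are up to the caller.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Dict


def session_paths(log_dir: Path, started: datetime) -> Dict[str, Path]:
    """Build the three per-session file paths from the session start time."""
    stamp = started.strftime("%Y%m%d_%H%M%S")
    return {
        "log": log_dir / f"session_{stamp}.log",
        "json": log_dir / f"summary_{stamp}.json",
        "txt": log_dir / f"summary_{stamp}.txt",
    }


def write_json_summary(path: Path, summary: dict) -> None:
    """Write the machine-readable summary, creating the logs directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
```

Sharing one timestamp across the three names makes it easy to locate every file that belongs to a given session.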

	### Metrics Tracked

	Per Session:
	- Transcripts processed / failed
	- Success rate (%)
	- Average processing time
	- Quotes extracted
	- Total session duration
	- Error types and frequencies

	Per Transcript:
	- File name and type
	- Quality score (0-1)
	- Word count
	- Processing time (seconds)
	- Error details (if failed)
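
To make the relationship between the per-transcript and per-session metrics concrete, here is a small sketch. The class and field names are hypothetical and do not come from `production_logger.py`; the sketch only shows how the session figures can be derived from the per-transcript records.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TranscriptResult:
    """Hypothetical per-transcript record mirroring the list above."""
    file_name: str
    file_type: str
    quality_score: Optional[float]  # 0-1, None when processing failed
    word_count: int
    processing_seconds: float
    error: Optional[str] = None  # error type, None on success


def session_metrics(results: List[TranscriptResult]) -> dict:
    """Aggregate per-transcript records into session-level figures."""
    ok = [r for r in results if r.error is None]
    failed = [r for r in results if r.error is not None]

    error_counts: Dict[str, int] = {}
    for r in failed:
        error_counts[r.error] = error_counts.get(r.error, 0) + 1

    total = len(results)
    return {
        "processed": len(ok),
        "failed": len(failed),
        "success_rate": round(100.0 * len(ok) / total, 1) if total else 0.0,
        "avg_processing_seconds": (
            round(sum(r.processing_seconds for r in ok) / len(ok), 1) if ok else 0.0
        ),
        "error_counts": error_counts,
    }
```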

	### Example Log Output

	```
	2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Session started: 20251020_153045
	2025-10-20 15:30:45 | INFO | TranscriptorAI_20251020_153045 | Processing started: HCP_Oncologist.txt | Type: HCP | Format: TXT
	2025-10-20 15:31:12 | INFO | TranscriptorAI_20251020_153045 | Processing complete: HCP_Oncologist.txt | Quality: 0.95 | Words: 1847 | Time: 27.3s
	2025-10-20 15:31:15 | INFO | TranscriptorAI_20251020_153045 | Quote extraction complete: 21 quotes | Top score: 1.00 | Themes: patient_management, prescribing, barriers, safety, diagnosis
	2025-10-20 15:31:45 | INFO | TranscriptorAI_20251020_153045 | SESSION COMPLETE | Duration: 60.2s | Processed: 3 | Failed: 0 | Success Rate: 100.0%
	```
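
If these lines need to be fed into a monitoring tool, a parser along the following lines may help. It is a sketch that assumes the four-part layout shown in the sample (timestamp, level, logger name, message); `parse_log_line` is a hypothetical helper, and the format is inferred from the example output rather than from a documented specification.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# A pipe with optional surrounding whitespace; an escaped "\|" is tolerated too.
_DELIM = re.compile(r"\s*\\?\|\s*")


def parse_log_line(line: str) -> Optional[Dict[str, object]]:
    """Split one log line into timestamp, level, logger and message.

    Returns None for lines that do not match the expected layout.
    """
    # maxsplit=3 keeps any further pipes inside the message untouched.
    parts = _DELIM.split(line.strip(), maxsplit=3)
    if len(parts) < 4:
        return None
    timestamp, level, logger_name, message = parts
    try:
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return {"timestamp": when, "level": level, "logger": logger_name, "message": message}
```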

	---

	## Performance Benchmarks

	Based on testing with sample data:

	| Operation | Time | Notes |
	|-----------|------|-------|
	| Single transcript processing | 25-35s | Depends on length |
	| Quote extraction | 2-5s | Per transcript |
	| Cross-transcript summary | 30-60s | For 3-10 transcripts |
	| Total for 3 transcripts | ~2-3 minutes | End-to-end |

	Bottlenecks:
	1. HuggingFace API latency (network dependent)
	2. LLM generation time (model dependent)
	3. Quote extraction (scales linearly)

	Optimizations:
	- Use LM Studio for faster local processing (if GPU available)
	- Process transcripts in parallel (not yet implemented)
	- Cache common analyses (not yet implemented)
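
Since parallel processing is listed as not yet implemented, the following is only a sketch of one way it could look. It assumes a hypothetical `process_one(path)` callable that analyses a single transcript; threads are used because most of the time is spent waiting on the LLM backend, and the worker count would need to respect any API rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def process_in_parallel(
    paths: Iterable[str],
    process_one: Callable[[str], dict],  # hypothetical per-transcript function
    max_workers: int = 3,
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Run process_one over all paths concurrently.

    Returns (results, errors); one failing transcript does not stop the others.
    """
    results: Dict[str, dict] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = f"{type(exc).__name__}: {exc}"
    return results, errors
```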

	---

	## Error Handling

	### Automatic Recovery

	The system includes:
	- Retry logic: 3 attempts with exponential backoff
	- Fallback: HF API ↔ LM Studio switching
	- Graceful degradation: Continue processing other transcripts if one fails
	- Emergency summaries: Generated if LLM fails
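
The recovery behaviour listed above can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the code in `llm.py`: `call_with_retry` and `generate_with_fallback` are hypothetical names, each backend is modelled as a zero-argument callable, and the 1-second base delay is a placeholder.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_retry(
    fn: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fn up to `attempts` times, doubling the wait after each failure."""
    last_exc: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 1s, then 2s with the defaults
    raise RuntimeError(f"all {attempts} attempts failed") from last_exc


def generate_with_fallback(
    backends: Sequence[Callable[[], str]],
    emergency_summary: str,
    **retry_kwargs,
) -> str:
    """Try each backend in order with retries; return the emergency text if all fail."""
    for backend in backends:
        try:
            return call_with_retry(backend, **retry_kwargs)
        except RuntimeError:
            continue  # this backend is exhausted, move on to the next one
    return emergency_summary
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting.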

	### Common Errors & Solutions

	- Error: `ModuleNotFoundError: No module named 'docx'`
	  - Solution: Install python-docx: `pip3 install python-docx`

	- Error: `HF API timeout`
	  - Solution: Increase timeout in `app.py` line 25 or use LM Studio

	- Error: `No quotes extracted`
	  - Solution: Check transcript formatting (needs speaker labels or quotation marks)

	- Error: `Token limit exceeded`
	  - Solution: Already fixed; limits are now 1500-2500 tokens (see Pre-Deployment Checklist)

	---

	## Security Considerations

	### API Keys

	- Store HuggingFace token in environment variables (NOT in code)
	- Use secrets management for production (AWS Secrets Manager, HashiCorp Vault)
	- Rotate tokens regularly

	### Data Privacy

	- Transcript data leaves the server only in LLM calls to the HF API; it is not sent to any other external service
	- Logs contain file names but not transcript content
	- Consider HIPAA compliance if processing patient interviews
	- Implement data retention policies for logs

	### Access Control

	- Restrict access to `/logs` directory
	- Implement user authentication for Gradio UI (not currently included)
	- Use HTTPS in production deployments
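
Because authentication is not currently included, the following is only a sketch of one way to add it. It builds a credential check that Gradio's `launch(auth=...)` can accept as a callable; the environment variable names `APP_USERNAME` and `APP_PASSWORD` and the helper `make_auth_checker` are assumptions made for this example.

```python
import hmac
import os
from typing import Callable, Mapping


def make_auth_checker(env: Mapping[str, str] = os.environ) -> Callable[[str, str], bool]:
    """Return a (username, password) -> bool check backed by environment variables."""
    expected_user = env.get("APP_USERNAME", "")  # hypothetical variable name
    expected_pass = env.get("APP_PASSWORD", "")  # hypothetical variable name

    def check(username: str, password: str) -> bool:
        if not expected_user or not expected_pass:
            return False  # refuse every login when credentials are not configured
        # compare_digest avoids leaking information through comparison timing
        user_ok = hmac.compare_digest(username.encode(), expected_user.encode())
        pass_ok = hmac.compare_digest(password.encode(), expected_pass.encode())
        return user_ok and pass_ok

    return check
```

With a Gradio app object named `demo`, this would be wired in as `demo.launch(auth=make_auth_checker())`. A single shared login is a stopgap; per-user accounts would need a proper identity provider.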

	---

	## Scaling Recommendations

	### For 10-50 Transcripts/Day

	The current setup is sufficient:
	- Single server deployment
	- HuggingFace API with rate limiting
	- Local log storage

	### For 50-200 Transcripts/Day

	Recommended upgrades:
	- Deploy with multiple workers (Gunicorn)
	- Implement Redis queue for job management
	- Use dedicated LM Studio instance on GPU server
	- Centralized logging (ELK stack, Datadog)

	### For 200+ Transcripts/Day

	Enterprise infrastructure:
	- Kubernetes deployment with auto-scaling
	- Separate microservices (extraction, analysis, reporting)
	- Dedicated GPU cluster for LLM calls
	- Cloud object storage (S3) for transcripts/reports
	- Real-time monitoring dashboard

	---

	## Deployment Checklist

	### Before Go-Live

	- [ ] All dependencies installed (`pip3 install -r requirements.txt`)
	- [ ] HuggingFace token configured
	- [ ] Logs directory created with proper permissions
	- [ ] Test with 3-5 real client transcripts
	- [ ] Review generated reports for quality
	- [ ] Verify quote extraction is working (check console output)
	- [ ] Set up log monitoring/alerts
	- [ ] Document any client-specific customizations

	### Day 1 Production

	- [ ] Start with 1-2 small client projects
	- [ ] Monitor logs actively (`tail -f logs/session_*.log`)
	- [ ] Verify session summaries are being generated
	- [ ] Track processing times vs. benchmarks
	- [ ] Gather client feedback on report quality

	### Week 1 Production

	- [ ] Review all session logs
	- [ ] Calculate average success rate (target: >95%)
	- [ ] Identify common errors
	- [ ] Optimize based on bottlenecks
	- [ ] Update documentation with learnings

	---

	## Support & Maintenance

	### Daily Monitoring

	Check these metrics daily:
	- Success rate (should be >95%)
	- Average processing time (should be <3 minutes for 3 transcripts)
	- Error frequency (should be <5%)
	- Quote extraction quality (top scores should be >0.75)
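
These daily thresholds lend themselves to an automated check. The sketch below copies the limits from the list above; the metric key names and the `check_daily_metrics` helper are hypothetical and would need mapping onto whatever the JSON summaries actually contain.

```python
from typing import Dict, List, Tuple

# Limits taken from the daily checklist; the key names are assumptions.
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "success_rate_pct": (">", 95.0),
    "seconds_per_3_transcripts": ("<", 180.0),
    "error_rate_pct": ("<", 5.0),
    "top_quote_score": (">", 0.75),
}


def check_daily_metrics(metrics: Dict[str, float]) -> List[str]:
    """Return one alert string per violated or missing metric; empty means healthy."""
    alerts = []
    for name, (op, limit) in THRESHOLDS.items():
        if name not in metrics:
            alerts.append(f"{name}: missing")
            continue
        value = metrics[name]
        ok = value > limit if op == ">" else value < limit
        if not ok:
            alerts.append(f"{name}: {value} (expected {op} {limit})")
    return alerts
```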

	### Weekly Maintenance

	- Review session summary logs
	- Clean up old logs (keep last 30 days)
	- Update dependencies if security patches available
	- Review client feedback
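
The 30-day log cleanup could be scripted along these lines. This is a sketch under assumptions: it relies on file modification time rather than the timestamp in the file name, matches only the three file patterns described under Monitoring & Logging, and defaults to a dry run so nothing is deleted by accident. `remove_old_logs` is a hypothetical helper.

```python
import time
from pathlib import Path
from typing import List, Optional

LOG_PATTERNS = ("session_*.log", "summary_*.json", "summary_*.txt")


def remove_old_logs(
    log_dir: Path,
    keep_days: int = 30,
    now: Optional[float] = None,
    dry_run: bool = True,
) -> List[Path]:
    """Return log files older than keep_days; delete them unless dry_run is True."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400  # seconds in a day
    matched = []
    for pattern in LOG_PATTERNS:
        for path in sorted(log_dir.glob(pattern)):
            if path.is_file() and path.stat().st_mtime < cutoff:
                if not dry_run:
                    path.unlink()
                matched.append(path)
    return matched
```

Running it once with `dry_run=True` and reviewing the returned list before switching to `dry_run=False` is a cheap safeguard, especially if a data retention policy applies to the logs.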

	### Monthly Review

	- Analyze performance trends
	- Plan optimization improvements
	- Update models if better ones available
	- Review and update documentation

	---

	## Troubleshooting

	### Low Success Rate (<90%)

	Possible Causes:
	- HuggingFace API rate limiting
	- Network connectivity issues
	- Malformed transcript files

	Actions:
	1. Check `logs/` for error patterns
	2. Verify HF token is valid
	3. Test with sample data
	4. Consider switching to LM Studio

	### Slow Processing (>5 minutes for 3 transcripts)

	Possible Causes:
	- Network latency to HF API
	- Large transcript files
	- Token limits causing retries

	Actions:
	1. Check network latency to the HF Inference API host: `ping api-inference.huggingface.co`
	2. Review performance logs for bottlenecks
	3. Consider local LM Studio deployment
	4. Implement caching (future enhancement)

	### Poor Quote Quality (scores <0.50)

	Possible Causes:
	- Transcripts lack specific details
	- No quotation marks or speaker labels
	- Very technical/clinical language

	Actions:
	1. Run `test_quotes_simple.py` with problematic transcript
	2. Adjust scoring weights in `quote_extractor.py`
	3. Add custom patterns for your transcript format
	4. Accept that some transcripts naturally have fewer good quotes

	---

	## Future Enhancements

	High Priority (Next 3 Months):
	1. Upgrade to larger context model (Mixtral-8x7B for all operations)
	2. Parallel transcript processing
	3. User authentication for Gradio UI
	4. Real-time monitoring dashboard

	Medium Priority (3-6 Months):
	5. Caching layer for common analyses
	6. Batch processing API
	7. Client-specific customization templates
	8. Enhanced error recovery

	Low Priority (6-12 Months):
	9. Multi-language support
	10. Audio timestamp integration
	11. Interactive HTML reports
	12. A/B testing framework

	---

	## Contact & Support

	Documentation:
	- Technical: `MARKET_RESEARCH_ENHANCEMENTS.md`
	- User Guide: `STORYTELLING_QUICK_START.md`
	- This Guide: `ENTERPRISE_DEPLOYMENT_GUIDE.md`

	Key Files:
	- Logging: `production_logger.py`
	- Main App: `app.py`
	- Quote Extraction: `quote_extractor.py`
	- Narrative Generation: `story_writer.py`

	Logs Location: `/home/john/TranscriptorEnhanced/logs/`

	---

	## Summary

	- ✅ Token Limits: Increased to 1500-2500 (enterprise-ready)
	- ✅ Logging: Full production monitoring implemented
	- ✅ Dependencies: Documented in requirements.txt

	⚠️ Still Todo (requires production environment):
	- Install python-docx (needs pip in environment)
	- Test with 20+ real transcripts
	- Set up centralized log monitoring
	- Implement user authentication

	Status: Ready for controlled production pilot with close monitoring

	---

	Last Updated: October 20, 2025
	Version: 3.0-Enterprise