Chirapath's picture
Update README.md
f63f042 verified
---
title: Azure_Powered_AI_Summary
emoji: πŸ”₯
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# πŸŽ™οΈπŸ€– Azure-Powered AI Conference Service
> **Advanced AI-powered conference analysis with transcription, computer vision, and intelligent summarization using Azure AI Foundry**
A comprehensive solution that combines Azure Speech Services for transcription with Azure OpenAI for intelligent summarization, featuring computer vision analysis, multi-format document processing, and enterprise-grade security.
## 🌟 Key Features
### πŸŽ™οΈ **Advanced Transcription Services**
- **High-accuracy speech-to-text** using Azure Speech Services
- **Speaker diarization** with precise timestamp tracking (HH:MM:SS format)
- **Multi-language support** for 60+ languages and dialects
- **Real-time processing** with auto-refresh status updates
- **Enhanced audio processing** with FFmpeg integration
### πŸ€– **AI-Powered Summarization**
- **Intelligent conference analysis** using Azure OpenAI (GPT-4o models)
- **Multi-modal content processing** (transcripts, documents, images, videos)
- **Smart frame extraction** from presentation videos
- **Executive summaries** with action items and key insights
- **Multi-language output** support
### πŸ‘οΈ **Computer Vision Integration**
- **Automatic frame extraction** from videos using content-aware algorithms
- **OCR text extraction** from images and video frames
- **Slide change detection** for presentation content
- **Meeting scene analysis** for conference recordings
### πŸ“„ **Enhanced Document Processing**
- **Comprehensive format support**: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, TXT, JSON, RTF, ODT, ODS, ODP
- **Intelligent content extraction** with table and image handling
- **Batch processing** capabilities for multiple files
- **Error handling** and encoding detection
### πŸ” **Enterprise Security & GDPR Compliance**
- **User authentication** with secure password hashing
- **User-isolated storage** in Azure Blob containers
- **Complete data export** functionality for GDPR compliance
- **Account deletion** with full data removal
- **Audit logging** and comprehensive privacy controls
### 🎯 **User Experience**
- **Modern web interface** built with Gradio
- **Real-time status updates** with auto-refresh functionality
- **Comprehensive history** tracking for all services
- **Direct download** links for completed work
- **Mobile-responsive** design
## πŸ—οΈ Architecture Overview
```mermaid
graph TB
subgraph "Frontend"
A[Gradio Web Interface]
end
subgraph "Core Services"
B[Transcription Manager]
C[AI Summary Manager]
D[File Processor]
E[Video Frame Extractor]
end
subgraph "Azure Services"
F[Azure Speech Services]
G[Azure OpenAI]
H[Azure Computer Vision]
I[Azure Blob Storage]
end
subgraph "Data Layer"
J[SQLite Database]
K[User-Isolated Containers]
end
A --> B
A --> C
B --> F
B --> I
C --> G
C --> H
C --> D
C --> E
B --> J
C --> J
I --> K
```
## πŸš€ Quick Start
### Prerequisites
- **Python 3.8+** installed
- **FFmpeg** installed for audio/video processing
- **Azure subscription** with the following services:
- Azure Speech Services
- Azure OpenAI Service
- Azure Blob Storage
- Azure Computer Vision (optional but recommended)
### 1. Clone and Setup
```bash
# Clone the repository
git clone <repository-url>
cd azure-ai-conference-service
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
### 2. Configure Environment
```bash
# Copy environment template
cp env_template.sh .env
# Edit .env file with your Azure credentials
nano .env
```
**Required Configuration:**
- `AZURE_SPEECH_KEY` and `AZURE_SPEECH_KEY_ENDPOINT`
- `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, and `AZURE_OPENAI_DEPLOYMENT`
- `AZURE_BLOB_CONNECTION`, `AZURE_CONTAINER`, and `AZURE_BLOB_SAS_TOKEN`
- `COMPUTER_VISION_ENDPOINT` and `COMPUTER_VISION_KEY` (optional)
### 3. Run the Application
```bash
# Start the service
python app.py
```
The service will be available at `http://localhost:7860`
## πŸ“ Project Structure
```
azure-ai-conference-service/
β”œβ”€β”€ app.py # Main Gradio application
β”œβ”€β”€ app_core.py # Core backend logic and database
β”œβ”€β”€ ai_summary.py # AI summarization manager
β”œβ”€β”€ file_processors.py # Document processing utilities
β”œβ”€β”€ image_extraction.py # Video frame extraction
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ env_template.sh # Environment configuration template
β”œβ”€β”€ .env # Your configuration (create from template)
β”œβ”€β”€ database/ # SQLite database files
β”œβ”€β”€ uploads/ # Temporary upload processing
β”œβ”€β”€ temp/ # Temporary files and downloads
└── logs/ # Application logs
```
## πŸ”§ Configuration Guide
### Azure Services Setup
#### 1. Azure Speech Services
```bash
# Create Speech resource
az cognitiveservices account create \
--name "your-speech-service" \
--resource-group "your-rg" \
--kind "SpeechServices" \
--sku "S0" \
--location "your-region"
```
#### 2. Azure OpenAI Service
```bash
# Create OpenAI resource
az cognitiveservices account create \
--name "your-openai-service" \
--resource-group "your-rg" \
--kind "OpenAI" \
--sku "S0" \
--location "your-region"
# Deploy model
az cognitiveservices account deployment create \
--name "your-openai-service" \
--resource-group "your-rg" \
--deployment-name "gpt-4o-mini" \
--model-name "gpt-4o-mini" \
--model-version "2024-07-18"
```
#### 3. Azure Blob Storage
```bash
# Create storage account
az storage account create \
--name "yourstorageaccount" \
--resource-group "your-rg" \
--location "your-region" \
--sku "Standard_LRS"
# Create containers
az storage container create --name "transcripts" --account-name "yourstorageaccount"
az storage container create --name "transcripts-summaries" --account-name "yourstorageaccount"
az storage container create --name "transcripts-chats" --account-name "yourstorageaccount"
```
### Environment Variables Reference
| Variable | Description | Required |
|----------|-------------|----------|
| `AZURE_SPEECH_KEY` | Azure Speech Services API key | βœ… |
| `AZURE_SPEECH_KEY_ENDPOINT` | Speech Services endpoint URL | βœ… |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL | βœ… |
| `AZURE_OPENAI_KEY` | Azure OpenAI API key | βœ… |
| `AZURE_OPENAI_DEPLOYMENT` | Model deployment name | βœ… |
| `AZURE_BLOB_CONNECTION` | Blob storage connection string | βœ… |
| `AZURE_CONTAINER` | Main blob container name | βœ… |
| `AZURE_BLOB_SAS_TOKEN` | SAS token for blob access | βœ… |
| `COMPUTER_VISION_ENDPOINT` | Computer Vision endpoint | ⚠️ |
| `COMPUTER_VISION_KEY` | Computer Vision API key | ⚠️ |
**Legend:** βœ… Required | ⚠️ Recommended
## 🎯 Usage Examples
### Basic Transcription
1. **Register/Login** to the service
2. **Upload** an audio or video file
3. **Configure** language and speaker settings
4. **Start transcription** and wait for auto-refresh
5. **Download** the completed transcript
### AI-Powered Summary
1. **Choose content sources**: existing transcripts or new files
2. **Provide AI instructions**: specify format and focus areas
3. **Configure output**: language and format preferences
4. **Generate summary** with multi-modal analysis
5. **Download** comprehensive AI analysis
### Batch Processing
- Upload multiple files simultaneously
- Process presentations, documents, and videos together
- Generate unified summaries across all content types
## πŸ” Security Features
### Authentication & Authorization
- **Secure user registration** with password strength validation
- **Session management** with proper logout functionality
- **User isolation** - users can only access their own data
### Data Protection
- **User-separated blob storage** containers
- **Encrypted data transmission** over HTTPS
- **Audit logging** for all user actions
- **Automatic cleanup** of temporary files
### GDPR Compliance
- **Complete data export** in JSON format
- **Right to be forgotten** with full account deletion
- **Granular consent management** for different data uses
- **Data retention policies** with automatic cleanup
## πŸ“Š Performance Optimization
### Processing Efficiency
- **Background workers** for parallel processing
- **Smart frame extraction** using computer vision
- **Token optimization** for AI model efficiency
- **Caching strategies** for frequently accessed data
### Scalability
- **Horizontal scaling** support with load balancing
- **Resource limits** and rate limiting
- **Efficient database queries** with proper indexing
- **Auto-cleanup** of old data and temporary files
## πŸ› οΈ Development
### Local Development Setup
```bash
# Install development dependencies
pip install -r requirements.txt
# Set development mode
export DEV_MODE=True
# Run with auto-reload
python app.py --reload
```
### Testing
```bash
# Run basic tests
python -m pytest tests/
# Test Azure connections
python -c "from app_core import transcription_manager; print('βœ… Backend connected')"
python -c "from ai_summary import ai_summary_manager; print('βœ… AI service connected')"
```
### Adding New Features
1. **Backend Logic**: Add to `app_core.py` or create new modules
2. **AI Features**: Extend `ai_summary.py` with new capabilities
3. **File Processing**: Add new formats to `file_processors.py`
4. **UI Components**: Update `app.py` with new Gradio components
5. **Database**: Add migrations to database schema as needed
## πŸ“ˆ Monitoring & Troubleshooting
### Logging
- **Application logs**: Check `logs/ai_conference_service.log`
- **Error tracking**: Monitor console output for errors
- **Performance metrics**: Track processing times and success rates
### Common Issues
#### Connection Issues
```bash
# Test Azure Speech
curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \
"https://YOUR_REGION.api.cognitive.microsoft.com/sts/v1.0/issuetoken"
# Test Azure OpenAI
curl -H "api-key: YOUR_KEY" \
"https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_MODEL/chat/completions?api-version=2024-08-01-preview"
```
#### File Processing Issues
- Ensure **FFmpeg** is installed and in PATH
- Check file format support in `file_processors.py`
- Verify file size limits (default: 500MB)
#### Database Issues
- Check database permissions for `database/` directory
- Verify blob storage connection for database backups
- Monitor disk space for database growth
## 🚒 Production Deployment
### Docker Deployment
```dockerfile
FROM python:3.9-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
libsm6 \
libxext6 \
libxrender-dev \
libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```
### Azure Container Instance
```bash
# Build and push image
docker build -t azure-ai-conference-service .
docker tag azure-ai-conference-service your-registry.azurecr.io/azure-ai-conference-service
docker push your-registry.azurecr.io/azure-ai-conference-service
# Deploy to Azure Container Instances
az container create \
--resource-group your-rg \
--name azure-ai-conference-service \
--image your-registry.azurecr.io/azure-ai-conference-service \
--ports 7860 \
--environment-variables \
AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \
AZURE_OPENAI_KEY=$AZURE_OPENAI_KEY \
# ... other environment variables
```
### Production Checklist
- [ ] **Security**: Change default passwords and salts
- [ ] **SSL/TLS**: Configure HTTPS certificates
- [ ] **Monitoring**: Set up Azure Application Insights
- [ ] **Backup**: Configure database and blob backup strategies
- [ ] **Scaling**: Configure auto-scaling policies
- [ ] **Compliance**: Review and configure GDPR settings
## πŸ“š API Reference
### Core Classes
#### `TranscriptionManager`
- `submit_transcription(file_bytes, filename, user_id, language, settings)`
- `get_job_status(job_id)`
- `get_user_history(user_id, limit)`
#### `AISummaryManager`
- `submit_summary_job(user_id, summary_type, user_prompt, files, settings)`
- `get_summary_status(job_id)`
- `get_user_summary_history(user_id, limit)`
#### `FileProcessor`
- `process_file(file_path, extension)`
- `batch_process_files(file_paths)`
- `get_file_info(file_path)`
## 🀝 Contributing
We welcome contributions! Please see our contributing guidelines:
1. **Fork** the repository
2. **Create** a feature branch
3. **Make** your changes with tests
4. **Submit** a pull request
### Development Standards
- **Code style**: Follow PEP 8 for Python code
- **Documentation**: Update README and docstrings
- **Testing**: Add tests for new features
- **Security**: Follow security best practices
## πŸ“„ License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## πŸ†˜ Support
### Getting Help
- **Documentation**: Check this README and inline comments
- **Issues**: Create GitHub issues for bugs or feature requests
- **Azure Support**: Use Azure support for service-specific issues
### Contact Information
- **Project maintainer**: [Your contact information]
- **Technical support**: [Support email]
- **Azure resources**: [Azure documentation links]
---
## πŸŽ‰ Acknowledgments
- **Azure AI Services** for powerful AI capabilities
- **Gradio** for the excellent web interface framework
- **OpenCV** for computer vision functionality
- **Contributors** and the open-source community
---
**πŸš€ Ready to transform your conference analysis with AI? Get started today!**