|
|
--- |
|
|
title: Azure_Powered_AI_Summary |
|
|
emoji: π₯ |
|
|
colorFrom: blue |
|
|
colorTo: red |
|
|
sdk: gradio |
|
|
sdk_version: 5.44.1 |
|
|
app_file: app.py |
|
|
pinned: false |
|
|
--- |
|
|
|
|
|
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |
|
|
|
|
|
# ποΈπ€ Azure-Powered AI Conference Service |
|
|
|
|
|
> **Advanced AI-powered conference analysis with transcription, computer vision, and intelligent summarization using Azure AI Foundry** |
|
|
|
|
|
A comprehensive solution that combines Azure Speech Services for transcription with Azure OpenAI for intelligent summarization, featuring computer vision analysis, multi-format document processing, and enterprise-grade security. |
|
|
|
|
|
## π Key Features |
|
|
|
|
|
### ποΈ **Advanced Transcription Services** |
|
|
- **High-accuracy speech-to-text** using Azure Speech Services |
|
|
- **Speaker diarization** with precise timestamp tracking (HH:MM:SS format) |
|
|
- **Multi-language support** for 60+ languages and dialects |
|
|
- **Real-time processing** with auto-refresh status updates |
|
|
- **Enhanced audio processing** with FFmpeg integration |
|
|
|
|
|
### π€ **AI-Powered Summarization** |
|
|
- **Intelligent conference analysis** using Azure OpenAI (GPT-4o models) |
|
|
- **Multi-modal content processing** (transcripts, documents, images, videos) |
|
|
- **Smart frame extraction** from presentation videos |
|
|
- **Executive summaries** with action items and key insights |
|
|
- **Multi-language output** support |
|
|
|
|
|
### ποΈ **Computer Vision Integration** |
|
|
- **Automatic frame extraction** from videos using content-aware algorithms |
|
|
- **OCR text extraction** from images and video frames |
|
|
- **Slide change detection** for presentation content |
|
|
- **Meeting scene analysis** for conference recordings |
|
|
|
|
|
### π **Enhanced Document Processing** |
|
|
- **Comprehensive format support**: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, TXT, JSON, RTF, ODT, ODS, ODP |
|
|
- **Intelligent content extraction** with table and image handling |
|
|
- **Batch processing** capabilities for multiple files |
|
|
- **Error handling** and encoding detection |
|
|
|
|
|
### π **Enterprise Security & GDPR Compliance** |
|
|
- **User authentication** with secure password hashing |
|
|
- **User-isolated storage** in Azure Blob containers |
|
|
- **Complete data export** functionality for GDPR compliance |
|
|
- **Account deletion** with full data removal |
|
|
- **Audit logging** and comprehensive privacy controls |
|
|
|
|
|
### π― **User Experience** |
|
|
- **Modern web interface** built with Gradio |
|
|
- **Real-time status updates** with auto-refresh functionality |
|
|
- **Comprehensive history** tracking for all services |
|
|
- **Direct download** links for completed work |
|
|
- **Mobile-responsive** design |
|
|
|
|
|
## ποΈ Architecture Overview |
|
|
|
|
|
```mermaid |
|
|
graph TB |
|
|
subgraph "Frontend" |
|
|
A[Gradio Web Interface] |
|
|
end |
|
|
|
|
|
subgraph "Core Services" |
|
|
B[Transcription Manager] |
|
|
C[AI Summary Manager] |
|
|
D[File Processor] |
|
|
E[Video Frame Extractor] |
|
|
end |
|
|
|
|
|
subgraph "Azure Services" |
|
|
F[Azure Speech Services] |
|
|
G[Azure OpenAI] |
|
|
H[Azure Computer Vision] |
|
|
I[Azure Blob Storage] |
|
|
end |
|
|
|
|
|
subgraph "Data Layer" |
|
|
J[SQLite Database] |
|
|
K[User-Isolated Containers] |
|
|
end |
|
|
|
|
|
A --> B |
|
|
A --> C |
|
|
B --> F |
|
|
B --> I |
|
|
C --> G |
|
|
C --> H |
|
|
C --> D |
|
|
C --> E |
|
|
B --> J |
|
|
C --> J |
|
|
I --> K |
|
|
``` |
|
|
|
|
|
## π Quick Start |
|
|
|
|
|
### Prerequisites |
|
|
|
|
|
- **Python 3.8+** installed |
|
|
- **FFmpeg** installed for audio/video processing |
|
|
- **Azure subscription** with the following services: |
|
|
- Azure Speech Services |
|
|
- Azure OpenAI Service |
|
|
- Azure Blob Storage |
|
|
- Azure Computer Vision (optional but recommended) |
|
|
|
|
|
### 1. Clone and Setup |
|
|
|
|
|
```bash |
|
|
# Clone the repository |
|
|
git clone <repository-url> |
|
|
cd azure-ai-conference-service |
|
|
|
|
|
# Create virtual environment |
|
|
python -m venv venv |
|
|
source venv/bin/activate # On Windows: venv\Scripts\activate |
|
|
|
|
|
# Install dependencies |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
### 2. Configure Environment |
|
|
|
|
|
```bash |
|
|
# Copy environment template |
|
|
cp env_template.sh .env |
|
|
|
|
|
# Edit .env file with your Azure credentials |
|
|
nano .env |
|
|
``` |
|
|
|
|
|
**Required Configuration:** |
|
|
- `AZURE_SPEECH_KEY` and `AZURE_SPEECH_KEY_ENDPOINT` |
|
|
- `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, and `AZURE_OPENAI_DEPLOYMENT` |
|
|
- `AZURE_BLOB_CONNECTION`, `AZURE_CONTAINER`, and `AZURE_BLOB_SAS_TOKEN` |
|
|
- `COMPUTER_VISION_ENDPOINT` and `COMPUTER_VISION_KEY` (optional) |
|
|
|
|
|
### 3. Run the Application |
|
|
|
|
|
```bash |
|
|
# Start the service |
|
|
python app.py |
|
|
``` |
|
|
|
|
|
The service will be available at `http://localhost:7860` |
|
|
|
|
|
## π Project Structure |
|
|
|
|
|
``` |
|
|
azure-ai-conference-service/ |
|
|
βββ app.py # Main Gradio application |
|
|
βββ app_core.py # Core backend logic and database |
|
|
βββ ai_summary.py # AI summarization manager |
|
|
βββ file_processors.py # Document processing utilities |
|
|
βββ image_extraction.py # Video frame extraction |
|
|
βββ requirements.txt # Python dependencies |
|
|
βββ env_template.sh # Environment configuration template |
|
|
βββ .env # Your configuration (create from template) |
|
|
βββ database/ # SQLite database files |
|
|
βββ uploads/ # Temporary upload processing |
|
|
βββ temp/ # Temporary files and downloads |
|
|
βββ logs/ # Application logs |
|
|
``` |
|
|
|
|
|
## π§ Configuration Guide |
|
|
|
|
|
### Azure Services Setup |
|
|
|
|
|
#### 1. Azure Speech Services |
|
|
```bash |
|
|
# Create Speech resource |
|
|
az cognitiveservices account create \ |
|
|
--name "your-speech-service" \ |
|
|
--resource-group "your-rg" \ |
|
|
--kind "SpeechServices" \ |
|
|
--sku "S0" \ |
|
|
--location "your-region" |
|
|
``` |
|
|
|
|
|
#### 2. Azure OpenAI Service |
|
|
```bash |
|
|
# Create OpenAI resource |
|
|
az cognitiveservices account create \ |
|
|
--name "your-openai-service" \ |
|
|
--resource-group "your-rg" \ |
|
|
--kind "OpenAI" \ |
|
|
--sku "S0" \ |
|
|
--location "your-region" |
|
|
|
|
|
# Deploy model |
|
|
az cognitiveservices account deployment create \ |
|
|
--name "your-openai-service" \ |
|
|
--resource-group "your-rg" \ |
|
|
--deployment-name "gpt-4o-mini" \ |
|
|
--model-name "gpt-4o-mini" \ |
|
|
--model-version "2024-07-18" |
|
|
``` |
|
|
|
|
|
#### 3. Azure Blob Storage |
|
|
```bash |
|
|
# Create storage account |
|
|
az storage account create \ |
|
|
--name "yourstorageaccount" \ |
|
|
--resource-group "your-rg" \ |
|
|
--location "your-region" \ |
|
|
--sku "Standard_LRS" |
|
|
|
|
|
# Create containers |
|
|
az storage container create --name "transcripts" --account-name "yourstorageaccount" |
|
|
az storage container create --name "transcripts-summaries" --account-name "yourstorageaccount" |
|
|
az storage container create --name "transcripts-chats" --account-name "yourstorageaccount" |
|
|
``` |
|
|
|
|
|
### Environment Variables Reference |
|
|
|
|
|
| Variable | Description | Required | |
|
|
|----------|-------------|----------| |
|
|
| `AZURE_SPEECH_KEY` | Azure Speech Services API key | β
| |
|
|
| `AZURE_SPEECH_KEY_ENDPOINT` | Speech Services endpoint URL | β
| |
|
|
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL | β
| |
|
|
| `AZURE_OPENAI_KEY` | Azure OpenAI API key | β
| |
|
|
| `AZURE_OPENAI_DEPLOYMENT` | Model deployment name | β
| |
|
|
| `AZURE_BLOB_CONNECTION` | Blob storage connection string | β
| |
|
|
| `AZURE_CONTAINER` | Main blob container name | β
| |
|
|
| `AZURE_BLOB_SAS_TOKEN` | SAS token for blob access | β
| |
|
|
| `COMPUTER_VISION_ENDPOINT` | Computer Vision endpoint | β οΈ | |
|
|
| `COMPUTER_VISION_KEY` | Computer Vision API key | β οΈ | |
|
|
|
|
|
**Legend:** β
Required | β οΈ Recommended |
|
|
|
|
|
## π― Usage Examples |
|
|
|
|
|
### Basic Transcription |
|
|
1. **Register/Login** to the service |
|
|
2. **Upload** an audio or video file |
|
|
3. **Configure** language and speaker settings |
|
|
4. **Start transcription** and wait for auto-refresh |
|
|
5. **Download** the completed transcript |
|
|
|
|
|
### AI-Powered Summary |
|
|
1. **Choose content sources**: existing transcripts or new files |
|
|
2. **Provide AI instructions**: specify format and focus areas |
|
|
3. **Configure output**: language and format preferences |
|
|
4. **Generate summary** with multi-modal analysis |
|
|
5. **Download** comprehensive AI analysis |
|
|
|
|
|
### Batch Processing |
|
|
- Upload multiple files simultaneously |
|
|
- Process presentations, documents, and videos together |
|
|
- Generate unified summaries across all content types |
|
|
|
|
|
## π Security Features |
|
|
|
|
|
### Authentication & Authorization |
|
|
- **Secure user registration** with password strength validation |
|
|
- **Session management** with proper logout functionality |
|
|
- **User isolation** - users can only access their own data |
|
|
|
|
|
### Data Protection |
|
|
- **User-separated blob storage** containers |
|
|
- **Encrypted data transmission** over HTTPS |
|
|
- **Audit logging** for all user actions |
|
|
- **Automatic cleanup** of temporary files |
|
|
|
|
|
### GDPR Compliance |
|
|
- **Complete data export** in JSON format |
|
|
- **Right to be forgotten** with full account deletion |
|
|
- **Granular consent management** for different data uses |
|
|
- **Data retention policies** with automatic cleanup |
|
|
|
|
|
## π Performance Optimization |
|
|
|
|
|
### Processing Efficiency |
|
|
- **Background workers** for parallel processing |
|
|
- **Smart frame extraction** using computer vision |
|
|
- **Token optimization** for AI model efficiency |
|
|
- **Caching strategies** for frequently accessed data |
|
|
|
|
|
### Scalability |
|
|
- **Horizontal scaling** support with load balancing |
|
|
- **Resource limits** and rate limiting |
|
|
- **Efficient database queries** with proper indexing |
|
|
- **Auto-cleanup** of old data and temporary files |
|
|
|
|
|
## π οΈ Development |
|
|
|
|
|
### Local Development Setup |
|
|
|
|
|
```bash |
|
|
# Install development dependencies |
|
|
pip install -r requirements.txt |
|
|
|
|
|
# Set development mode |
|
|
export DEV_MODE=True |
|
|
|
|
|
# Run with auto-reload |
|
|
python app.py --reload |
|
|
``` |
|
|
|
|
|
### Testing |
|
|
|
|
|
```bash |
|
|
# Run basic tests |
|
|
python -m pytest tests/ |
|
|
|
|
|
# Test Azure connections |
|
|
python -c "from app_core import transcription_manager; print('β
Backend connected')" |
|
|
python -c "from ai_summary import ai_summary_manager; print('β
AI service connected')" |
|
|
``` |
|
|
|
|
|
### Adding New Features |
|
|
|
|
|
1. **Backend Logic**: Add to `app_core.py` or create new modules |
|
|
2. **AI Features**: Extend `ai_summary.py` with new capabilities |
|
|
3. **File Processing**: Add new formats to `file_processors.py` |
|
|
4. **UI Components**: Update `app.py` with new Gradio components |
|
|
5. **Database**: Add migrations to database schema as needed |
|
|
|
|
|
## π Monitoring & Troubleshooting |
|
|
|
|
|
### Logging |
|
|
- **Application logs**: Check `logs/ai_conference_service.log` |
|
|
- **Error tracking**: Monitor console output for errors |
|
|
- **Performance metrics**: Track processing times and success rates |
|
|
|
|
|
### Common Issues |
|
|
|
|
|
#### Connection Issues |
|
|
```bash |
|
|
# Test Azure Speech |
|
|
curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \ |
|
|
"https://YOUR_REGION.api.cognitive.microsoft.com/sts/v1.0/issuetoken" |
|
|
|
|
|
# Test Azure OpenAI |
|
|
curl -H "api-key: YOUR_KEY" \ |
|
|
"https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_MODEL/chat/completions?api-version=2024-08-01-preview" |
|
|
``` |
|
|
|
|
|
#### File Processing Issues |
|
|
- Ensure **FFmpeg** is installed and in PATH |
|
|
- Check file format support in `file_processors.py` |
|
|
- Verify file size limits (default: 500MB) |
|
|
|
|
|
#### Database Issues |
|
|
- Check database permissions for `database/` directory |
|
|
- Verify blob storage connection for database backups |
|
|
- Monitor disk space for database growth |
|
|
|
|
|
## π’ Production Deployment |
|
|
|
|
|
### Docker Deployment |
|
|
|
|
|
```dockerfile |
|
|
FROM python:3.9-slim |
|
|
|
|
|
WORKDIR /app |
|
|
|
|
|
# Install system dependencies |
|
|
RUN apt-get update && apt-get install -y \ |
|
|
ffmpeg \ |
|
|
libsm6 \ |
|
|
libxext6 \ |
|
|
libxrender-dev \ |
|
|
libglib2.0-0 \ |
|
|
&& rm -rf /var/lib/apt/lists/* |
|
|
|
|
|
COPY requirements.txt . |
|
|
RUN pip install -r requirements.txt |
|
|
|
|
|
COPY . . |
|
|
|
|
|
EXPOSE 7860 |
|
|
|
|
|
CMD ["python", "app.py"] |
|
|
``` |
|
|
|
|
|
### Azure Container Instance |
|
|
|
|
|
```bash |
|
|
# Build and push image |
|
|
docker build -t azure-ai-conference-service . |
|
|
docker tag azure-ai-conference-service your-registry.azurecr.io/azure-ai-conference-service |
|
|
docker push your-registry.azurecr.io/azure-ai-conference-service |
|
|
|
|
|
# Deploy to Azure Container Instances |
|
|
az container create \ |
|
|
--resource-group your-rg \ |
|
|
--name azure-ai-conference-service \ |
|
|
--image your-registry.azurecr.io/azure-ai-conference-service \ |
|
|
--ports 7860 \ |
|
|
--environment-variables \ |
|
|
AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \ |
|
|
AZURE_OPENAI_KEY=$AZURE_OPENAI_KEY \ |
|
|
# ... other environment variables |
|
|
``` |
|
|
|
|
|
### Production Checklist |
|
|
|
|
|
- [ ] **Security**: Change default passwords and salts |
|
|
- [ ] **SSL/TLS**: Configure HTTPS certificates |
|
|
- [ ] **Monitoring**: Set up Azure Application Insights |
|
|
- [ ] **Backup**: Configure database and blob backup strategies |
|
|
- [ ] **Scaling**: Configure auto-scaling policies |
|
|
- [ ] **Compliance**: Review and configure GDPR settings |
|
|
|
|
|
## π API Reference |
|
|
|
|
|
### Core Classes |
|
|
|
|
|
#### `TranscriptionManager` |
|
|
- `submit_transcription(file_bytes, filename, user_id, language, settings)` |
|
|
- `get_job_status(job_id)` |
|
|
- `get_user_history(user_id, limit)` |
|
|
|
|
|
#### `AISummaryManager` |
|
|
- `submit_summary_job(user_id, summary_type, user_prompt, files, settings)` |
|
|
- `get_summary_status(job_id)` |
|
|
- `get_user_summary_history(user_id, limit)` |
|
|
|
|
|
#### `FileProcessor` |
|
|
- `process_file(file_path, extension)` |
|
|
- `batch_process_files(file_paths)` |
|
|
- `get_file_info(file_path)` |
|
|
|
|
|
## π€ Contributing |
|
|
|
|
|
We welcome contributions! Please see our contributing guidelines: |
|
|
|
|
|
1. **Fork** the repository |
|
|
2. **Create** a feature branch |
|
|
3. **Make** your changes with tests |
|
|
4. **Submit** a pull request |
|
|
|
|
|
### Development Standards |
|
|
- **Code style**: Follow PEP 8 for Python code |
|
|
- **Documentation**: Update README and docstrings |
|
|
- **Testing**: Add tests for new features |
|
|
- **Security**: Follow security best practices |
|
|
|
|
|
## π License |
|
|
|
|
|
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. |
|
|
|
|
|
## π Support |
|
|
|
|
|
### Getting Help |
|
|
- **Documentation**: Check this README and inline comments |
|
|
- **Issues**: Create GitHub issues for bugs or feature requests |
|
|
- **Azure Support**: Use Azure support for service-specific issues |
|
|
|
|
|
### Contact Information |
|
|
- **Project maintainer**: [Your contact information] |
|
|
- **Technical support**: [Support email] |
|
|
- **Azure resources**: [Azure documentation links] |
|
|
|
|
|
--- |
|
|
|
|
|
## π Acknowledgments |
|
|
|
|
|
- **Azure AI Services** for powerful AI capabilities |
|
|
- **Gradio** for the excellent web interface framework |
|
|
- **OpenCV** for computer vision functionality |
|
|
- **Contributors** and the open-source community |
|
|
|
|
|
--- |
|
|
|
|
|
**π Ready to transform your conference analysis with AI? Get started today!** |