--- title: Azure_Powered_AI_Summary emoji: 🔥 colorFrom: blue colorTo: red sdk: gradio sdk_version: 5.44.1 app_file: app.py pinned: false --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference # 🎙️🤖 Azure-Powered AI Conference Service > **Advanced AI-powered conference analysis with transcription, computer vision, and intelligent summarization using Azure AI Foundry** A comprehensive solution that combines Azure Speech Services for transcription with Azure OpenAI for intelligent summarization, featuring computer vision analysis, multi-format document processing, and enterprise-grade security. ## 🌟 Key Features ### 🎙️ **Advanced Transcription Services** - **High-accuracy speech-to-text** using Azure Speech Services - **Speaker diarization** with precise timestamp tracking (HH:MM:SS format) - **Multi-language support** for 60+ languages and dialects - **Real-time processing** with auto-refresh status updates - **Enhanced audio processing** with FFmpeg integration ### 🤖 **AI-Powered Summarization** - **Intelligent conference analysis** using Azure OpenAI (GPT-4o models) - **Multi-modal content processing** (transcripts, documents, images, videos) - **Smart frame extraction** from presentation videos - **Executive summaries** with action items and key insights - **Multi-language output** support ### 👁️ **Computer Vision Integration** - **Automatic frame extraction** from videos using content-aware algorithms - **OCR text extraction** from images and video frames - **Slide change detection** for presentation content - **Meeting scene analysis** for conference recordings ### 📄 **Enhanced Document Processing** - **Comprehensive format support**: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, TXT, JSON, RTF, ODT, ODS, ODP - **Intelligent content extraction** with table and image handling - **Batch processing** capabilities for multiple files - **Error handling** and encoding detection ### 🔐 **Enterprise Security & GDPR Compliance** - **User authentication** with secure password hashing - **User-isolated storage** in Azure Blob containers - **Complete data export** functionality for GDPR compliance - **Account deletion** with full data removal - **Audit logging** and comprehensive privacy controls ### 🎯 **User Experience** - **Modern web interface** built with Gradio - **Real-time status updates** with auto-refresh functionality - **Comprehensive history** tracking for all services - **Direct download** links for completed work - **Mobile-responsive** design ## 🏗️ Architecture Overview ```mermaid graph TB subgraph "Frontend" A[Gradio Web Interface] end subgraph "Core Services" B[Transcription Manager] C[AI Summary Manager] D[File Processor] E[Video Frame Extractor] end subgraph "Azure Services" F[Azure Speech Services] G[Azure OpenAI] H[Azure Computer Vision] I[Azure Blob Storage] end subgraph "Data Layer" J[SQLite Database] K[User-Isolated Containers] end A --> B A --> C B --> F B --> I C --> G C --> H C --> D C --> E B --> J C --> J I --> K ``` ## 🚀 Quick Start ### Prerequisites - **Python 3.8+** installed - **FFmpeg** installed for audio/video processing - **Azure subscription** with the following services: - Azure Speech Services - Azure OpenAI Service - Azure Blob Storage - Azure Computer Vision (optional but recommended) ### 1. Clone and Setup ```bash # Clone the repository git clone cd azure-ai-conference-service # Create virtual environment python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate # Install dependencies pip install -r requirements.txt ``` ### 2. Configure Environment ```bash # Copy environment template cp env_template.sh .env # Edit .env file with your Azure credentials nano .env ``` **Required Configuration:** - `AZURE_SPEECH_KEY` and `AZURE_SPEECH_KEY_ENDPOINT` - `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, and `AZURE_OPENAI_DEPLOYMENT` - `AZURE_BLOB_CONNECTION`, `AZURE_CONTAINER`, and `AZURE_BLOB_SAS_TOKEN` - `COMPUTER_VISION_ENDPOINT` and `COMPUTER_VISION_KEY` (optional) ### 3. Run the Application ```bash # Start the service python app.py ``` The service will be available at `http://localhost:7860` ## 📁 Project Structure ``` azure-ai-conference-service/ ├── app.py # Main Gradio application ├── app_core.py # Core backend logic and database ├── ai_summary.py # AI summarization manager ├── file_processors.py # Document processing utilities ├── image_extraction.py # Video frame extraction ├── requirements.txt # Python dependencies ├── env_template.sh # Environment configuration template ├── .env # Your configuration (create from template) ├── database/ # SQLite database files ├── uploads/ # Temporary upload processing ├── temp/ # Temporary files and downloads └── logs/ # Application logs ``` ## 🔧 Configuration Guide ### Azure Services Setup #### 1. Azure Speech Services ```bash # Create Speech resource az cognitiveservices account create \ --name "your-speech-service" \ --resource-group "your-rg" \ --kind "SpeechServices" \ --sku "S0" \ --location "your-region" ``` #### 2. Azure OpenAI Service ```bash # Create OpenAI resource az cognitiveservices account create \ --name "your-openai-service" \ --resource-group "your-rg" \ --kind "OpenAI" \ --sku "S0" \ --location "your-region" # Deploy model az cognitiveservices account deployment create \ --name "your-openai-service" \ --resource-group "your-rg" \ --deployment-name "gpt-4o-mini" \ --model-name "gpt-4o-mini" \ --model-version "2024-07-18" ``` #### 3. Azure Blob Storage ```bash # Create storage account az storage account create \ --name "yourstorageaccount" \ --resource-group "your-rg" \ --location "your-region" \ --sku "Standard_LRS" # Create containers az storage container create --name "transcripts" --account-name "yourstorageaccount" az storage container create --name "transcripts-summaries" --account-name "yourstorageaccount" az storage container create --name "transcripts-chats" --account-name "yourstorageaccount" ``` ### Environment Variables Reference | Variable | Description | Required | |----------|-------------|----------| | `AZURE_SPEECH_KEY` | Azure Speech Services API key | ✅ | | `AZURE_SPEECH_KEY_ENDPOINT` | Speech Services endpoint URL | ✅ | | `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL | ✅ | | `AZURE_OPENAI_KEY` | Azure OpenAI API key | ✅ | | `AZURE_OPENAI_DEPLOYMENT` | Model deployment name | ✅ | | `AZURE_BLOB_CONNECTION` | Blob storage connection string | ✅ | | `AZURE_CONTAINER` | Main blob container name | ✅ | | `AZURE_BLOB_SAS_TOKEN` | SAS token for blob access | ✅ | | `COMPUTER_VISION_ENDPOINT` | Computer Vision endpoint | ⚠️ | | `COMPUTER_VISION_KEY` | Computer Vision API key | ⚠️ | **Legend:** ✅ Required | ⚠️ Recommended ## 🎯 Usage Examples ### Basic Transcription 1. **Register/Login** to the service 2. **Upload** an audio or video file 3. **Configure** language and speaker settings 4. **Start transcription** and wait for auto-refresh 5. **Download** the completed transcript ### AI-Powered Summary 1. **Choose content sources**: existing transcripts or new files 2. **Provide AI instructions**: specify format and focus areas 3. **Configure output**: language and format preferences 4. **Generate summary** with multi-modal analysis 5. **Download** comprehensive AI analysis ### Batch Processing - Upload multiple files simultaneously - Process presentations, documents, and videos together - Generate unified summaries across all content types ## 🔐 Security Features ### Authentication & Authorization - **Secure user registration** with password strength validation - **Session management** with proper logout functionality - **User isolation** - users can only access their own data ### Data Protection - **User-separated blob storage** containers - **Encrypted data transmission** over HTTPS - **Audit logging** for all user actions - **Automatic cleanup** of temporary files ### GDPR Compliance - **Complete data export** in JSON format - **Right to be forgotten** with full account deletion - **Granular consent management** for different data uses - **Data retention policies** with automatic cleanup ## 📊 Performance Optimization ### Processing Efficiency - **Background workers** for parallel processing - **Smart frame extraction** using computer vision - **Token optimization** for AI model efficiency - **Caching strategies** for frequently accessed data ### Scalability - **Horizontal scaling** support with load balancing - **Resource limits** and rate limiting - **Efficient database queries** with proper indexing - **Auto-cleanup** of old data and temporary files ## 🛠️ Development ### Local Development Setup ```bash # Install development dependencies pip install -r requirements.txt # Set development mode export DEV_MODE=True # Run with auto-reload python app.py --reload ``` ### Testing ```bash # Run basic tests python -m pytest tests/ # Test Azure connections python -c "from app_core import transcription_manager; print('✅ Backend connected')" python -c "from ai_summary import ai_summary_manager; print('✅ AI service connected')" ``` ### Adding New Features 1. **Backend Logic**: Add to `app_core.py` or create new modules 2. **AI Features**: Extend `ai_summary.py` with new capabilities 3. **File Processing**: Add new formats to `file_processors.py` 4. **UI Components**: Update `app.py` with new Gradio components 5. **Database**: Add migrations to database schema as needed ## 📈 Monitoring & Troubleshooting ### Logging - **Application logs**: Check `logs/ai_conference_service.log` - **Error tracking**: Monitor console output for errors - **Performance metrics**: Track processing times and success rates ### Common Issues #### Connection Issues ```bash # Test Azure Speech curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \ "https://YOUR_REGION.api.cognitive.microsoft.com/sts/v1.0/issuetoken" # Test Azure OpenAI curl -H "api-key: YOUR_KEY" \ "https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_MODEL/chat/completions?api-version=2024-08-01-preview" ``` #### File Processing Issues - Ensure **FFmpeg** is installed and in PATH - Check file format support in `file_processors.py` - Verify file size limits (default: 500MB) #### Database Issues - Check database permissions for `database/` directory - Verify blob storage connection for database backups - Monitor disk space for database growth ## 🚢 Production Deployment ### Docker Deployment ```dockerfile FROM python:3.9-slim WORKDIR /app # Install system dependencies RUN apt-get update && apt-get install -y \ ffmpeg \ libsm6 \ libxext6 \ libxrender-dev \ libglib2.0-0 \ && rm -rf /var/lib/apt/lists/* COPY requirements.txt . RUN pip install -r requirements.txt COPY . . EXPOSE 7860 CMD ["python", "app.py"] ``` ### Azure Container Instance ```bash # Build and push image docker build -t azure-ai-conference-service . docker tag azure-ai-conference-service your-registry.azurecr.io/azure-ai-conference-service docker push your-registry.azurecr.io/azure-ai-conference-service # Deploy to Azure Container Instances az container create \ --resource-group your-rg \ --name azure-ai-conference-service \ --image your-registry.azurecr.io/azure-ai-conference-service \ --ports 7860 \ --environment-variables \ AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \ AZURE_OPENAI_KEY=$AZURE_OPENAI_KEY \ # ... other environment variables ``` ### Production Checklist - [ ] **Security**: Change default passwords and salts - [ ] **SSL/TLS**: Configure HTTPS certificates - [ ] **Monitoring**: Set up Azure Application Insights - [ ] **Backup**: Configure database and blob backup strategies - [ ] **Scaling**: Configure auto-scaling policies - [ ] **Compliance**: Review and configure GDPR settings ## 📚 API Reference ### Core Classes #### `TranscriptionManager` - `submit_transcription(file_bytes, filename, user_id, language, settings)` - `get_job_status(job_id)` - `get_user_history(user_id, limit)` #### `AISummaryManager` - `submit_summary_job(user_id, summary_type, user_prompt, files, settings)` - `get_summary_status(job_id)` - `get_user_summary_history(user_id, limit)` #### `FileProcessor` - `process_file(file_path, extension)` - `batch_process_files(file_paths)` - `get_file_info(file_path)` ## 🤝 Contributing We welcome contributions! Please see our contributing guidelines: 1. **Fork** the repository 2. **Create** a feature branch 3. **Make** your changes with tests 4. **Submit** a pull request ### Development Standards - **Code style**: Follow PEP 8 for Python code - **Documentation**: Update README and docstrings - **Testing**: Add tests for new features - **Security**: Follow security best practices ## 📄 License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. ## 🆘 Support ### Getting Help - **Documentation**: Check this README and inline comments - **Issues**: Create GitHub issues for bugs or feature requests - **Azure Support**: Use Azure support for service-specific issues ### Contact Information - **Project maintainer**: [Your contact information] - **Technical support**: [Support email] - **Azure resources**: [Azure documentation links] --- ## 🎉 Acknowledgments - **Azure AI Services** for powerful AI capabilities - **Gradio** for the excellent web interface framework - **OpenCV** for computer vision functionality - **Contributors** and the open-source community --- **🚀 Ready to transform your conference analysis with AI? Get started today!**