A newer version of the Gradio SDK is available:
6.3.0
title: Azure_Powered_AI_Summary
emoji: π₯
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
ποΈπ€ Azure-Powered AI Conference Service
Advanced AI-powered conference analysis with transcription, computer vision, and intelligent summarization using Azure AI Foundry
A comprehensive solution that combines Azure Speech Services for transcription with Azure OpenAI for intelligent summarization, featuring computer vision analysis, multi-format document processing, and enterprise-grade security.
π Key Features
ποΈ Advanced Transcription Services
- High-accuracy speech-to-text using Azure Speech Services
- Speaker diarization with precise timestamp tracking (HH:MM:SS format)
- Multi-language support for 60+ languages and dialects
- Real-time processing with auto-refresh status updates
- Enhanced audio processing with FFmpeg integration
π€ AI-Powered Summarization
- Intelligent conference analysis using Azure OpenAI (GPT-4o models)
- Multi-modal content processing (transcripts, documents, images, videos)
- Smart frame extraction from presentation videos
- Executive summaries with action items and key insights
- Multi-language output support
ποΈ Computer Vision Integration
- Automatic frame extraction from videos using content-aware algorithms
- OCR text extraction from images and video frames
- Slide change detection for presentation content
- Meeting scene analysis for conference recordings
π Enhanced Document Processing
- Comprehensive format support: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, TXT, JSON, RTF, ODT, ODS, ODP
- Intelligent content extraction with table and image handling
- Batch processing capabilities for multiple files
- Error handling and encoding detection
π Enterprise Security & GDPR Compliance
- User authentication with secure password hashing
- User-isolated storage in Azure Blob containers
- Complete data export functionality for GDPR compliance
- Account deletion with full data removal
- Audit logging and comprehensive privacy controls
π― User Experience
- Modern web interface built with Gradio
- Real-time status updates with auto-refresh functionality
- Comprehensive history tracking for all services
- Direct download links for completed work
- Mobile-responsive design
ποΈ Architecture Overview
graph TB
subgraph "Frontend"
A[Gradio Web Interface]
end
subgraph "Core Services"
B[Transcription Manager]
C[AI Summary Manager]
D[File Processor]
E[Video Frame Extractor]
end
subgraph "Azure Services"
F[Azure Speech Services]
G[Azure OpenAI]
H[Azure Computer Vision]
I[Azure Blob Storage]
end
subgraph "Data Layer"
J[SQLite Database]
K[User-Isolated Containers]
end
A --> B
A --> C
B --> F
B --> I
C --> G
C --> H
C --> D
C --> E
B --> J
C --> J
I --> K
π Quick Start
Prerequisites
- Python 3.8+ installed
- FFmpeg installed for audio/video processing
- Azure subscription with the following services:
- Azure Speech Services
- Azure OpenAI Service
- Azure Blob Storage
- Azure Computer Vision (optional but recommended)
1. Clone and Setup
# Clone the repository
git clone <repository-url>
cd azure-ai-conference-service
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
2. Configure Environment
# Copy environment template
cp env_template.sh .env
# Edit .env file with your Azure credentials
nano .env
Required Configuration:
AZURE_SPEECH_KEYandAZURE_SPEECH_KEY_ENDPOINTAZURE_OPENAI_ENDPOINT,AZURE_OPENAI_KEY, andAZURE_OPENAI_DEPLOYMENTAZURE_BLOB_CONNECTION,AZURE_CONTAINER, andAZURE_BLOB_SAS_TOKENCOMPUTER_VISION_ENDPOINTandCOMPUTER_VISION_KEY(optional)
3. Run the Application
# Start the service
python app.py
The service will be available at http://localhost:7860
π Project Structure
azure-ai-conference-service/
βββ app.py # Main Gradio application
βββ app_core.py # Core backend logic and database
βββ ai_summary.py # AI summarization manager
βββ file_processors.py # Document processing utilities
βββ image_extraction.py # Video frame extraction
βββ requirements.txt # Python dependencies
βββ env_template.sh # Environment configuration template
βββ .env # Your configuration (create from template)
βββ database/ # SQLite database files
βββ uploads/ # Temporary upload processing
βββ temp/ # Temporary files and downloads
βββ logs/ # Application logs
π§ Configuration Guide
Azure Services Setup
1. Azure Speech Services
# Create Speech resource
az cognitiveservices account create \
--name "your-speech-service" \
--resource-group "your-rg" \
--kind "SpeechServices" \
--sku "S0" \
--location "your-region"
2. Azure OpenAI Service
# Create OpenAI resource
az cognitiveservices account create \
--name "your-openai-service" \
--resource-group "your-rg" \
--kind "OpenAI" \
--sku "S0" \
--location "your-region"
# Deploy model
az cognitiveservices account deployment create \
--name "your-openai-service" \
--resource-group "your-rg" \
--deployment-name "gpt-4o-mini" \
--model-name "gpt-4o-mini" \
--model-version "2024-07-18"
3. Azure Blob Storage
# Create storage account
az storage account create \
--name "yourstorageaccount" \
--resource-group "your-rg" \
--location "your-region" \
--sku "Standard_LRS"
# Create containers
az storage container create --name "transcripts" --account-name "yourstorageaccount"
az storage container create --name "transcripts-summaries" --account-name "yourstorageaccount"
az storage container create --name "transcripts-chats" --account-name "yourstorageaccount"
Environment Variables Reference
| Variable | Description | Required |
|---|---|---|
AZURE_SPEECH_KEY |
Azure Speech Services API key | β |
AZURE_SPEECH_KEY_ENDPOINT |
Speech Services endpoint URL | β |
AZURE_OPENAI_ENDPOINT |
Azure OpenAI endpoint URL | β |
AZURE_OPENAI_KEY |
Azure OpenAI API key | β |
AZURE_OPENAI_DEPLOYMENT |
Model deployment name | β |
AZURE_BLOB_CONNECTION |
Blob storage connection string | β |
AZURE_CONTAINER |
Main blob container name | β |
AZURE_BLOB_SAS_TOKEN |
SAS token for blob access | β |
COMPUTER_VISION_ENDPOINT |
Computer Vision endpoint | β οΈ |
COMPUTER_VISION_KEY |
Computer Vision API key | β οΈ |
Legend: β Required | β οΈ Recommended
π― Usage Examples
Basic Transcription
- Register/Login to the service
- Upload an audio or video file
- Configure language and speaker settings
- Start transcription and wait for auto-refresh
- Download the completed transcript
AI-Powered Summary
- Choose content sources: existing transcripts or new files
- Provide AI instructions: specify format and focus areas
- Configure output: language and format preferences
- Generate summary with multi-modal analysis
- Download comprehensive AI analysis
Batch Processing
- Upload multiple files simultaneously
- Process presentations, documents, and videos together
- Generate unified summaries across all content types
π Security Features
Authentication & Authorization
- Secure user registration with password strength validation
- Session management with proper logout functionality
- User isolation - users can only access their own data
Data Protection
- User-separated blob storage containers
- Encrypted data transmission over HTTPS
- Audit logging for all user actions
- Automatic cleanup of temporary files
GDPR Compliance
- Complete data export in JSON format
- Right to be forgotten with full account deletion
- Granular consent management for different data uses
- Data retention policies with automatic cleanup
π Performance Optimization
Processing Efficiency
- Background workers for parallel processing
- Smart frame extraction using computer vision
- Token optimization for AI model efficiency
- Caching strategies for frequently accessed data
Scalability
- Horizontal scaling support with load balancing
- Resource limits and rate limiting
- Efficient database queries with proper indexing
- Auto-cleanup of old data and temporary files
π οΈ Development
Local Development Setup
# Install development dependencies
pip install -r requirements.txt
# Set development mode
export DEV_MODE=True
# Run with auto-reload
python app.py --reload
Testing
# Run basic tests
python -m pytest tests/
# Test Azure connections
python -c "from app_core import transcription_manager; print('β
Backend connected')"
python -c "from ai_summary import ai_summary_manager; print('β
AI service connected')"
Adding New Features
- Backend Logic: Add to
app_core.pyor create new modules - AI Features: Extend
ai_summary.pywith new capabilities - File Processing: Add new formats to
file_processors.py - UI Components: Update
app.pywith new Gradio components - Database: Add migrations to database schema as needed
π Monitoring & Troubleshooting
Logging
- Application logs: Check
logs/ai_conference_service.log - Error tracking: Monitor console output for errors
- Performance metrics: Track processing times and success rates
Common Issues
Connection Issues
# Test Azure Speech
curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \
"https://YOUR_REGION.api.cognitive.microsoft.com/sts/v1.0/issuetoken"
# Test Azure OpenAI
curl -H "api-key: YOUR_KEY" \
"https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_MODEL/chat/completions?api-version=2024-08-01-preview"
File Processing Issues
- Ensure FFmpeg is installed and in PATH
- Check file format support in
file_processors.py - Verify file size limits (default: 500MB)
Database Issues
- Check database permissions for
database/directory - Verify blob storage connection for database backups
- Monitor disk space for database growth
π’ Production Deployment
Docker Deployment
FROM python:3.9-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
libsm6 \
libxext6 \
libxrender-dev \
libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
Azure Container Instance
# Build and push image
docker build -t azure-ai-conference-service .
docker tag azure-ai-conference-service your-registry.azurecr.io/azure-ai-conference-service
docker push your-registry.azurecr.io/azure-ai-conference-service
# Deploy to Azure Container Instances
az container create \
--resource-group your-rg \
--name azure-ai-conference-service \
--image your-registry.azurecr.io/azure-ai-conference-service \
--ports 7860 \
--environment-variables \
AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \
AZURE_OPENAI_KEY=$AZURE_OPENAI_KEY \
# ... other environment variables
Production Checklist
- Security: Change default passwords and salts
- SSL/TLS: Configure HTTPS certificates
- Monitoring: Set up Azure Application Insights
- Backup: Configure database and blob backup strategies
- Scaling: Configure auto-scaling policies
- Compliance: Review and configure GDPR settings
π API Reference
Core Classes
TranscriptionManager
submit_transcription(file_bytes, filename, user_id, language, settings)get_job_status(job_id)get_user_history(user_id, limit)
AISummaryManager
submit_summary_job(user_id, summary_type, user_prompt, files, settings)get_summary_status(job_id)get_user_summary_history(user_id, limit)
FileProcessor
process_file(file_path, extension)batch_process_files(file_paths)get_file_info(file_path)
π€ Contributing
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
Development Standards
- Code style: Follow PEP 8 for Python code
- Documentation: Update README and docstrings
- Testing: Add tests for new features
- Security: Follow security best practices
π License
This project is licensed under the MIT License - see the LICENSE file for details.
π Support
Getting Help
- Documentation: Check this README and inline comments
- Issues: Create GitHub issues for bugs or feature requests
- Azure Support: Use Azure support for service-specific issues
Contact Information
- Project maintainer: [Your contact information]
- Technical support: [Support email]
- Azure resources: [Azure documentation links]
π Acknowledgments
- Azure AI Services for powerful AI capabilities
- Gradio for the excellent web interface framework
- OpenCV for computer vision functionality
- Contributors and the open-source community
π Ready to transform your conference analysis with AI? Get started today!