Chirapath's picture
Update README.md
f63f042 verified

A newer version of the Gradio SDK is available: 6.3.0

Upgrade
metadata
title: Azure_Powered_AI_Summary
emoji: πŸ”₯
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

πŸŽ™οΈπŸ€– Azure-Powered AI Conference Service

Advanced AI-powered conference analysis with transcription, computer vision, and intelligent summarization using Azure AI Foundry

A comprehensive solution that combines Azure Speech Services for transcription with Azure OpenAI for intelligent summarization, featuring computer vision analysis, multi-format document processing, and enterprise-grade security.

🌟 Key Features

πŸŽ™οΈ Advanced Transcription Services

  • High-accuracy speech-to-text using Azure Speech Services
  • Speaker diarization with precise timestamp tracking (HH:MM:SS format)
  • Multi-language support for 60+ languages and dialects
  • Real-time processing with auto-refresh status updates
  • Enhanced audio processing with FFmpeg integration

πŸ€– AI-Powered Summarization

  • Intelligent conference analysis using Azure OpenAI (GPT-4o models)
  • Multi-modal content processing (transcripts, documents, images, videos)
  • Smart frame extraction from presentation videos
  • Executive summaries with action items and key insights
  • Multi-language output support

πŸ‘οΈ Computer Vision Integration

  • Automatic frame extraction from videos using content-aware algorithms
  • OCR text extraction from images and video frames
  • Slide change detection for presentation content
  • Meeting scene analysis for conference recordings

πŸ“„ Enhanced Document Processing

  • Comprehensive format support: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, TXT, JSON, RTF, ODT, ODS, ODP
  • Intelligent content extraction with table and image handling
  • Batch processing capabilities for multiple files
  • Error handling and encoding detection

πŸ” Enterprise Security & GDPR Compliance

  • User authentication with secure password hashing
  • User-isolated storage in Azure Blob containers
  • Complete data export functionality for GDPR compliance
  • Account deletion with full data removal
  • Audit logging and comprehensive privacy controls

🎯 User Experience

  • Modern web interface built with Gradio
  • Real-time status updates with auto-refresh functionality
  • Comprehensive history tracking for all services
  • Direct download links for completed work
  • Mobile-responsive design

πŸ—οΈ Architecture Overview

graph TB
    subgraph "Frontend"
        A[Gradio Web Interface]
    end
    
    subgraph "Core Services"
        B[Transcription Manager]
        C[AI Summary Manager] 
        D[File Processor]
        E[Video Frame Extractor]
    end
    
    subgraph "Azure Services"
        F[Azure Speech Services]
        G[Azure OpenAI]
        H[Azure Computer Vision]
        I[Azure Blob Storage]
    end
    
    subgraph "Data Layer"
        J[SQLite Database]
        K[User-Isolated Containers]
    end
    
    A --> B
    A --> C
    B --> F
    B --> I
    C --> G
    C --> H
    C --> D
    C --> E
    B --> J
    C --> J
    I --> K

πŸš€ Quick Start

Prerequisites

  • Python 3.8+ installed
  • FFmpeg installed for audio/video processing
  • Azure subscription with the following services:
    • Azure Speech Services
    • Azure OpenAI Service
    • Azure Blob Storage
    • Azure Computer Vision (optional but recommended)

1. Clone and Setup

# Clone the repository
git clone <repository-url>
cd azure-ai-conference-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Configure Environment

# Copy environment template
cp env_template.sh .env

# Edit .env file with your Azure credentials
nano .env

Required Configuration:

  • AZURE_SPEECH_KEY and AZURE_SPEECH_KEY_ENDPOINT
  • AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY, and AZURE_OPENAI_DEPLOYMENT
  • AZURE_BLOB_CONNECTION, AZURE_CONTAINER, and AZURE_BLOB_SAS_TOKEN
  • COMPUTER_VISION_ENDPOINT and COMPUTER_VISION_KEY (optional)

3. Run the Application

# Start the service
python app.py

The service will be available at http://localhost:7860

πŸ“ Project Structure

azure-ai-conference-service/
β”œβ”€β”€ app.py                  # Main Gradio application
β”œβ”€β”€ app_core.py            # Core backend logic and database
β”œβ”€β”€ ai_summary.py          # AI summarization manager
β”œβ”€β”€ file_processors.py     # Document processing utilities
β”œβ”€β”€ image_extraction.py    # Video frame extraction
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ env_template.sh        # Environment configuration template
β”œβ”€β”€ .env                   # Your configuration (create from template)
β”œβ”€β”€ database/              # SQLite database files
β”œβ”€β”€ uploads/              # Temporary upload processing
β”œβ”€β”€ temp/                 # Temporary files and downloads
└── logs/                 # Application logs

πŸ”§ Configuration Guide

Azure Services Setup

1. Azure Speech Services

# Create Speech resource
az cognitiveservices account create \
  --name "your-speech-service" \
  --resource-group "your-rg" \
  --kind "SpeechServices" \
  --sku "S0" \
  --location "your-region"

2. Azure OpenAI Service

# Create OpenAI resource
az cognitiveservices account create \
  --name "your-openai-service" \
  --resource-group "your-rg" \
  --kind "OpenAI" \
  --sku "S0" \
  --location "your-region"

# Deploy model
az cognitiveservices account deployment create \
  --name "your-openai-service" \
  --resource-group "your-rg" \
  --deployment-name "gpt-4o-mini" \
  --model-name "gpt-4o-mini" \
  --model-version "2024-07-18"

3. Azure Blob Storage

# Create storage account
az storage account create \
  --name "yourstorageaccount" \
  --resource-group "your-rg" \
  --location "your-region" \
  --sku "Standard_LRS"

# Create containers
az storage container create --name "transcripts" --account-name "yourstorageaccount"
az storage container create --name "transcripts-summaries" --account-name "yourstorageaccount"
az storage container create --name "transcripts-chats" --account-name "yourstorageaccount"

Environment Variables Reference

Variable Description Required
AZURE_SPEECH_KEY Azure Speech Services API key βœ…
AZURE_SPEECH_KEY_ENDPOINT Speech Services endpoint URL βœ…
AZURE_OPENAI_ENDPOINT Azure OpenAI endpoint URL βœ…
AZURE_OPENAI_KEY Azure OpenAI API key βœ…
AZURE_OPENAI_DEPLOYMENT Model deployment name βœ…
AZURE_BLOB_CONNECTION Blob storage connection string βœ…
AZURE_CONTAINER Main blob container name βœ…
AZURE_BLOB_SAS_TOKEN SAS token for blob access βœ…
COMPUTER_VISION_ENDPOINT Computer Vision endpoint ⚠️
COMPUTER_VISION_KEY Computer Vision API key ⚠️

Legend: βœ… Required | ⚠️ Recommended

🎯 Usage Examples

Basic Transcription

  1. Register/Login to the service
  2. Upload an audio or video file
  3. Configure language and speaker settings
  4. Start transcription and wait for auto-refresh
  5. Download the completed transcript

AI-Powered Summary

  1. Choose content sources: existing transcripts or new files
  2. Provide AI instructions: specify format and focus areas
  3. Configure output: language and format preferences
  4. Generate summary with multi-modal analysis
  5. Download comprehensive AI analysis

Batch Processing

  • Upload multiple files simultaneously
  • Process presentations, documents, and videos together
  • Generate unified summaries across all content types

πŸ” Security Features

Authentication & Authorization

  • Secure user registration with password strength validation
  • Session management with proper logout functionality
  • User isolation - users can only access their own data

Data Protection

  • User-separated blob storage containers
  • Encrypted data transmission over HTTPS
  • Audit logging for all user actions
  • Automatic cleanup of temporary files

GDPR Compliance

  • Complete data export in JSON format
  • Right to be forgotten with full account deletion
  • Granular consent management for different data uses
  • Data retention policies with automatic cleanup

πŸ“Š Performance Optimization

Processing Efficiency

  • Background workers for parallel processing
  • Smart frame extraction using computer vision
  • Token optimization for AI model efficiency
  • Caching strategies for frequently accessed data

Scalability

  • Horizontal scaling support with load balancing
  • Resource limits and rate limiting
  • Efficient database queries with proper indexing
  • Auto-cleanup of old data and temporary files

πŸ› οΈ Development

Local Development Setup

# Install development dependencies
pip install -r requirements.txt

# Set development mode
export DEV_MODE=True

# Run with auto-reload
python app.py --reload

Testing

# Run basic tests
python -m pytest tests/

# Test Azure connections
python -c "from app_core import transcription_manager; print('βœ… Backend connected')"
python -c "from ai_summary import ai_summary_manager; print('βœ… AI service connected')"

Adding New Features

  1. Backend Logic: Add to app_core.py or create new modules
  2. AI Features: Extend ai_summary.py with new capabilities
  3. File Processing: Add new formats to file_processors.py
  4. UI Components: Update app.py with new Gradio components
  5. Database: Add migrations to database schema as needed

πŸ“ˆ Monitoring & Troubleshooting

Logging

  • Application logs: Check logs/ai_conference_service.log
  • Error tracking: Monitor console output for errors
  • Performance metrics: Track processing times and success rates

Common Issues

Connection Issues

# Test Azure Speech
curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \
     "https://YOUR_REGION.api.cognitive.microsoft.com/sts/v1.0/issuetoken"

# Test Azure OpenAI
curl -H "api-key: YOUR_KEY" \
     "https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_MODEL/chat/completions?api-version=2024-08-01-preview"

File Processing Issues

  • Ensure FFmpeg is installed and in PATH
  • Check file format support in file_processors.py
  • Verify file size limits (default: 500MB)

Database Issues

  • Check database permissions for database/ directory
  • Verify blob storage connection for database backups
  • Monitor disk space for database growth

🚒 Production Deployment

Docker Deployment

FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    libsm6 \
    libxext6 \
    libxrender-dev \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 7860

CMD ["python", "app.py"]

Azure Container Instance

# Build and push image
docker build -t azure-ai-conference-service .
docker tag azure-ai-conference-service your-registry.azurecr.io/azure-ai-conference-service
docker push your-registry.azurecr.io/azure-ai-conference-service

# Deploy to Azure Container Instances
az container create \
  --resource-group your-rg \
  --name azure-ai-conference-service \
  --image your-registry.azurecr.io/azure-ai-conference-service \
  --ports 7860 \
  --environment-variables \
    AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \
    AZURE_OPENAI_KEY=$AZURE_OPENAI_KEY \
    # ... other environment variables

Production Checklist

  • Security: Change default passwords and salts
  • SSL/TLS: Configure HTTPS certificates
  • Monitoring: Set up Azure Application Insights
  • Backup: Configure database and blob backup strategies
  • Scaling: Configure auto-scaling policies
  • Compliance: Review and configure GDPR settings

πŸ“š API Reference

Core Classes

TranscriptionManager

  • submit_transcription(file_bytes, filename, user_id, language, settings)
  • get_job_status(job_id)
  • get_user_history(user_id, limit)

AISummaryManager

  • submit_summary_job(user_id, summary_type, user_prompt, files, settings)
  • get_summary_status(job_id)
  • get_user_summary_history(user_id, limit)

FileProcessor

  • process_file(file_path, extension)
  • batch_process_files(file_paths)
  • get_file_info(file_path)

🀝 Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Submit a pull request

Development Standards

  • Code style: Follow PEP 8 for Python code
  • Documentation: Update README and docstrings
  • Testing: Add tests for new features
  • Security: Follow security best practices

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Support

Getting Help

  • Documentation: Check this README and inline comments
  • Issues: Create GitHub issues for bugs or feature requests
  • Azure Support: Use Azure support for service-specific issues

Contact Information

  • Project maintainer: [Your contact information]
  • Technical support: [Support email]
  • Azure resources: [Azure documentation links]

πŸŽ‰ Acknowledgments

  • Azure AI Services for powerful AI capabilities
  • Gradio for the excellent web interface framework
  • OpenCV for computer vision functionality
  • Contributors and the open-source community

πŸš€ Ready to transform your conference analysis with AI? Get started today!