Spaces:

Betimes-Solution
/

Azure_Powered_AI_Summary

Sleeping

App Files Files Community

Azure_Powered_AI_Summary / README.md

Chirapath

Update README.md

f63f042 verified 4 months ago

preview code

raw

history blame contribute delete

14.1 kB

A newer version of the Gradio SDK is available: 6.3.0

Upgrade

metadata

title: Azure_Powered_AI_Summary
emoji: 🔥
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

🎙️🤖 Azure-Powered AI Conference Service

Advanced AI-powered conference analysis with transcription, computer vision, and intelligent summarization using Azure AI Foundry

A comprehensive solution that combines Azure Speech Services for transcription with Azure OpenAI for intelligent summarization, featuring computer vision analysis, multi-format document processing, and enterprise-grade security.

🌟 Key Features

🎙️ Advanced Transcription Services

High-accuracy speech-to-text using Azure Speech Services
Speaker diarization with precise timestamp tracking (HH:MM:SS format)
Multi-language support for 60+ languages and dialects
Real-time processing with auto-refresh status updates
Enhanced audio processing with FFmpeg integration

🤖 AI-Powered Summarization

Intelligent conference analysis using Azure OpenAI (GPT-4o models)
Multi-modal content processing (transcripts, documents, images, videos)
Smart frame extraction from presentation videos
Executive summaries with action items and key insights
Multi-language output support

👁️ Computer Vision Integration

Automatic frame extraction from videos using content-aware algorithms
OCR text extraction from images and video frames
Slide change detection for presentation content
Meeting scene analysis for conference recordings

📄 Enhanced Document Processing

Comprehensive format support: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, TXT, JSON, RTF, ODT, ODS, ODP
Intelligent content extraction with table and image handling
Batch processing capabilities for multiple files
Error handling and encoding detection

🔐 Enterprise Security & GDPR Compliance

User authentication with secure password hashing
User-isolated storage in Azure Blob containers
Complete data export functionality for GDPR compliance
Account deletion with full data removal
Audit logging and comprehensive privacy controls

🎯 User Experience

Modern web interface built with Gradio
Real-time status updates with auto-refresh functionality
Comprehensive history tracking for all services
Direct download links for completed work
Mobile-responsive design

🏗️ Architecture Overview

graph TB
    subgraph "Frontend"
        A[Gradio Web Interface]
    end
    
    subgraph "Core Services"
        B[Transcription Manager]
        C[AI Summary Manager] 
        D[File Processor]
        E[Video Frame Extractor]
    end
    
    subgraph "Azure Services"
        F[Azure Speech Services]
        G[Azure OpenAI]
        H[Azure Computer Vision]
        I[Azure Blob Storage]
    end
    
    subgraph "Data Layer"
        J[SQLite Database]
        K[User-Isolated Containers]
    end
    
    A --> B
    A --> C
    B --> F
    B --> I
    C --> G
    C --> H
    C --> D
    C --> E
    B --> J
    C --> J
    I --> K

🚀 Quick Start

Prerequisites

Python 3.8+ installed
FFmpeg installed for audio/video processing
Azure subscription with the following services:
- Azure Speech Services
- Azure OpenAI Service
- Azure Blob Storage
- Azure Computer Vision (optional but recommended)

1. Clone and Setup

# Clone the repository
git clone <repository-url>
cd azure-ai-conference-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Configure Environment

# Copy environment template
cp env_template.sh .env

# Edit .env file with your Azure credentials
nano .env

Required Configuration:

AZURE_SPEECH_KEY and AZURE_SPEECH_KEY_ENDPOINT
AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY, and AZURE_OPENAI_DEPLOYMENT
AZURE_BLOB_CONNECTION, AZURE_CONTAINER, and AZURE_BLOB_SAS_TOKEN
COMPUTER_VISION_ENDPOINT and COMPUTER_VISION_KEY (optional)

3. Run the Application

# Start the service
python app.py

The service will be available at http://localhost:7860

📁 Project Structure

azure-ai-conference-service/
├── app.py                  # Main Gradio application
├── app_core.py            # Core backend logic and database
├── ai_summary.py          # AI summarization manager
├── file_processors.py     # Document processing utilities
├── image_extraction.py    # Video frame extraction
├── requirements.txt       # Python dependencies
├── env_template.sh        # Environment configuration template
├── .env                   # Your configuration (create from template)
├── database/              # SQLite database files
├── uploads/              # Temporary upload processing
├── temp/                 # Temporary files and downloads
└── logs/                 # Application logs

🔧 Configuration Guide

Azure Services Setup

1. Azure Speech Services

# Create Speech resource
az cognitiveservices account create \
  --name "your-speech-service" \
  --resource-group "your-rg" \
  --kind "SpeechServices" \
  --sku "S0" \
  --location "your-region"

2. Azure OpenAI Service

# Create OpenAI resource
az cognitiveservices account create \
  --name "your-openai-service" \
  --resource-group "your-rg" \
  --kind "OpenAI" \
  --sku "S0" \
  --location "your-region"

# Deploy model
az cognitiveservices account deployment create \
  --name "your-openai-service" \
  --resource-group "your-rg" \
  --deployment-name "gpt-4o-mini" \
  --model-name "gpt-4o-mini" \
  --model-version "2024-07-18"

3. Azure Blob Storage

# Create storage account
az storage account create \
  --name "yourstorageaccount" \
  --resource-group "your-rg" \
  --location "your-region" \
  --sku "Standard_LRS"

# Create containers
az storage container create --name "transcripts" --account-name "yourstorageaccount"
az storage container create --name "transcripts-summaries" --account-name "yourstorageaccount"
az storage container create --name "transcripts-chats" --account-name "yourstorageaccount"

Environment Variables Reference

Variable	Description	Required
`AZURE_SPEECH_KEY`	Azure Speech Services API key	✅
`AZURE_SPEECH_KEY_ENDPOINT`	Speech Services endpoint URL	✅
`AZURE_OPENAI_ENDPOINT`	Azure OpenAI endpoint URL	✅
`AZURE_OPENAI_KEY`	Azure OpenAI API key	✅
`AZURE_OPENAI_DEPLOYMENT`	Model deployment name	✅
`AZURE_BLOB_CONNECTION`	Blob storage connection string	✅
`AZURE_CONTAINER`	Main blob container name	✅
`AZURE_BLOB_SAS_TOKEN`	SAS token for blob access	✅
`COMPUTER_VISION_ENDPOINT`	Computer Vision endpoint	⚠️
`COMPUTER_VISION_KEY`	Computer Vision API key	⚠️

Legend: ✅ Required | ⚠️ Recommended

🎯 Usage Examples

Basic Transcription

Register/Login to the service
Upload an audio or video file
Configure language and speaker settings
Start transcription and wait for auto-refresh
Download the completed transcript

AI-Powered Summary

Choose content sources: existing transcripts or new files
Provide AI instructions: specify format and focus areas
Configure output: language and format preferences
Generate summary with multi-modal analysis
Download comprehensive AI analysis

Batch Processing

Upload multiple files simultaneously
Process presentations, documents, and videos together
Generate unified summaries across all content types

🔐 Security Features

Authentication & Authorization

Secure user registration with password strength validation
Session management with proper logout functionality
User isolation - users can only access their own data

Data Protection

User-separated blob storage containers
Encrypted data transmission over HTTPS
Audit logging for all user actions
Automatic cleanup of temporary files

GDPR Compliance

Complete data export in JSON format
Right to be forgotten with full account deletion
Granular consent management for different data uses
Data retention policies with automatic cleanup

📊 Performance Optimization

Processing Efficiency

Background workers for parallel processing
Smart frame extraction using computer vision
Token optimization for AI model efficiency
Caching strategies for frequently accessed data

Scalability

Horizontal scaling support with load balancing
Resource limits and rate limiting
Efficient database queries with proper indexing
Auto-cleanup of old data and temporary files

🛠️ Development

Local Development Setup

# Install development dependencies
pip install -r requirements.txt

# Set development mode
export DEV_MODE=True

# Run with auto-reload
python app.py --reload

Testing

# Run basic tests
python -m pytest tests/

# Test Azure connections
python -c "from app_core import transcription_manager; print('✅ Backend connected')"
python -c "from ai_summary import ai_summary_manager; print('✅ AI service connected')"

Adding New Features

Backend Logic: Add to app_core.py or create new modules
AI Features: Extend ai_summary.py with new capabilities
File Processing: Add new formats to file_processors.py
UI Components: Update app.py with new Gradio components
Database: Add migrations to database schema as needed

📈 Monitoring & Troubleshooting

Logging

Application logs: Check logs/ai_conference_service.log
Error tracking: Monitor console output for errors
Performance metrics: Track processing times and success rates

Common Issues

Connection Issues

# Test Azure Speech
curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \
     "https://YOUR_REGION.api.cognitive.microsoft.com/sts/v1.0/issuetoken"

# Test Azure OpenAI
curl -H "api-key: YOUR_KEY" \
     "https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_MODEL/chat/completions?api-version=2024-08-01-preview"

File Processing Issues

Ensure FFmpeg is installed and in PATH
Check file format support in file_processors.py
Verify file size limits (default: 500MB)

Database Issues

Check database permissions for database/ directory
Verify blob storage connection for database backups
Monitor disk space for database growth

🚢 Production Deployment

Docker Deployment

FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    libsm6 \
    libxext6 \
    libxrender-dev \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 7860

CMD ["python", "app.py"]

Azure Container Instance

# Build and push image
docker build -t azure-ai-conference-service .
docker tag azure-ai-conference-service your-registry.azurecr.io/azure-ai-conference-service
docker push your-registry.azurecr.io/azure-ai-conference-service

# Deploy to Azure Container Instances
az container create \
  --resource-group your-rg \
  --name azure-ai-conference-service \
  --image your-registry.azurecr.io/azure-ai-conference-service \
  --ports 7860 \
  --environment-variables \
    AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \
    AZURE_OPENAI_KEY=$AZURE_OPENAI_KEY \
    # ... other environment variables

Production Checklist

Security: Change default passwords and salts
SSL/TLS: Configure HTTPS certificates
Monitoring: Set up Azure Application Insights
Backup: Configure database and blob backup strategies
Scaling: Configure auto-scaling policies
Compliance: Review and configure GDPR settings

📚 API Reference

Core Classes

`TranscriptionManager`

submit_transcription(file_bytes, filename, user_id, language, settings)
get_job_status(job_id)
get_user_history(user_id, limit)

`AISummaryManager`

submit_summary_job(user_id, summary_type, user_prompt, files, settings)
get_summary_status(job_id)
get_user_summary_history(user_id, limit)

`FileProcessor`

process_file(file_path, extension)
batch_process_files(file_paths)
get_file_info(file_path)

🤝 Contributing

We welcome contributions! Please see our contributing guidelines:

Fork the repository
Create a feature branch
Make your changes with tests
Submit a pull request

Development Standards

Code style: Follow PEP 8 for Python code
Documentation: Update README and docstrings
Testing: Add tests for new features
Security: Follow security best practices

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Getting Help

Documentation: Check this README and inline comments
Issues: Create GitHub issues for bugs or feature requests
Azure Support: Use Azure support for service-specific issues

Contact Information

Project maintainer: [Your contact information]
Technical support: [Support email]
Azure resources: [Azure documentation links]

🎉 Acknowledgments

Azure AI Services for powerful AI capabilities
Gradio for the excellent web interface framework
OpenCV for computer vision functionality
Contributors and the open-source community

🚀 Ready to transform your conference analysis with AI? Get started today!