Spaces:

Betimes-Solution
/

Azure_Powered_AI_Summary

Sleeping

File size: 14,121 Bytes

339ef9e
f63f042
339ef9e
 
f63f042
339ef9e
 
 
 
 
 
d2b2e25

---
title: Azure_Powered_AI_Summary
emoji: 🔥
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🎙️🤖 Azure-Powered AI Conference Service

> **Advanced AI-powered conference analysis with transcription, computer vision, and intelligent summarization using Azure AI Foundry**

A comprehensive solution that combines Azure Speech Services for transcription with Azure OpenAI for intelligent summarization, featuring computer vision analysis, multi-format document processing, and enterprise-grade security.

## 🌟 Key Features

### 🎙️ **Advanced Transcription Services**
- **High-accuracy speech-to-text** using Azure Speech Services
- **Speaker diarization** with precise timestamp tracking (HH:MM:SS format)
- **Multi-language support** for 60+ languages and dialects
- **Real-time processing** with auto-refresh status updates
- **Enhanced audio processing** with FFmpeg integration

### 🤖 **AI-Powered Summarization**
- **Intelligent conference analysis** using Azure OpenAI (GPT-4o models)
- **Multi-modal content processing** (transcripts, documents, images, videos)
- **Smart frame extraction** from presentation videos
- **Executive summaries** with action items and key insights
- **Multi-language output** support

### 👁️ **Computer Vision Integration**
- **Automatic frame extraction** from videos using content-aware algorithms
- **OCR text extraction** from images and video frames
- **Slide change detection** for presentation content
- **Meeting scene analysis** for conference recordings

### 📄 **Enhanced Document Processing**
- **Comprehensive format support**: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, TXT, JSON, RTF, ODT, ODS, ODP
- **Intelligent content extraction** with table and image handling
- **Batch processing** capabilities for multiple files
- **Error handling** and encoding detection

### 🔐 **Enterprise Security & GDPR Compliance**
- **User authentication** with secure password hashing
- **User-isolated storage** in Azure Blob containers
- **Complete data export** functionality for GDPR compliance
- **Account deletion** with full data removal
- **Audit logging** and comprehensive privacy controls

### 🎯 **User Experience**
- **Modern web interface** built with Gradio
- **Real-time status updates** with auto-refresh functionality
- **Comprehensive history** tracking for all services
- **Direct download** links for completed work
- **Mobile-responsive** design

## 🏗️ Architecture Overview

```mermaid
graph TB
    subgraph "Frontend"
        A[Gradio Web Interface]
    end
    
    subgraph "Core Services"
        B[Transcription Manager]
        C[AI Summary Manager] 
        D[File Processor]
        E[Video Frame Extractor]
    end
    
    subgraph "Azure Services"
        F[Azure Speech Services]
        G[Azure OpenAI]
        H[Azure Computer Vision]
        I[Azure Blob Storage]
    end
    
    subgraph "Data Layer"
        J[SQLite Database]
        K[User-Isolated Containers]
    end
    
    A --> B
    A --> C
    B --> F
    B --> I
    C --> G
    C --> H
    C --> D
    C --> E
    B --> J
    C --> J
    I --> K
```

## 🚀 Quick Start

### Prerequisites

- **Python 3.8+** installed
- **FFmpeg** installed for audio/video processing
- **Azure subscription** with the following services:
  - Azure Speech Services
  - Azure OpenAI Service
  - Azure Blob Storage
  - Azure Computer Vision (optional but recommended)

### 1. Clone and Setup

```bash
# Clone the repository
git clone <repository-url>
cd azure-ai-conference-service

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### 2. Configure Environment

```bash
# Copy environment template
cp env_template.sh .env

# Edit .env file with your Azure credentials
nano .env
```

**Required Configuration:**
- `AZURE_SPEECH_KEY` and `AZURE_SPEECH_KEY_ENDPOINT`
- `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, and `AZURE_OPENAI_DEPLOYMENT`
- `AZURE_BLOB_CONNECTION`, `AZURE_CONTAINER`, and `AZURE_BLOB_SAS_TOKEN`
- `COMPUTER_VISION_ENDPOINT` and `COMPUTER_VISION_KEY` (optional)

### 3. Run the Application

```bash
# Start the service
python app.py
```

The service will be available at `http://localhost:7860`

## 📁 Project Structure

```
azure-ai-conference-service/
├── app.py                  # Main Gradio application
├── app_core.py            # Core backend logic and database
├── ai_summary.py          # AI summarization manager
├── file_processors.py     # Document processing utilities
├── image_extraction.py    # Video frame extraction
├── requirements.txt       # Python dependencies
├── env_template.sh        # Environment configuration template
├── .env                   # Your configuration (create from template)
├── database/              # SQLite database files
├── uploads/              # Temporary upload processing
├── temp/                 # Temporary files and downloads
└── logs/                 # Application logs
```

## 🔧 Configuration Guide

### Azure Services Setup

#### 1. Azure Speech Services
```bash
# Create Speech resource
az cognitiveservices account create \
  --name "your-speech-service" \
  --resource-group "your-rg" \
  --kind "SpeechServices" \
  --sku "S0" \
  --location "your-region"
```

#### 2. Azure OpenAI Service
```bash
# Create OpenAI resource
az cognitiveservices account create \
  --name "your-openai-service" \
  --resource-group "your-rg" \
  --kind "OpenAI" \
  --sku "S0" \
  --location "your-region"

# Deploy model
az cognitiveservices account deployment create \
  --name "your-openai-service" \
  --resource-group "your-rg" \
  --deployment-name "gpt-4o-mini" \
  --model-name "gpt-4o-mini" \
  --model-version "2024-07-18"
```

#### 3. Azure Blob Storage
```bash
# Create storage account
az storage account create \
  --name "yourstorageaccount" \
  --resource-group "your-rg" \
  --location "your-region" \
  --sku "Standard_LRS"

# Create containers
az storage container create --name "transcripts" --account-name "yourstorageaccount"
az storage container create --name "transcripts-summaries" --account-name "yourstorageaccount"
az storage container create --name "transcripts-chats" --account-name "yourstorageaccount"
```

### Environment Variables Reference

| Variable | Description | Required |
|----------|-------------|----------|
| `AZURE_SPEECH_KEY` | Azure Speech Services API key | ✅ |
| `AZURE_SPEECH_KEY_ENDPOINT` | Speech Services endpoint URL | ✅ |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL | ✅ |
| `AZURE_OPENAI_KEY` | Azure OpenAI API key | ✅ |
| `AZURE_OPENAI_DEPLOYMENT` | Model deployment name | ✅ |
| `AZURE_BLOB_CONNECTION` | Blob storage connection string | ✅ |
| `AZURE_CONTAINER` | Main blob container name | ✅ |
| `AZURE_BLOB_SAS_TOKEN` | SAS token for blob access | ✅ |
| `COMPUTER_VISION_ENDPOINT` | Computer Vision endpoint | ⚠️ |
| `COMPUTER_VISION_KEY` | Computer Vision API key | ⚠️ |

**Legend:** ✅ Required | ⚠️ Recommended

## 🎯 Usage Examples

### Basic Transcription
1. **Register/Login** to the service
2. **Upload** an audio or video file
3. **Configure** language and speaker settings
4. **Start transcription** and wait for auto-refresh
5. **Download** the completed transcript

### AI-Powered Summary
1. **Choose content sources**: existing transcripts or new files
2. **Provide AI instructions**: specify format and focus areas
3. **Configure output**: language and format preferences
4. **Generate summary** with multi-modal analysis
5. **Download** comprehensive AI analysis

### Batch Processing
- Upload multiple files simultaneously
- Process presentations, documents, and videos together
- Generate unified summaries across all content types

## 🔐 Security Features

### Authentication & Authorization
- **Secure user registration** with password strength validation
- **Session management** with proper logout functionality
- **User isolation** - users can only access their own data

### Data Protection
- **User-separated blob storage** containers
- **Encrypted data transmission** over HTTPS
- **Audit logging** for all user actions
- **Automatic cleanup** of temporary files

### GDPR Compliance
- **Complete data export** in JSON format
- **Right to be forgotten** with full account deletion
- **Granular consent management** for different data uses
- **Data retention policies** with automatic cleanup

## 📊 Performance Optimization

### Processing Efficiency
- **Background workers** for parallel processing
- **Smart frame extraction** using computer vision
- **Token optimization** for AI model efficiency
- **Caching strategies** for frequently accessed data

### Scalability
- **Horizontal scaling** support with load balancing
- **Resource limits** and rate limiting
- **Efficient database queries** with proper indexing
- **Auto-cleanup** of old data and temporary files

## 🛠️ Development

### Local Development Setup

```bash
# Install development dependencies
pip install -r requirements.txt

# Set development mode
export DEV_MODE=True

# Run with auto-reload
python app.py --reload
```

### Testing

```bash
# Run basic tests
python -m pytest tests/

# Test Azure connections
python -c "from app_core import transcription_manager; print('✅ Backend connected')"
python -c "from ai_summary import ai_summary_manager; print('✅ AI service connected')"
```

### Adding New Features

1. **Backend Logic**: Add to `app_core.py` or create new modules
2. **AI Features**: Extend `ai_summary.py` with new capabilities  
3. **File Processing**: Add new formats to `file_processors.py`
4. **UI Components**: Update `app.py` with new Gradio components
5. **Database**: Add migrations to database schema as needed

## 📈 Monitoring & Troubleshooting

### Logging
- **Application logs**: Check `logs/ai_conference_service.log`
- **Error tracking**: Monitor console output for errors
- **Performance metrics**: Track processing times and success rates

### Common Issues

#### Connection Issues
```bash
# Test Azure Speech
curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \
     "https://YOUR_REGION.api.cognitive.microsoft.com/sts/v1.0/issuetoken"

# Test Azure OpenAI
curl -H "api-key: YOUR_KEY" \
     "https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_MODEL/chat/completions?api-version=2024-08-01-preview"
```

#### File Processing Issues
- Ensure **FFmpeg** is installed and in PATH
- Check file format support in `file_processors.py`
- Verify file size limits (default: 500MB)

#### Database Issues
- Check database permissions for `database/` directory
- Verify blob storage connection for database backups
- Monitor disk space for database growth

## 🚢 Production Deployment

### Docker Deployment

```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    libsm6 \
    libxext6 \
    libxrender-dev \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 7860

CMD ["python", "app.py"]
```

### Azure Container Instance

```bash
# Build and push image
docker build -t azure-ai-conference-service .
docker tag azure-ai-conference-service your-registry.azurecr.io/azure-ai-conference-service
docker push your-registry.azurecr.io/azure-ai-conference-service

# Deploy to Azure Container Instances
az container create \
  --resource-group your-rg \
  --name azure-ai-conference-service \
  --image your-registry.azurecr.io/azure-ai-conference-service \
  --ports 7860 \
  --environment-variables \
    AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \
    AZURE_OPENAI_KEY=$AZURE_OPENAI_KEY \
    # ... other environment variables
```

### Production Checklist

- [ ] **Security**: Change default passwords and salts
- [ ] **SSL/TLS**: Configure HTTPS certificates
- [ ] **Monitoring**: Set up Azure Application Insights
- [ ] **Backup**: Configure database and blob backup strategies
- [ ] **Scaling**: Configure auto-scaling policies
- [ ] **Compliance**: Review and configure GDPR settings

## 📚 API Reference

### Core Classes

#### `TranscriptionManager`
- `submit_transcription(file_bytes, filename, user_id, language, settings)`
- `get_job_status(job_id)`
- `get_user_history(user_id, limit)`

#### `AISummaryManager`
- `submit_summary_job(user_id, summary_type, user_prompt, files, settings)`
- `get_summary_status(job_id)`
- `get_user_summary_history(user_id, limit)`

#### `FileProcessor`
- `process_file(file_path, extension)`
- `batch_process_files(file_paths)`
- `get_file_info(file_path)`

## 🤝 Contributing

We welcome contributions! Please see our contributing guidelines:

1. **Fork** the repository
2. **Create** a feature branch
3. **Make** your changes with tests
4. **Submit** a pull request

### Development Standards
- **Code style**: Follow PEP 8 for Python code
- **Documentation**: Update README and docstrings
- **Testing**: Add tests for new features
- **Security**: Follow security best practices

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🆘 Support

### Getting Help
- **Documentation**: Check this README and inline comments
- **Issues**: Create GitHub issues for bugs or feature requests
- **Azure Support**: Use Azure support for service-specific issues

### Contact Information
- **Project maintainer**: [Your contact information]
- **Technical support**: [Support email]
- **Azure resources**: [Azure documentation links]

---

## 🎉 Acknowledgments

- **Azure AI Services** for powerful AI capabilities
- **Gradio** for the excellent web interface framework
- **OpenCV** for computer vision functionality
- **Contributors** and the open-source community

---

**🚀 Ready to transform your conference analysis with AI? Get started today!**