Spaces:

Betimes-Solution
/

Azure_Powered_AI_Summary

Sleeping

App Files Files Community

Chirapath commited on Sep 2, 2025

Commit

d2b2e25

verified ·

1 Parent(s): 339ef9e

Update README.md

Browse files

Files changed (1) hide show

README.md +463 -1

README.md CHANGED Viewed

@@ -9,4 +9,466 @@ app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 pinned: false
 ---
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 🎙️🤖 Azure-Powered AI Conference Service
+> **Advanced AI-powered conference analysis with transcription, computer vision, and intelligent summarization using Azure AI Foundry**
+A comprehensive solution that combines Azure Speech Services for transcription with Azure OpenAI for intelligent summarization, featuring computer vision analysis, multi-format document processing, and enterprise-grade security.
+## 🌟 Key Features
+### 🎙️ **Advanced Transcription Services**
+- **High-accuracy speech-to-text** using Azure Speech Services
+- **Speaker diarization** with precise timestamp tracking (HH:MM:SS format)
+- **Multi-language support** for 60+ languages and dialects
+- **Real-time processing** with auto-refresh status updates
+- **Enhanced audio processing** with FFmpeg integration
+### 🤖 **AI-Powered Summarization**
+- **Intelligent conference analysis** using Azure OpenAI (GPT-4o models)
+- **Multi-modal content processing** (transcripts, documents, images, videos)
+- **Smart frame extraction** from presentation videos
+- **Executive summaries** with action items and key insights
+- **Multi-language output** support
+### 👁️ **Computer Vision Integration**
+- **Automatic frame extraction** from videos using content-aware algorithms
+- **OCR text extraction** from images and video frames
+- **Slide change detection** for presentation content
+- **Meeting scene analysis** for conference recordings
+### 📄 **Enhanced Document Processing**
+- **Comprehensive format support**: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, TXT, JSON, RTF, ODT, ODS, ODP
+- **Intelligent content extraction** with table and image handling
+- **Batch processing** capabilities for multiple files
+- **Error handling** and encoding detection
+### 🔐 **Enterprise Security & GDPR Compliance**
+- **User authentication** with secure password hashing
+- **User-isolated storage** in Azure Blob containers
+- **Complete data export** functionality for GDPR compliance
+- **Account deletion** with full data removal
+- **Audit logging** and comprehensive privacy controls
+### 🎯 **User Experience**
+- **Modern web interface** built with Gradio
+- **Real-time status updates** with auto-refresh functionality
+- **Comprehensive history** tracking for all services
+- **Direct download** links for completed work
+- **Mobile-responsive** design
+## 🏗️ Architecture Overview
+```mermaid
+graph TB
+    subgraph "Frontend"
+        A[Gradio Web Interface]
+    end
+    subgraph "Core Services"
+        B[Transcription Manager]
+        C[AI Summary Manager]
+        D[File Processor]
+        E[Video Frame Extractor]
+    end
+    subgraph "Azure Services"
+        F[Azure Speech Services]
+        G[Azure OpenAI]
+        H[Azure Computer Vision]
+        I[Azure Blob Storage]
+    end
+    subgraph "Data Layer"
+        J[SQLite Database]
+        K[User-Isolated Containers]
+    end
+    A --> B
+    A --> C
+    B --> F
+    B --> I
+    C --> G
+    C --> H
+    C --> D
+    C --> E
+    B --> J
+    C --> J
+    I --> K
+```
+## 🚀 Quick Start
+### Prerequisites
+- **Python 3.8+** installed
+- **FFmpeg** installed for audio/video processing
+- **Azure subscription** with the following services:
+  - Azure Speech Services
+  - Azure OpenAI Service
+  - Azure Blob Storage
+  - Azure Computer Vision (optional but recommended)
+### 1. Clone and Setup
+```bash
+# Clone the repository
+git clone <repository-url>
+cd azure-ai-conference-service
+# Create virtual environment
+python -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+# Install dependencies
+pip install -r requirements.txt
+```
+### 2. Configure Environment
+```bash
+# Copy environment template
+cp env_template.sh .env
+# Edit .env file with your Azure credentials
+nano .env
+```
+**Required Configuration:**
+- `AZURE_SPEECH_KEY` and `AZURE_SPEECH_KEY_ENDPOINT`
+- `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, and `AZURE_OPENAI_DEPLOYMENT`
+- `AZURE_BLOB_CONNECTION`, `AZURE_CONTAINER`, and `AZURE_BLOB_SAS_TOKEN`
+- `COMPUTER_VISION_ENDPOINT` and `COMPUTER_VISION_KEY` (optional)
+### 3. Run the Application
+```bash
+# Start the service
+python app.py
+```
+The service will be available at `http://localhost:7860`
+## 📁 Project Structure
+```
+azure-ai-conference-service/
+├── app.py                  # Main Gradio application
+├── app_core.py            # Core backend logic and database
+├── ai_summary.py          # AI summarization manager
+├── file_processors.py     # Document processing utilities
+├── image_extraction.py    # Video frame extraction
+├── requirements.txt       # Python dependencies
+├── env_template.sh        # Environment configuration template
+├── .env                   # Your configuration (create from template)
+├── database/              # SQLite database files
+├── uploads/              # Temporary upload processing
+├── temp/                 # Temporary files and downloads
+└── logs/                 # Application logs
+```
+## 🔧 Configuration Guide
+### Azure Services Setup
+#### 1. Azure Speech Services
+```bash
+# Create Speech resource
+az cognitiveservices account create \
+  --name "your-speech-service" \
+  --resource-group "your-rg" \
+  --kind "SpeechServices" \
+  --sku "S0" \
+  --location "your-region"
+```
+#### 2. Azure OpenAI Service
+```bash
+# Create OpenAI resource
+az cognitiveservices account create \
+  --name "your-openai-service" \
+  --resource-group "your-rg" \
+  --kind "OpenAI" \
+  --sku "S0" \
+  --location "your-region"
+# Deploy model
+az cognitiveservices account deployment create \
+  --name "your-openai-service" \
+  --resource-group "your-rg" \
+  --deployment-name "gpt-4o-mini" \
+  --model-name "gpt-4o-mini" \
+  --model-version "2024-07-18"
+```
+#### 3. Azure Blob Storage
+```bash
+# Create storage account
+az storage account create \
+  --name "yourstorageaccount" \
+  --resource-group "your-rg" \
+  --location "your-region" \
+  --sku "Standard_LRS"
+# Create containers
+az storage container create --name "transcripts" --account-name "yourstorageaccount"
+az storage container create --name "transcripts-summaries" --account-name "yourstorageaccount"
+az storage container create --name "transcripts-chats" --account-name "yourstorageaccount"
+```
+### Environment Variables Reference
+| Variable | Description | Required |
+|----------|-------------|----------|
+| `AZURE_SPEECH_KEY` | Azure Speech Services API key | ✅ |
+| `AZURE_SPEECH_KEY_ENDPOINT` | Speech Services endpoint URL | ✅ |
+| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL | ✅ |
+| `AZURE_OPENAI_KEY` | Azure OpenAI API key | ✅ |
+| `AZURE_OPENAI_DEPLOYMENT` | Model deployment name | ✅ |
+| `AZURE_BLOB_CONNECTION` | Blob storage connection string | ✅ |
+| `AZURE_CONTAINER` | Main blob container name | ✅ |
+| `AZURE_BLOB_SAS_TOKEN` | SAS token for blob access | ✅ |
+| `COMPUTER_VISION_ENDPOINT` | Computer Vision endpoint | ⚠️ |
+| `COMPUTER_VISION_KEY` | Computer Vision API key | ⚠️ |
+**Legend:** ✅ Required | ⚠️ Recommended
+## 🎯 Usage Examples
+### Basic Transcription
+1. **Register/Login** to the service
+2. **Upload** an audio or video file
+3. **Configure** language and speaker settings
+4. **Start transcription** and wait for auto-refresh
+5. **Download** the completed transcript
+### AI-Powered Summary
+1. **Choose content sources**: existing transcripts or new files
+2. **Provide AI instructions**: specify format and focus areas
+3. **Configure output**: language and format preferences
+4. **Generate summary** with multi-modal analysis
+5. **Download** comprehensive AI analysis
+### Batch Processing
+- Upload multiple files simultaneously
+- Process presentations, documents, and videos together
+- Generate unified summaries across all content types
+## 🔐 Security Features
+### Authentication & Authorization
+- **Secure user registration** with password strength validation
+- **Session management** with proper logout functionality
+- **User isolation** - users can only access their own data
+### Data Protection
+- **User-separated blob storage** containers
+- **Encrypted data transmission** over HTTPS
+- **Audit logging** for all user actions
+- **Automatic cleanup** of temporary files
+### GDPR Compliance
+- **Complete data export** in JSON format
+- **Right to be forgotten** with full account deletion
+- **Granular consent management** for different data uses
+- **Data retention policies** with automatic cleanup
+## 📊 Performance Optimization
+### Processing Efficiency
+- **Background workers** for parallel processing
+- **Smart frame extraction** using computer vision
+- **Token optimization** for AI model efficiency
+- **Caching strategies** for frequently accessed data
+### Scalability
+- **Horizontal scaling** support with load balancing
+- **Resource limits** and rate limiting
+- **Efficient database queries** with proper indexing
+- **Auto-cleanup** of old data and temporary files
+## 🛠️ Development
+### Local Development Setup
+```bash
+# Install development dependencies
+pip install -r requirements.txt
+# Set development mode
+export DEV_MODE=True
+# Run with auto-reload
+python app.py --reload
+```
+### Testing
+```bash
+# Run basic tests
+python -m pytest tests/
+# Test Azure connections
+python -c "from app_core import transcription_manager; print('✅ Backend connected')"
+python -c "from ai_summary import ai_summary_manager; print('✅ AI service connected')"
+```
+### Adding New Features
+1. **Backend Logic**: Add to `app_core.py` or create new modules
+2. **AI Features**: Extend `ai_summary.py` with new capabilities
+3. **File Processing**: Add new formats to `file_processors.py`
+4. **UI Components**: Update `app.py` with new Gradio components
+5. **Database**: Add migrations to database schema as needed
+## 📈 Monitoring & Troubleshooting
+### Logging
+- **Application logs**: Check `logs/ai_conference_service.log`
+- **Error tracking**: Monitor console output for errors
+- **Performance metrics**: Track processing times and success rates
+### Common Issues
+#### Connection Issues
+```bash
+# Test Azure Speech
+curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \
+     "https://YOUR_REGION.api.cognitive.microsoft.com/sts/v1.0/issuetoken"
+# Test Azure OpenAI
+curl -H "api-key: YOUR_KEY" \
+     "https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_MODEL/chat/completions?api-version=2024-08-01-preview"
+```
+#### File Processing Issues
+- Ensure **FFmpeg** is installed and in PATH
+- Check file format support in `file_processors.py`
+- Verify file size limits (default: 500MB)
+#### Database Issues
+- Check database permissions for `database/` directory
+- Verify blob storage connection for database backups
+- Monitor disk space for database growth
+## 🚢 Production Deployment
+### Docker Deployment
+```dockerfile
+FROM python:3.9-slim
+WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    ffmpeg \
+    libsm6 \
+    libxext6 \
+    libxrender-dev \
+    libglib2.0-0 \
+    && rm -rf /var/lib/apt/lists/*
+COPY requirements.txt .
+RUN pip install -r requirements.txt
+COPY . .
+EXPOSE 7860
+CMD ["python", "app.py"]
+```
+### Azure Container Instance
+```bash
+# Build and push image
+docker build -t azure-ai-conference-service .
+docker tag azure-ai-conference-service your-registry.azurecr.io/azure-ai-conference-service
+docker push your-registry.azurecr.io/azure-ai-conference-service
+# Deploy to Azure Container Instances
+az container create \
+  --resource-group your-rg \
+  --name azure-ai-conference-service \
+  --image your-registry.azurecr.io/azure-ai-conference-service \
+  --ports 7860 \
+  --environment-variables \
+    AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \
+    AZURE_OPENAI_KEY=$AZURE_OPENAI_KEY \
+    # ... other environment variables
+```
+### Production Checklist
+- [ ] **Security**: Change default passwords and salts
+- [ ] **SSL/TLS**: Configure HTTPS certificates
+- [ ] **Monitoring**: Set up Azure Application Insights
+- [ ] **Backup**: Configure database and blob backup strategies
+- [ ] **Scaling**: Configure auto-scaling policies
+- [ ] **Compliance**: Review and configure GDPR settings
+## 📚 API Reference
+### Core Classes
+#### `TranscriptionManager`
+- `submit_transcription(file_bytes, filename, user_id, language, settings)`
+- `get_job_status(job_id)`
+- `get_user_history(user_id, limit)`
+#### `AISummaryManager`
+- `submit_summary_job(user_id, summary_type, user_prompt, files, settings)`
+- `get_summary_status(job_id)`
+- `get_user_summary_history(user_id, limit)`
+#### `FileProcessor`
+- `process_file(file_path, extension)`
+- `batch_process_files(file_paths)`
+- `get_file_info(file_path)`
+## 🤝 Contributing
+We welcome contributions! Please see our contributing guidelines:
+1. **Fork** the repository
+2. **Create** a feature branch
+3. **Make** your changes with tests
+4. **Submit** a pull request
+### Development Standards
+- **Code style**: Follow PEP 8 for Python code
+- **Documentation**: Update README and docstrings
+- **Testing**: Add tests for new features
+- **Security**: Follow security best practices
+## 📄 License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+## 🆘 Support
+### Getting Help
+- **Documentation**: Check this README and inline comments
+- **Issues**: Create GitHub issues for bugs or feature requests
+- **Azure Support**: Use Azure support for service-specific issues
+### Contact Information
+- **Project maintainer**: [Your contact information]
+- **Technical support**: [Support email]
+- **Azure resources**: [Azure documentation links]
+---
+## 🎉 Acknowledgments
+- **Azure AI Services** for powerful AI capabilities
+- **Gradio** for the excellent web interface framework
+- **OpenCV** for computer vision functionality
+- **Contributors** and the open-source community
+---
+**🚀 Ready to transform your conference analysis with AI? Get started today!**