Spaces:

Betimes-Solution
/

Azure_Powered_AI_Summary

Sleeping

App Files Files Community

Azure_Powered_AI_Summary / README.md

Chirapath

Update README.md

f63f042 verified 4 months ago

preview code

raw

history blame contribute delete

14.1 kB

	---
	title: Azure_Powered_AI_Summary
	emoji: 🔥
	colorFrom: blue
	colorTo: red
	sdk: gradio
	sdk_version: 5.44.1
	app_file: app.py
	pinned: false
	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

	# 🎙️🤖 Azure-Powered AI Conference Service

	> Advanced AI-powered conference analysis with transcription, computer vision, and intelligent summarization using Azure AI Foundry

	A comprehensive solution that combines Azure Speech Services for transcription with Azure OpenAI for intelligent summarization, featuring computer vision analysis, multi-format document processing, and enterprise-grade security.

	## 🌟 Key Features

	### 🎙️ Advanced Transcription Services
	- High-accuracy speech-to-text using Azure Speech Services
	- Speaker diarization with precise timestamp tracking (HH:MM:SS format)
	- Multi-language support for 60+ languages and dialects
	- Real-time processing with auto-refresh status updates
	- Enhanced audio processing with FFmpeg integration

	### 🤖 AI-Powered Summarization
	- Intelligent conference analysis using Azure OpenAI (GPT-4o models)
	- Multi-modal content processing (transcripts, documents, images, videos)
	- Smart frame extraction from presentation videos
	- Executive summaries with action items and key insights
	- Multi-language output support

	### 👁️ Computer Vision Integration
	- Automatic frame extraction from videos using content-aware algorithms
	- OCR text extraction from images and video frames
	- Slide change detection for presentation content
	- Meeting scene analysis for conference recordings

	### 📄 Enhanced Document Processing
	- Comprehensive format support: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, TXT, JSON, RTF, ODT, ODS, ODP
	- Intelligent content extraction with table and image handling
	- Batch processing capabilities for multiple files
	- Error handling and encoding detection

	### 🔐 Enterprise Security & GDPR Compliance
	- User authentication with secure password hashing
	- User-isolated storage in Azure Blob containers
	- Complete data export functionality for GDPR compliance
	- Account deletion with full data removal
	- Audit logging and comprehensive privacy controls

	### 🎯 User Experience
	- Modern web interface built with Gradio
	- Real-time status updates with auto-refresh functionality
	- Comprehensive history tracking for all services
	- Direct download links for completed work
	- Mobile-responsive design

	## 🏗️ Architecture Overview

	```mermaid
	graph TB
	subgraph "Frontend"
	A[Gradio Web Interface]
	end

	subgraph "Core Services"
	B[Transcription Manager]
	C[AI Summary Manager]
	D[File Processor]
	E[Video Frame Extractor]
	end

	subgraph "Azure Services"
	F[Azure Speech Services]
	G[Azure OpenAI]
	H[Azure Computer Vision]
	I[Azure Blob Storage]
	end

	subgraph "Data Layer"
	J[SQLite Database]
	K[User-Isolated Containers]
	end

	A --> B
	A --> C
	B --> F
	B --> I
	C --> G
	C --> H
	C --> D
	C --> E
	B --> J
	C --> J
	I --> K
	```

	## 🚀 Quick Start

	### Prerequisites

	- Python 3.8+ installed
	- FFmpeg installed for audio/video processing
	- Azure subscription with the following services:
	- Azure Speech Services
	- Azure OpenAI Service
	- Azure Blob Storage
	- Azure Computer Vision (optional but recommended)

	### 1. Clone and Setup

	```bash
	# Clone the repository
	git clone <repository-url>
	cd azure-ai-conference-service

	# Create virtual environment
	python -m venv venv
	source venv/bin/activate # On Windows: venv\Scripts\activate

	# Install dependencies
	pip install -r requirements.txt
	```

	### 2. Configure Environment

	```bash
	# Copy environment template
	cp env_template.sh .env

	# Edit .env file with your Azure credentials
	nano .env
	```

	Required Configuration:
	- `AZURE_SPEECH_KEY` and `AZURE_SPEECH_KEY_ENDPOINT`
	- `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, and `AZURE_OPENAI_DEPLOYMENT`
	- `AZURE_BLOB_CONNECTION`, `AZURE_CONTAINER`, and `AZURE_BLOB_SAS_TOKEN`
	- `COMPUTER_VISION_ENDPOINT` and `COMPUTER_VISION_KEY` (optional)

	### 3. Run the Application

	```bash
	# Start the service
	python app.py
	```

	The service will be available at `http://localhost:7860`

	## 📁 Project Structure

	```
	azure-ai-conference-service/
	├── app.py # Main Gradio application
	├── app_core.py # Core backend logic and database
	├── ai_summary.py # AI summarization manager
	├── file_processors.py # Document processing utilities
	├── image_extraction.py # Video frame extraction
	├── requirements.txt # Python dependencies
	├── env_template.sh # Environment configuration template
	├── .env # Your configuration (create from template)
	├── database/ # SQLite database files
	├── uploads/ # Temporary upload processing
	├── temp/ # Temporary files and downloads
	└── logs/ # Application logs
	```

	## 🔧 Configuration Guide

	### Azure Services Setup

	#### 1. Azure Speech Services
	```bash
	# Create Speech resource
	az cognitiveservices account create \
	--name "your-speech-service" \
	--resource-group "your-rg" \
	--kind "SpeechServices" \
	--sku "S0" \
	--location "your-region"
	```

	#### 2. Azure OpenAI Service
	```bash
	# Create OpenAI resource
	az cognitiveservices account create \
	--name "your-openai-service" \
	--resource-group "your-rg" \
	--kind "OpenAI" \
	--sku "S0" \
	--location "your-region"

	# Deploy model
	az cognitiveservices account deployment create \
	--name "your-openai-service" \
	--resource-group "your-rg" \
	--deployment-name "gpt-4o-mini" \
	--model-name "gpt-4o-mini" \
	--model-version "2024-07-18"
	```

	#### 3. Azure Blob Storage
	```bash
	# Create storage account
	az storage account create \
	--name "yourstorageaccount" \
	--resource-group "your-rg" \
	--location "your-region" \
	--sku "Standard_LRS"

	# Create containers
	az storage container create --name "transcripts" --account-name "yourstorageaccount"
	az storage container create --name "transcripts-summaries" --account-name "yourstorageaccount"
	az storage container create --name "transcripts-chats" --account-name "yourstorageaccount"
	```

	### Environment Variables Reference

	\| Variable \| Description \| Required \|
	\|----------\|-------------\|----------\|
	\| `AZURE_SPEECH_KEY` \| Azure Speech Services API key \| ✅ \|
	\| `AZURE_SPEECH_KEY_ENDPOINT` \| Speech Services endpoint URL \| ✅ \|
	\| `AZURE_OPENAI_ENDPOINT` \| Azure OpenAI endpoint URL \| ✅ \|
	\| `AZURE_OPENAI_KEY` \| Azure OpenAI API key \| ✅ \|
	\| `AZURE_OPENAI_DEPLOYMENT` \| Model deployment name \| ✅ \|
	\| `AZURE_BLOB_CONNECTION` \| Blob storage connection string \| ✅ \|
	\| `AZURE_CONTAINER` \| Main blob container name \| ✅ \|
	\| `AZURE_BLOB_SAS_TOKEN` \| SAS token for blob access \| ✅ \|
	\| `COMPUTER_VISION_ENDPOINT` \| Computer Vision endpoint \| ⚠️ \|
	\| `COMPUTER_VISION_KEY` \| Computer Vision API key \| ⚠️ \|

	Legend: ✅ Required \| ⚠️ Recommended

	## 🎯 Usage Examples

	### Basic Transcription
	1. Register/Login to the service
	2. Upload an audio or video file
	3. Configure language and speaker settings
	4. Start transcription and wait for auto-refresh
	5. Download the completed transcript

	### AI-Powered Summary
	1. Choose content sources: existing transcripts or new files
	2. Provide AI instructions: specify format and focus areas
	3. Configure output: language and format preferences
	4. Generate summary with multi-modal analysis
	5. Download comprehensive AI analysis

	### Batch Processing
	- Upload multiple files simultaneously
	- Process presentations, documents, and videos together
	- Generate unified summaries across all content types

	## 🔐 Security Features

	### Authentication & Authorization
	- Secure user registration with password strength validation
	- Session management with proper logout functionality
	- User isolation - users can only access their own data

	### Data Protection
	- User-separated blob storage containers
	- Encrypted data transmission over HTTPS
	- Audit logging for all user actions
	- Automatic cleanup of temporary files

	### GDPR Compliance
	- Complete data export in JSON format
	- Right to be forgotten with full account deletion
	- Granular consent management for different data uses
	- Data retention policies with automatic cleanup

	## 📊 Performance Optimization

	### Processing Efficiency
	- Background workers for parallel processing
	- Smart frame extraction using computer vision
	- Token optimization for AI model efficiency
	- Caching strategies for frequently accessed data

	### Scalability
	- Horizontal scaling support with load balancing
	- Resource limits and rate limiting
	- Efficient database queries with proper indexing
	- Auto-cleanup of old data and temporary files

	## 🛠️ Development

	### Local Development Setup

	```bash
	# Install development dependencies
	pip install -r requirements.txt

	# Set development mode
	export DEV_MODE=True

	# Run with auto-reload
	python app.py --reload
	```

	### Testing

	```bash
	# Run basic tests
	python -m pytest tests/

	# Test Azure connections
	python -c "from app_core import transcription_manager; print('✅ Backend connected')"
	python -c "from ai_summary import ai_summary_manager; print('✅ AI service connected')"
	```

	### Adding New Features

	1. Backend Logic: Add to `app_core.py` or create new modules
	2. AI Features: Extend `ai_summary.py` with new capabilities
	3. File Processing: Add new formats to `file_processors.py`
	4. UI Components: Update `app.py` with new Gradio components
	5. Database: Add migrations to database schema as needed

	## 📈 Monitoring & Troubleshooting

	### Logging
	- Application logs: Check `logs/ai_conference_service.log`
	- Error tracking: Monitor console output for errors
	- Performance metrics: Track processing times and success rates

	### Common Issues

	#### Connection Issues
	```bash
	# Test Azure Speech
	curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \
	"https://YOUR_REGION.api.cognitive.microsoft.com/sts/v1.0/issuetoken"

	# Test Azure OpenAI
	curl -H "api-key: YOUR_KEY" \
	"https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_MODEL/chat/completions?api-version=2024-08-01-preview"
	```

	#### File Processing Issues
	- Ensure FFmpeg is installed and in PATH
	- Check file format support in `file_processors.py`
	- Verify file size limits (default: 500MB)

	#### Database Issues
	- Check database permissions for `database/` directory
	- Verify blob storage connection for database backups
	- Monitor disk space for database growth

	## 🚢 Production Deployment

	### Docker Deployment

	```dockerfile
	FROM python:3.9-slim

	WORKDIR /app

	# Install system dependencies
	RUN apt-get update && apt-get install -y \
	ffmpeg \
	libsm6 \
	libxext6 \
	libxrender-dev \
	libglib2.0-0 \
	&& rm -rf /var/lib/apt/lists/*

	COPY requirements.txt .
	RUN pip install -r requirements.txt

	COPY . .

	EXPOSE 7860

	CMD ["python", "app.py"]
	```

	### Azure Container Instance

	```bash
	# Build and push image
	docker build -t azure-ai-conference-service .
	docker tag azure-ai-conference-service your-registry.azurecr.io/azure-ai-conference-service
	docker push your-registry.azurecr.io/azure-ai-conference-service

	# Deploy to Azure Container Instances
	az container create \
	--resource-group your-rg \
	--name azure-ai-conference-service \
	--image your-registry.azurecr.io/azure-ai-conference-service \
	--ports 7860 \
	--environment-variables \
	AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \
	AZURE_OPENAI_KEY=$AZURE_OPENAI_KEY \
	# ... other environment variables
	```

	### Production Checklist

	- [ ] Security: Change default passwords and salts
	- [ ] SSL/TLS: Configure HTTPS certificates
	- [ ] Monitoring: Set up Azure Application Insights
	- [ ] Backup: Configure database and blob backup strategies
	- [ ] Scaling: Configure auto-scaling policies
	- [ ] Compliance: Review and configure GDPR settings

	## 📚 API Reference

	### Core Classes

	#### `TranscriptionManager`
	- `submit_transcription(file_bytes, filename, user_id, language, settings)`
	- `get_job_status(job_id)`
	- `get_user_history(user_id, limit)`

	#### `AISummaryManager`
	- `submit_summary_job(user_id, summary_type, user_prompt, files, settings)`
	- `get_summary_status(job_id)`
	- `get_user_summary_history(user_id, limit)`

	#### `FileProcessor`
	- `process_file(file_path, extension)`
	- `batch_process_files(file_paths)`
	- `get_file_info(file_path)`

	## 🤝 Contributing

	We welcome contributions! Please see our contributing guidelines:

	1. Fork the repository
	2. Create a feature branch
	3. Make your changes with tests
	4. Submit a pull request

	### Development Standards
	- Code style: Follow PEP 8 for Python code
	- Documentation: Update README and docstrings
	- Testing: Add tests for new features
	- Security: Follow security best practices

	## 📄 License

	This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

	## 🆘 Support

	### Getting Help
	- Documentation: Check this README and inline comments
	- Issues: Create GitHub issues for bugs or feature requests
	- Azure Support: Use Azure support for service-specific issues

	### Contact Information
	- Project maintainer: [Your contact information]
	- Technical support: [Support email]
	- Azure resources: [Azure documentation links]

	---

	## 🎉 Acknowledgments

	- Azure AI Services for powerful AI capabilities
	- Gradio for the excellent web interface framework
	- OpenCV for computer vision functionality
	- Contributors and the open-source community

	---

	🚀 Ready to transform your conference analysis with AI? Get started today!