Chirapath commited on
Commit
d2b2e25
Β·
verified Β·
1 Parent(s): 339ef9e

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +463 -1
README.md CHANGED
@@ -9,4 +9,466 @@ app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9
  pinned: false
10
  ---
11
 
12
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
13
+
14
+ # πŸŽ™οΈπŸ€– Azure-Powered AI Conference Service
15
+
16
+ > **Advanced AI-powered conference analysis with transcription, computer vision, and intelligent summarization using Azure AI Foundry**
17
+
18
+ A comprehensive solution that combines Azure Speech Services for transcription with Azure OpenAI for intelligent summarization, featuring computer vision analysis, multi-format document processing, and enterprise-grade security.
19
+
20
+ ## 🌟 Key Features
21
+
22
+ ### πŸŽ™οΈ **Advanced Transcription Services**
23
+ - **High-accuracy speech-to-text** using Azure Speech Services
24
+ - **Speaker diarization** with precise timestamp tracking (HH:MM:SS format)
25
+ - **Multi-language support** for 60+ languages and dialects
26
+ - **Real-time processing** with auto-refresh status updates
27
+ - **Enhanced audio processing** with FFmpeg integration
28
+
29
+ ### πŸ€– **AI-Powered Summarization**
30
+ - **Intelligent conference analysis** using Azure OpenAI (GPT-4o models)
31
+ - **Multi-modal content processing** (transcripts, documents, images, videos)
32
+ - **Smart frame extraction** from presentation videos
33
+ - **Executive summaries** with action items and key insights
34
+ - **Multi-language output** support
35
+
36
+ ### πŸ‘οΈ **Computer Vision Integration**
37
+ - **Automatic frame extraction** from videos using content-aware algorithms
38
+ - **OCR text extraction** from images and video frames
39
+ - **Slide change detection** for presentation content
40
+ - **Meeting scene analysis** for conference recordings
41
+
42
+ ### πŸ“„ **Enhanced Document Processing**
43
+ - **Comprehensive format support**: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, TXT, JSON, RTF, ODT, ODS, ODP
44
+ - **Intelligent content extraction** with table and image handling
45
+ - **Batch processing** capabilities for multiple files
46
+ - **Error handling** and encoding detection
47
+
48
+ ### πŸ” **Enterprise Security & GDPR Compliance**
49
+ - **User authentication** with secure password hashing
50
+ - **User-isolated storage** in Azure Blob containers
51
+ - **Complete data export** functionality for GDPR compliance
52
+ - **Account deletion** with full data removal
53
+ - **Audit logging** and comprehensive privacy controls
54
+
55
+ ### 🎯 **User Experience**
56
+ - **Modern web interface** built with Gradio
57
+ - **Real-time status updates** with auto-refresh functionality
58
+ - **Comprehensive history** tracking for all services
59
+ - **Direct download** links for completed work
60
+ - **Mobile-responsive** design
61
+
62
+ ## πŸ—οΈ Architecture Overview
63
+
64
+ ```mermaid
65
+ graph TB
66
+ subgraph "Frontend"
67
+ A[Gradio Web Interface]
68
+ end
69
+
70
+ subgraph "Core Services"
71
+ B[Transcription Manager]
72
+ C[AI Summary Manager]
73
+ D[File Processor]
74
+ E[Video Frame Extractor]
75
+ end
76
+
77
+ subgraph "Azure Services"
78
+ F[Azure Speech Services]
79
+ G[Azure OpenAI]
80
+ H[Azure Computer Vision]
81
+ I[Azure Blob Storage]
82
+ end
83
+
84
+ subgraph "Data Layer"
85
+ J[SQLite Database]
86
+ K[User-Isolated Containers]
87
+ end
88
+
89
+ A --> B
90
+ A --> C
91
+ B --> F
92
+ B --> I
93
+ C --> G
94
+ C --> H
95
+ C --> D
96
+ C --> E
97
+ B --> J
98
+ C --> J
99
+ I --> K
100
+ ```
101
+
102
+ ## πŸš€ Quick Start
103
+
104
+ ### Prerequisites
105
+
106
+ - **Python 3.8+** installed
107
+ - **FFmpeg** installed for audio/video processing
108
+ - **Azure subscription** with the following services:
109
+ - Azure Speech Services
110
+ - Azure OpenAI Service
111
+ - Azure Blob Storage
112
+ - Azure Computer Vision (optional but recommended)
113
+
114
+ ### 1. Clone and Setup
115
+
116
+ ```bash
117
+ # Clone the repository
118
+ git clone <repository-url>
119
+ cd azure-ai-conference-service
120
+
121
+ # Create virtual environment
122
+ python -m venv venv
123
+ source venv/bin/activate # On Windows: venv\Scripts\activate
124
+
125
+ # Install dependencies
126
+ pip install -r requirements.txt
127
+ ```
128
+
129
+ ### 2. Configure Environment
130
+
131
+ ```bash
132
+ # Copy environment template
133
+ cp env_template.sh .env
134
+
135
+ # Edit .env file with your Azure credentials
136
+ nano .env
137
+ ```
138
+
139
+ **Required Configuration:**
140
+ - `AZURE_SPEECH_KEY` and `AZURE_SPEECH_KEY_ENDPOINT`
141
+ - `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, and `AZURE_OPENAI_DEPLOYMENT`
142
+ - `AZURE_BLOB_CONNECTION`, `AZURE_CONTAINER`, and `AZURE_BLOB_SAS_TOKEN`
143
+ - `COMPUTER_VISION_ENDPOINT` and `COMPUTER_VISION_KEY` (optional)
144
+
145
+ ### 3. Run the Application
146
+
147
+ ```bash
148
+ # Start the service
149
+ python app.py
150
+ ```
151
+
152
+ The service will be available at `http://localhost:7860`
153
+
154
+ ## πŸ“ Project Structure
155
+
156
+ ```
157
+ azure-ai-conference-service/
158
+ β”œβ”€β”€ app.py # Main Gradio application
159
+ β”œβ”€β”€ app_core.py # Core backend logic and database
160
+ β”œβ”€β”€ ai_summary.py # AI summarization manager
161
+ β”œβ”€β”€ file_processors.py # Document processing utilities
162
+ β”œβ”€β”€ image_extraction.py # Video frame extraction
163
+ β”œβ”€β”€ requirements.txt # Python dependencies
164
+ β”œβ”€β”€ env_template.sh # Environment configuration template
165
+ β”œβ”€β”€ .env # Your configuration (create from template)
166
+ β”œβ”€β”€ database/ # SQLite database files
167
+ β”œβ”€β”€ uploads/ # Temporary upload processing
168
+ β”œβ”€β”€ temp/ # Temporary files and downloads
169
+ └── logs/ # Application logs
170
+ ```
171
+
172
+ ## πŸ”§ Configuration Guide
173
+
174
+ ### Azure Services Setup
175
+
176
+ #### 1. Azure Speech Services
177
+ ```bash
178
+ # Create Speech resource
179
+ az cognitiveservices account create \
180
+ --name "your-speech-service" \
181
+ --resource-group "your-rg" \
182
+ --kind "SpeechServices" \
183
+ --sku "S0" \
184
+ --location "your-region"
185
+ ```
186
+
187
+ #### 2. Azure OpenAI Service
188
+ ```bash
189
+ # Create OpenAI resource
190
+ az cognitiveservices account create \
191
+ --name "your-openai-service" \
192
+ --resource-group "your-rg" \
193
+ --kind "OpenAI" \
194
+ --sku "S0" \
195
+ --location "your-region"
196
+
197
+ # Deploy model
198
+ az cognitiveservices account deployment create \
199
+ --name "your-openai-service" \
200
+ --resource-group "your-rg" \
201
+ --deployment-name "gpt-4o-mini" \
202
+ --model-name "gpt-4o-mini" \
203
+ --model-version "2024-07-18"
204
+ ```
205
+
206
+ #### 3. Azure Blob Storage
207
+ ```bash
208
+ # Create storage account
209
+ az storage account create \
210
+ --name "yourstorageaccount" \
211
+ --resource-group "your-rg" \
212
+ --location "your-region" \
213
+ --sku "Standard_LRS"
214
+
215
+ # Create containers
216
+ az storage container create --name "transcripts" --account-name "yourstorageaccount"
217
+ az storage container create --name "transcripts-summaries" --account-name "yourstorageaccount"
218
+ az storage container create --name "transcripts-chats" --account-name "yourstorageaccount"
219
+ ```
220
+
221
+ ### Environment Variables Reference
222
+
223
+ | Variable | Description | Required |
224
+ |----------|-------------|----------|
225
+ | `AZURE_SPEECH_KEY` | Azure Speech Services API key | βœ… |
226
+ | `AZURE_SPEECH_KEY_ENDPOINT` | Speech Services endpoint URL | βœ… |
227
+ | `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL | βœ… |
228
+ | `AZURE_OPENAI_KEY` | Azure OpenAI API key | βœ… |
229
+ | `AZURE_OPENAI_DEPLOYMENT` | Model deployment name | βœ… |
230
+ | `AZURE_BLOB_CONNECTION` | Blob storage connection string | βœ… |
231
+ | `AZURE_CONTAINER` | Main blob container name | βœ… |
232
+ | `AZURE_BLOB_SAS_TOKEN` | SAS token for blob access | βœ… |
233
+ | `COMPUTER_VISION_ENDPOINT` | Computer Vision endpoint | ⚠️ |
234
+ | `COMPUTER_VISION_KEY` | Computer Vision API key | ⚠️ |
235
+
236
+ **Legend:** βœ… Required | ⚠️ Recommended
237
+
238
+ ## 🎯 Usage Examples
239
+
240
+ ### Basic Transcription
241
+ 1. **Register/Login** to the service
242
+ 2. **Upload** an audio or video file
243
+ 3. **Configure** language and speaker settings
244
+ 4. **Start transcription** and wait for auto-refresh
245
+ 5. **Download** the completed transcript
246
+
247
+ ### AI-Powered Summary
248
+ 1. **Choose content sources**: existing transcripts or new files
249
+ 2. **Provide AI instructions**: specify format and focus areas
250
+ 3. **Configure output**: language and format preferences
251
+ 4. **Generate summary** with multi-modal analysis
252
+ 5. **Download** comprehensive AI analysis
253
+
254
+ ### Batch Processing
255
+ - Upload multiple files simultaneously
256
+ - Process presentations, documents, and videos together
257
+ - Generate unified summaries across all content types
258
+
259
+ ## πŸ” Security Features
260
+
261
+ ### Authentication & Authorization
262
+ - **Secure user registration** with password strength validation
263
+ - **Session management** with proper logout functionality
264
+ - **User isolation** - users can only access their own data
265
+
266
+ ### Data Protection
267
+ - **User-separated blob storage** containers
268
+ - **Encrypted data transmission** over HTTPS
269
+ - **Audit logging** for all user actions
270
+ - **Automatic cleanup** of temporary files
271
+
272
+ ### GDPR Compliance
273
+ - **Complete data export** in JSON format
274
+ - **Right to be forgotten** with full account deletion
275
+ - **Granular consent management** for different data uses
276
+ - **Data retention policies** with automatic cleanup
277
+
278
+ ## πŸ“Š Performance Optimization
279
+
280
+ ### Processing Efficiency
281
+ - **Background workers** for parallel processing
282
+ - **Smart frame extraction** using computer vision
283
+ - **Token optimization** for AI model efficiency
284
+ - **Caching strategies** for frequently accessed data
285
+
286
+ ### Scalability
287
+ - **Horizontal scaling** support with load balancing
288
+ - **Resource limits** and rate limiting
289
+ - **Efficient database queries** with proper indexing
290
+ - **Auto-cleanup** of old data and temporary files
291
+
292
+ ## πŸ› οΈ Development
293
+
294
+ ### Local Development Setup
295
+
296
+ ```bash
297
+ # Install development dependencies
298
+ pip install -r requirements.txt
299
+
300
+ # Set development mode
301
+ export DEV_MODE=True
302
+
303
+ # Run with auto-reload
304
+ python app.py --reload
305
+ ```
306
+
307
+ ### Testing
308
+
309
+ ```bash
310
+ # Run basic tests
311
+ python -m pytest tests/
312
+
313
+ # Test Azure connections
314
+ python -c "from app_core import transcription_manager; print('βœ… Backend connected')"
315
+ python -c "from ai_summary import ai_summary_manager; print('βœ… AI service connected')"
316
+ ```
317
+
318
+ ### Adding New Features
319
+
320
+ 1. **Backend Logic**: Add to `app_core.py` or create new modules
321
+ 2. **AI Features**: Extend `ai_summary.py` with new capabilities
322
+ 3. **File Processing**: Add new formats to `file_processors.py`
323
+ 4. **UI Components**: Update `app.py` with new Gradio components
324
+ 5. **Database**: Add migrations to database schema as needed
325
+
326
+ ## πŸ“ˆ Monitoring & Troubleshooting
327
+
328
+ ### Logging
329
+ - **Application logs**: Check `logs/ai_conference_service.log`
330
+ - **Error tracking**: Monitor console output for errors
331
+ - **Performance metrics**: Track processing times and success rates
332
+
333
+ ### Common Issues
334
+
335
+ #### Connection Issues
336
+ ```bash
337
+ # Test Azure Speech
338
+ curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" \
339
+ "https://YOUR_REGION.api.cognitive.microsoft.com/sts/v1.0/issuetoken"
340
+
341
+ # Test Azure OpenAI
342
+ curl -H "api-key: YOUR_KEY" \
343
+ "https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_MODEL/chat/completions?api-version=2024-08-01-preview"
344
+ ```
345
+
346
+ #### File Processing Issues
347
+ - Ensure **FFmpeg** is installed and in PATH
348
+ - Check file format support in `file_processors.py`
349
+ - Verify file size limits (default: 500MB)
350
+
351
+ #### Database Issues
352
+ - Check database permissions for `database/` directory
353
+ - Verify blob storage connection for database backups
354
+ - Monitor disk space for database growth
355
+
356
+ ## 🚒 Production Deployment
357
+
358
+ ### Docker Deployment
359
+
360
+ ```dockerfile
361
+ FROM python:3.9-slim
362
+
363
+ WORKDIR /app
364
+
365
+ # Install system dependencies
366
+ RUN apt-get update && apt-get install -y \
367
+ ffmpeg \
368
+ libsm6 \
369
+ libxext6 \
370
+ libxrender-dev \
371
+ libglib2.0-0 \
372
+ && rm -rf /var/lib/apt/lists/*
373
+
374
+ COPY requirements.txt .
375
+ RUN pip install -r requirements.txt
376
+
377
+ COPY . .
378
+
379
+ EXPOSE 7860
380
+
381
+ CMD ["python", "app.py"]
382
+ ```
383
+
384
+ ### Azure Container Instance
385
+
386
+ ```bash
387
+ # Build and push image
388
+ docker build -t azure-ai-conference-service .
389
+ docker tag azure-ai-conference-service your-registry.azurecr.io/azure-ai-conference-service
390
+ docker push your-registry.azurecr.io/azure-ai-conference-service
391
+
392
+ # Deploy to Azure Container Instances
393
+ az container create \
394
+ --resource-group your-rg \
395
+ --name azure-ai-conference-service \
396
+ --image your-registry.azurecr.io/azure-ai-conference-service \
397
+ --ports 7860 \
398
+ --environment-variables \
399
+ AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \
400
+ AZURE_OPENAI_KEY=$AZURE_OPENAI_KEY \
401
+ # ... other environment variables
402
+ ```
403
+
404
+ ### Production Checklist
405
+
406
+ - [ ] **Security**: Change default passwords and salts
407
+ - [ ] **SSL/TLS**: Configure HTTPS certificates
408
+ - [ ] **Monitoring**: Set up Azure Application Insights
409
+ - [ ] **Backup**: Configure database and blob backup strategies
410
+ - [ ] **Scaling**: Configure auto-scaling policies
411
+ - [ ] **Compliance**: Review and configure GDPR settings
412
+
413
+ ## πŸ“š API Reference
414
+
415
+ ### Core Classes
416
+
417
+ #### `TranscriptionManager`
418
+ - `submit_transcription(file_bytes, filename, user_id, language, settings)`
419
+ - `get_job_status(job_id)`
420
+ - `get_user_history(user_id, limit)`
421
+
422
+ #### `AISummaryManager`
423
+ - `submit_summary_job(user_id, summary_type, user_prompt, files, settings)`
424
+ - `get_summary_status(job_id)`
425
+ - `get_user_summary_history(user_id, limit)`
426
+
427
+ #### `FileProcessor`
428
+ - `process_file(file_path, extension)`
429
+ - `batch_process_files(file_paths)`
430
+ - `get_file_info(file_path)`
431
+
432
+ ## 🀝 Contributing
433
+
434
+ We welcome contributions! Please see our contributing guidelines:
435
+
436
+ 1. **Fork** the repository
437
+ 2. **Create** a feature branch
438
+ 3. **Make** your changes with tests
439
+ 4. **Submit** a pull request
440
+
441
+ ### Development Standards
442
+ - **Code style**: Follow PEP 8 for Python code
443
+ - **Documentation**: Update README and docstrings
444
+ - **Testing**: Add tests for new features
445
+ - **Security**: Follow security best practices
446
+
447
+ ## πŸ“„ License
448
+
449
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
450
+
451
+ ## πŸ†˜ Support
452
+
453
+ ### Getting Help
454
+ - **Documentation**: Check this README and inline comments
455
+ - **Issues**: Create GitHub issues for bugs or feature requests
456
+ - **Azure Support**: Use Azure support for service-specific issues
457
+
458
+ ### Contact Information
459
+ - **Project maintainer**: [Your contact information]
460
+ - **Technical support**: [Support email]
461
+ - **Azure resources**: [Azure documentation links]
462
+
463
+ ---
464
+
465
+ ## πŸŽ‰ Acknowledgments
466
+
467
+ - **Azure AI Services** for powerful AI capabilities
468
+ - **Gradio** for the excellent web interface framework
469
+ - **OpenCV** for computer vision functionality
470
+ - **Contributors** and the open-source community
471
+
472
+ ---
473
+
474
+ **πŸš€ Ready to transform your conference analysis with AI? Get started today!**