| title: Studify WhisperX Alignment | |
| emoji: 🎯 | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: gradio | |
| sdk_version: "5.9.1" | |
| app_file: app.py | |
| app_port: 7860 | |
| pinned: false | |
| license: mit | |
| # 🎯 Studify WhisperX Word-Level Alignment | |
| Forced-alignment service that produces word-level timestamps for highlighting text in sync with TTS audio in the Studify app. | |
| ## Features | |
| - **Whisper large-v2** model, run through WhisperX, for accurate transcription | |
| - **Forced alignment** for precise word-level timestamps | |
| - **GPU-accelerated** (T4/A10G) for fast inference | |
| - **REST API** with FastAPI | |
| - **Multi-language support** | |
| ## API Endpoints | |
| ### Health Check | |
| ```http | |
| GET / | |
| ``` | |
| Returns service status and device info. | |
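A Space may be asleep when it is first contacted, so a client might poll the health endpoint before uploading audio. The following is a minimal sketch under stated assumptions: `fetch_status` and `wait_until_ready` are hypothetical helper names, the base URL is the one from the usage example in this README, and the response body is only assumed to be JSON, since its fields are not documented here.

```python
import json
import time
import urllib.error
import urllib.request

# Base URL taken from the usage example in this README.
BASE_URL = "https://hexa09-bugget-whisperx-model.hf.space"


def fetch_status(base_url=BASE_URL, timeout=30):
    """GET / and return the decoded body (assumed here to be JSON)."""
    with urllib.request.urlopen(base_url + "/", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(fetch=fetch_status, attempts=5, delay=10.0, sleep=time.sleep):
    """Call `fetch` until it succeeds, pausing `delay` seconds between tries.

    Returns the first successful result; raises RuntimeError if every
    attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except (urllib.error.URLError, TimeoutError, ValueError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)  # give the service time to wake up
    raise RuntimeError(f"service not ready after {attempts} attempts") from last_error


# Real usage performs HTTP requests, so it is left commented out here:
# status = wait_until_ready()
```

The attempt count and delay are arbitrary defaults chosen for illustration; `fetch` and `sleep` are injectable so the retry logic can be exercised without network access.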
| ### Align Audio | |
| ```text | |
| POST /align | |
| Content-Type: multipart/form-data | |
| Parameters: | |
|   - audio_file: MP3/WAV file | |
|   - text: Original text (optional, improves accuracy) | |
|   - language: Language code (default: "en") | |
| Response: | |
| { | |
|   "word_segments": [ | |
|     { "word": "Hello", "start": 0.11, "end": 0.48 }, | |
|     { "word": "world", "start": 0.48, "end": 0.90 } | |
|   ], | |
|   "duration": 1.2, | |
|   "word_count": 2, | |
|   "language": "en" | |
| } | |
| ``` | |
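On the client side, the response above can be turned into typed records before use. This is a sketch, not the service's own code: `WordSegment` and `parse_alignment` are hypothetical names, the field names come from the sample response, and the timestamps are assumed to be seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordSegment:
    word: str
    start: float  # assumed to be seconds from the start of the audio
    end: float


def parse_alignment(payload):
    """Turn an /align response dict into a list of WordSegment records.

    Raises ValueError if a segment ends before it starts or if
    `word_count` disagrees with the number of segments.
    """
    segments = [
        WordSegment(item["word"], float(item["start"]), float(item["end"]))
        for item in payload["word_segments"]
    ]
    for seg in segments:
        if seg.end < seg.start:
            raise ValueError(f"segment for {seg.word!r} ends before it starts")
    if payload.get("word_count", len(segments)) != len(segments):
        raise ValueError("word_count does not match the number of segments")
    return segments


# The sample response from this README.
sample = {
    "word_segments": [
        {"word": "Hello", "start": 0.11, "end": 0.48},
        {"word": "world", "start": 0.48, "end": 0.90},
    ],
    "duration": 1.2,
    "word_count": 2,
    "language": "en",
}
segments = parse_alignment(sample)
print([s.word for s in segments])
```

A segment that lacks `start` or `end` would raise `KeyError` in this sketch; whether the service can return such entries is not stated here.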
| ## Usage Example | |
| ```python | |
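| import requests | |
| # Upload the audio file and request word-level timings. | |
| with open('audio.mp3', 'rb') as audio: | |
|     response = requests.post( | |
|         'https://hexa09-bugget-whisperx-model.hf.space/align', | |
|         files={'audio_file': audio}, | |
|         data={'language': 'en'}, | |
|         timeout=300,  # illustrative value; alignment may be slow on a cold start | |
|     ) | |
| response.raise_for_status()  # raise on HTTP error responses | |
| timings = response.json() | |
| print(timings['word_segments']) | |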
| ``` | |
| ## Built for Studify | |
| This service supplies the word-level timestamps that the Studify educational app uses to highlight each word in sync with TTS audio. | |
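To illustrate how word-level timestamps can drive highlighting, the sketch below looks up which word is being spoken at a given playback time using a binary search. It is an assumption about how a client could consume the data, not Studify's actual implementation; `active_word_index` is a hypothetical name, and the segments are assumed to be sorted by start time with timestamps in seconds.

```python
from bisect import bisect_right


def active_word_index(segments, t):
    """Return the index of the word being spoken at time `t`, or None.

    `segments` must be sorted by start time, as in the /align sample response.
    """
    starts = [seg["start"] for seg in segments]
    i = bisect_right(starts, t) - 1  # last word that starts at or before t
    if i >= 0 and t < segments[i]["end"]:
        return i
    return None  # before the first word, in a gap, or after the last word


segments = [
    {"word": "Hello", "start": 0.11, "end": 0.48},
    {"word": "world", "start": 0.48, "end": 0.90},
]
for t in (0.05, 0.30, 0.48, 1.00):
    i = active_word_index(segments, t)
    print(t, segments[i]["word"] if i is not None else None)
```

When one word ends exactly where the next begins (0.48 in the sample), the lookup returns the later word. A player calling this on every frame would likely precompute `starts` once instead of rebuilding it per call.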
| ## Technical Details | |
| - **Model**: Whisper large-v2 via WhisperX | |
| - **Framework**: FastAPI + Uvicorn | |
| - **Device**: CUDA GPU (when available) or CPU fallback | |
| - **Precision**: FP16 (GPU) or INT8 (CPU) | |
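The device and precision rules above can be expressed as a small selection function. This is a sketch under assumptions rather than the service's actual code: `pick_runtime` is a hypothetical helper, and the strings `"float16"` and `"int8"` are the compute-type names that WhisperX's `load_model` accepts for FP16 and INT8.

```python
def pick_runtime(cuda_available):
    """Map GPU availability to a (device, compute_type) pair."""
    if cuda_available:
        return "cuda", "float16"  # FP16 on GPU
    return "cpu", "int8"  # INT8 on CPU fallback


# In a real service the flag would typically come from torch.cuda.is_available(),
# and the pair would be passed on to the model loader, for example:
#   device, compute_type = pick_runtime(torch.cuda.is_available())
#   model = whisperx.load_model("large-v2", device, compute_type=compute_type)
print(pick_runtime(True))
print(pick_runtime(False))
```

Keeping the decision in a pure function makes it testable on machines without a GPU or without PyTorch installed.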
| ## Integration | |
| See the [Studify TTS Alignment Architecture](https://github.com/Hexa-Innovate/Studify) for the full integration guide. | |
| ## License | |
| MIT License - Free for commercial and personal use | |