---
title: Studify WhisperX Alignment
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "5.9.1"
app_file: app.py
app_port: 7860
pinned: false
license: mit
---

# 🎯 Studify WhisperX Word-Level Alignment

Forced-alignment service that produces word-level timestamps for TTS word highlighting in the Studify app.

## 🚀 Features

- **WhisperX Large-v2** model for accurate transcription
- **Forced alignment** for precise word-level timestamps
- **GPU accelerated** (T4/A10G) for fast inference
- **REST API** with FastAPI
- **Multi-language support**

## 📡 API Endpoints

### Health Check

```bash
GET /
```

Returns service status and device info.

### Align Audio

```bash
POST /align
Content-Type: multipart/form-data

Parameters:
- audio_file: MP3/WAV file
- text: Original text (optional, improves accuracy)
- language: Language code (default: "en")

Response:
{
  "word_segments": [
    { "word": "Hello", "start": 0.11, "end": 0.48 },
    { "word": "world", "start": 0.48, "end": 0.90 }
  ],
  "duration": 1.2,
  "word_count": 2,
  "language": "en"
}
```

## 🔧 Usage Example

```python
import requests

# Send the audio file and language code as multipart/form-data.
with open('audio.mp3', 'rb') as audio:
    response = requests.post(
        'https://hexa09-bugget-whisperx-model.hf.space/align',
        files={'audio_file': audio},
        data={'language': 'en'},
    )
response.raise_for_status()  # fail early on HTTP errors

timings = response.json()
print(timings['word_segments'])
```

## 🎨 Built for Studify

This service powers word-by-word highlighting synchronized with TTS audio in the Studify educational app.

## 📚 Technical Details

- **Model**: WhisperX large-v2
- **Framework**: FastAPI + Uvicorn
- **Device**: CUDA GPU (when available) or CPU fallback
- **Precision**: FP16 (GPU) or INT8 (CPU)

## 🔗 Integration

See the [Studify TTS Alignment Architecture](https://github.com/Hexa-Innovate/Studify) for the full integration guide.

## 📄 License

MIT License - Free for commercial and personal use
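
## 🧪 Highlighting Sketch

As an illustration of how a client might use the `word_segments` array returned by `/align`, the sketch below looks up which word is being spoken at a given playback time. The helper name `active_word_index` is hypothetical and not part of this service. It assumes the segments are sorted by `start` and that times are in seconds, as in the sample response above.

```python
from bisect import bisect_right


def active_word_index(word_segments, t):
    """Return the index of the word spoken at time t (seconds), or None.

    Assumes word_segments is sorted by "start" (an assumption, not a
    guarantee documented by the service).
    """
    starts = [seg["start"] for seg in word_segments]
    # Find the last segment whose start is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < word_segments[i]["end"]:
        return i
    return None  # t falls before the first word, in a gap, or after the last word


# Sample data copied from the /align response example.
segments = [
    {"word": "Hello", "start": 0.11, "end": 0.48},
    {"word": "world", "start": 0.48, "end": 0.90},
]

idx = active_word_index(segments, 0.30)
print(segments[idx]["word"] if idx is not None else None)
```

A player would typically call such a helper on each playback time update and highlight the returned word. Rebuilding `starts` on every call is done here for brevity and could be cached for long transcripts.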