metadata
title: Studify WhisperX Alignment
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
app_port: 7860
pinned: false
license: mit
Studify WhisperX Word-Level Alignment
Forced-alignment service that produces word-level timestamps for TTS audio, enabling word-level highlighting in the Studify app.
Features
- WhisperX with the Whisper large-v2 model for accurate transcription
- Forced alignment for precise word-level timestamps
- GPU-accelerated (T4/A10G) for fast inference
- REST API with FastAPI
- Multi-language support
API Endpoints
Health Check
GET /
Returns service status and device info.
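A minimal sketch of a liveness probe against this endpoint, using only the Python standard library. The helper name `is_healthy` is hypothetical. Because the response body is not documented here, the sketch assumes only that a healthy service answers with HTTP 200 and does not parse the payload.

```python
import urllib.request


def is_healthy(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if GET <base_url>/ answers with HTTP 200 (assumed to mean healthy)."""
    url = base_url.rstrip("/") + "/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (4xx/5xx), and timeouts;
        # ValueError covers malformed URLs. Both are treated as "not healthy".
        return False


# Example call (performs a real network request):
# is_healthy("https://hexa09-bugget-whisperx-model.hf.space")
```

A sleeping or restarting service would simply report `False` here, so a caller may want to retry before sending audio.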
Align Audio
POST /align
Content-Type: multipart/form-data
Parameters:
- audio_file: MP3/WAV file
- text: Original text (optional, improves accuracy)
- language: Language code (default: "en")
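The parameters above could be assembled as follows. This is a sketch, not the service's own client: the helper names `build_align_form` and `align_audio` are hypothetical, the 120-second timeout is an arbitrary assumption, and `requests` is a third-party dependency. The optional `text` field is sent only when provided.

```python
from typing import Optional


def build_align_form(language: str = "en", text: Optional[str] = None) -> dict:
    """Build the non-file form fields for POST /align."""
    form = {"language": language}
    if text:
        # Optional original text; the README says it improves accuracy.
        form["text"] = text
    return form


def align_audio(base_url: str, audio_path: str, language: str = "en",
                text: Optional[str] = None, timeout: float = 120.0) -> dict:
    """Upload an MP3/WAV file as multipart/form-data and return the parsed JSON."""
    import requests  # imported lazily so build_align_form has no dependencies

    with open(audio_path, "rb") as audio:
        response = requests.post(
            base_url.rstrip("/") + "/align",
            files={"audio_file": audio},
            data=build_align_form(language, text),
            timeout=timeout,
        )
    response.raise_for_status()
    return response.json()
```

Passing the known TTS input as `text` is the natural fit for this service, since the spoken words are already known before alignment.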
Response:
{
  "word_segments": [
    { "word": "Hello", "start": 0.11, "end": 0.48 },
    { "word": "world", "start": 0.48, "end": 0.90 }
  ],
  "duration": 1.2,
  "word_count": 2,
  "language": "en"
}
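To illustrate how a client might use this response for highlighting, the sketch below looks up which word is being spoken at a given playback time. The function name `active_word_index` is hypothetical, and the sketch assumes that `word_segments` is sorted by `start` and that each word occupies the half-open interval from `start` up to, but not including, `end`.

```python
from bisect import bisect_right
from typing import Optional


def active_word_index(word_segments: list, t: float) -> Optional[int]:
    """Return the index of the word spoken at time t (seconds), or None during silence."""
    starts = [w["start"] for w in word_segments]
    i = bisect_right(starts, t) - 1  # last word whose start is <= t
    if i >= 0 and t < word_segments[i]["end"]:
        return i
    return None


# Sample data copied from the response above.
segments = [
    {"word": "Hello", "start": 0.11, "end": 0.48},
    {"word": "world", "start": 0.48, "end": 0.90},
]
```

A player would call this on each time update and highlight the returned word. A binary search keeps the lookup cheap for long passages, although a real client would precompute `starts` once per response instead of on every call.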
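Usage Example
import requests

# Upload the audio as multipart/form-data; the context manager closes the file afterwards.
with open('audio.mp3', 'rb') as audio:
    response = requests.post(
        'https://hexa09-bugget-whisperx-model.hf.space/align',
        files={'audio_file': audio},
        data={'language': 'en'},
    )
response.raise_for_status()  # raise on HTTP errors instead of parsing an error body

timings = response.json()
print(timings['word_segments'])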
Built for Studify
This service powers word-by-word highlighting synchronized with TTS audio in the Studify educational app.
Technical Details
- Model: WhisperX large-v2
- Framework: FastAPI + Uvicorn
- Device: CUDA GPU (when available) or CPU fallback
- Precision: FP16 (GPU) or INT8 (CPU)
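The device and precision rules above can be sketched as below. This is an assumption about how such a service might choose its settings, not a copy of `app.py`: the helper names are hypothetical, while `"float16"` and `"int8"` are the compute-type strings WhisperX passes through to its backend.

```python
def pick_compute_type(device: str) -> str:
    """Map a device name to the precision listed above: FP16 on GPU, INT8 on CPU."""
    return "float16" if device == "cuda" else "int8"


def detect_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise fall back to "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_model(model_name: str = "large-v2"):
    """Load the transcription model with settings matched to the available hardware."""
    import whisperx  # third-party; imported lazily so the helpers above stay dependency-free

    device = detect_device()
    return whisperx.load_model(model_name, device, compute_type=pick_compute_type(device))
```

Keeping the precision choice in a small pure function makes the CPU fallback easy to check without a GPU or the model weights.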
Integration
See the Studify TTS Alignment Architecture for the full integration guide.
License
MIT License - Free for commercial and personal use