---
title: Studify WhisperX Alignment
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
app_port: 7860
pinned: false
license: mit
---

🎯 Studify WhisperX Word-Level Alignment

Forced-alignment service that returns word-level timestamps for TTS audio, used to drive accurate word-by-word highlighting in the Studify app.

🚀 Features

WhisperX with the large-v2 model for accurate transcription
Forced alignment for precise word-level timestamps
GPU acceleration (T4/A10G) for fast inference
REST API built with FastAPI
Multi-language support via the language parameter

📡 API Endpoints

Health Check

GET /

Returns service status and device info.
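Before uploading audio, a client may want to confirm the service is reachable. The sketch below uses only the standard library and assumes that GET / answers with HTTP 200 when the service is up; the README does not document the status code or the fields of the response body, so only the status is checked. `wait_until_healthy`, `http_get_status`, and the retry parameters are hypothetical names chosen for illustration.

```python
import time
import urllib.request
from typing import Callable


def http_get_status(url: str, timeout: float) -> int:
    """GET url and return the HTTP status code.

    urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def wait_until_healthy(
    base_url: str,
    attempts: int = 10,
    delay: float = 5.0,
    fetch: Callable[[str, float], int] = http_get_status,
) -> bool:
    """Poll GET / until it answers HTTP 200; give up after `attempts` tries."""
    url = base_url.rstrip("/") + "/"
    for attempt in range(attempts):
        try:
            if fetch(url, 10.0) == 200:
                return True
        except OSError:
            pass  # refused connection, timeout, DNS failure, or HTTP error status
        if attempt < attempts - 1:
            time.sleep(delay)  # fixed wait before the next try
    return False


# Usage (performs a real network request):
# wait_until_healthy("https://hexa09-bugget-whisperx-model.hf.space")
```

The `fetch` parameter is injectable so the polling logic can be exercised without a network connection.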

Align Audio

POST /align
Content-Type: multipart/form-data

Parameters:
- audio_file: MP3/WAV file
- text: Original text (optional, improves accuracy)
- language: Language code (default: "en")

Response:
{
  "word_segments": [
    { "word": "Hello", "start": 0.11, "end": 0.48 },
    { "word": "world", "start": 0.48, "end": 0.90 }
  ],
  "duration": 1.2,
  "word_count": 2,
  "language": "en"
}
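A client will usually want to turn this JSON into typed objects and reject malformed timings before using them. The following is a minimal sketch, not part of the service: `WordSegment` and `parse_alignment` are hypothetical names, and the timestamps are assumed to be in seconds (the README does not state the unit, although the sample values and `duration` suggest it).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WordSegment:
    word: str
    start: float  # assumed to be seconds
    end: float    # assumed to be seconds


def parse_alignment(payload: dict) -> list[WordSegment]:
    """Convert the JSON body of POST /align into WordSegment objects.

    Raises ValueError if a segment ends before it starts, or if
    word_count disagrees with the number of segments.
    """
    segments = [
        WordSegment(str(s["word"]), float(s["start"]), float(s["end"]))
        for s in payload["word_segments"]
    ]
    for seg in segments:
        if seg.end < seg.start:
            raise ValueError(f"segment {seg.word!r} ends before it starts")
    if payload.get("word_count", len(segments)) != len(segments):
        raise ValueError("word_count does not match len(word_segments)")
    return segments


# The sample response above, as returned by response.json()
example = {
    "word_segments": [
        {"word": "Hello", "start": 0.11, "end": 0.48},
        {"word": "world", "start": 0.48, "end": 0.90},
    ],
    "duration": 1.2,
    "word_count": 2,
    "language": "en",
}

words = parse_alignment(example)
print([w.word for w in words])
```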

🔧 Usage Example

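import requests

URL = 'https://hexa09-bugget-whisperx-model.hf.space/align'

# Open the file in a context manager so the handle is closed after the upload
with open('audio.mp3', 'rb') as audio:
    files = {'audio_file': audio}
    data = {'language': 'en'}  # optionally add 'text': the original script
    # Generous timeout for large files; the exact value is an assumption
    response = requests.post(URL, files=files, data=data, timeout=300)

response.raise_for_status()  # fail loudly on HTTP errors
timings = response.json()
print(timings['word_segments'])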

🎨 Built for Studify

This service provides the word timings that drive word-by-word highlighting synchronized with TTS audio in the Studify educational app.
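The Studify client code is not shown here and may not be written in Python; as a hedged illustration, a player could map its current playback position to the word to highlight with a binary search over the start times. This assumes `word_segments` arrive in spoken order and do not overlap; `current_word_index` is a hypothetical name.

```python
import bisect
from typing import Optional, Sequence


def current_word_index(
    starts: Sequence[float], ends: Sequence[float], t: float
) -> Optional[int]:
    """Return the index of the word being spoken at playback time t, or None.

    Assumes words are in spoken order, so `starts` is sorted ascending.
    """
    # Index of the last word whose start time is <= t.
    i = bisect.bisect_right(starts, t) - 1
    # Before the first word, in a gap, or after the last word: nothing to highlight.
    if i >= 0 and t < ends[i]:
        return i
    return None


segments = [
    {"word": "Hello", "start": 0.11, "end": 0.48},
    {"word": "world", "start": 0.48, "end": 0.90},
]
starts = [s["start"] for s in segments]
ends = [s["end"] for s in segments]

for t in (0.0, 0.3, 0.6, 1.0):
    i = current_word_index(starts, ends, t)
    print(t, segments[i]["word"] if i is not None else "-")
```

Because `bisect_right` is used, a time that falls exactly on a shared boundary (0.48 in the sample) selects the later word. Each lookup is O(log n), which keeps per-frame highlighting cheap even for long passages.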

📚 Technical Details

Model: WhisperX large-v2
Framework: FastAPI + Uvicorn
Device: CUDA GPU (when available) or CPU fallback
Precision: FP16 (GPU) or INT8 (CPU)
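The device and precision rule above can be written as a small selection function. This is an illustrative sketch, not the service's `app.py`: it assumes PyTorch is used to detect CUDA, and that the precisions map to the `compute_type` strings accepted by faster-whisper, the backend WhisperX uses for transcription (`"float16"` and `"int8"`). `select_runtime` is a hypothetical name.

```python
from typing import Optional, Tuple


def select_runtime(cuda_available: Optional[bool] = None) -> Tuple[str, str]:
    """Return (device, compute_type): CUDA + FP16 with a GPU, else CPU + INT8."""
    if cuda_available is None:
        try:
            import torch  # only used to probe for a GPU

            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False  # no PyTorch installed: assume CPU only
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "int8"


print(select_runtime(cuda_available=False))  # ('cpu', 'int8')
```

The resulting pair would then typically be passed to model loading, e.g. `whisperx.load_model("large-v2", device, compute_type=compute_type)`. INT8 on CPU trades a little accuracy for much lower memory use and faster inference than FP32.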

🔗 Integration

See the Studify TTS Alignment Architecture for the full integration guide.

📄 License

MIT License - Free for commercial and personal use