	---
	title: Studify WhisperX Alignment
	emoji: 🎯
	colorFrom: blue
	colorTo: green
	sdk: gradio
	sdk_version: "5.9.1"
	app_file: app.py
	app_port: 7860
	pinned: false
	license: mit
	---

	# 🎯 Studify WhisperX Word-Level Alignment

	Forced-alignment service that produces word-level timestamps, used to highlight words in sync with TTS audio in the Studify app.

	## 🚀 Features

	- WhisperX large-v2 model for accurate transcription
	- Forced alignment for precise word-level timestamps
	- GPU-accelerated (T4/A10G) for fast inference
	- REST API with FastAPI
	- Multi-language support

	## 📡 API Endpoints

	### Health Check
	```bash
	GET /
	```

	Returns service status and device info.
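
A minimal sketch of calling the health check from Python, using only the standard library. The host is the one from the usage example in this README; that the body is JSON is an assumption, since the response format is not documented here. `fetch_health` and its `opener` parameter are hypothetical names introduced for illustration.

```python
import json
from urllib.request import urlopen

# Host taken from the usage example in this README.
BASE_URL = "https://hexa09-bugget-whisperx-model.hf.space"


def fetch_health(base_url=BASE_URL, opener=urlopen, timeout=60):
    """GET the root endpoint and decode the body as JSON.

    Assumption: the service answers with a JSON object. The README only
    says it returns "service status and device info".
    """
    url = base_url.rstrip("/") + "/"
    # `opener` is injectable so the function can be exercised without network access.
    with opener(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_health()` with no arguments hits the live Space. The generous default timeout is a guess to allow for a cold start.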

	### Align Audio
	```bash
	POST /align
	Content-Type: multipart/form-data

	Parameters:
	- audio_file: MP3/WAV file
	- text: Original text (optional, improves accuracy)
	- language: Language code (default: "en")

	Response:
	{
	  "word_segments": [
	    { "word": "Hello", "start": 0.11, "end": 0.48 },
	    { "word": "world", "start": 0.48, "end": 0.90 }
	  ],
	  "duration": 1.2,
	  "word_count": 2,
	  "language": "en"
	}
	```
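
The response shape above can be turned into typed objects on the client side. The sketch below is one way to do that. It assumes every entry carries `word`, `start`, and `end`, that `word_count` equals the number of segments, and that words do not overlap. The sample response suggests these invariants, but this README does not guarantee them. `WordSegment` and `parse_alignment` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordSegment:
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def parse_alignment(payload: dict) -> List[WordSegment]:
    """Convert an /align response into WordSegment objects and sanity-check it."""
    segments = [
        WordSegment(item["word"], float(item["start"]), float(item["end"]))
        for item in payload["word_segments"]
    ]
    # Assumed invariant: word_count mirrors the length of word_segments.
    if len(segments) != payload["word_count"]:
        raise ValueError("word_count does not match the number of word_segments")
    # Assumed invariant: words are ordered in time and do not overlap.
    previous_end = 0.0
    for seg in segments:
        if seg.start < previous_end or seg.end < seg.start:
            raise ValueError(f"non-monotonic timestamps at {seg.word!r}")
        previous_end = seg.end
    return segments


# The sample response from this README.
sample = {
    "word_segments": [
        {"word": "Hello", "start": 0.11, "end": 0.48},
        {"word": "world", "start": 0.48, "end": 0.90},
    ],
    "duration": 1.2,
    "word_count": 2,
    "language": "en",
}
words = parse_alignment(sample)
```

If the service ever returns words without timestamps, the checks would need to be relaxed accordingly.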

	## 🔧 Usage Example

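	```python
	import requests

	# Open the audio in a context manager so the file handle is closed after the upload.
	with open('audio.mp3', 'rb') as audio:
	    response = requests.post(
	        'https://hexa09-bugget-whisperx-model.hf.space/align',
	        files={'audio_file': audio},
	        data={'language': 'en'},
	        timeout=300,  # alignment can take a while, especially on CPU
	    )

	response.raise_for_status()  # surface HTTP errors instead of parsing an error body
	timings = response.json()
	print(timings['word_segments'])
	```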
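
The example above omits the optional `text` field. For TTS the source text is usually known, so a caller will often want to send it. The helper below is a small sketch of building the form fields from the documented parameters. `build_align_form` is a hypothetical name, and skipping blank text is a choice made for this sketch, not documented service behavior.

```python
from typing import Dict, Optional


def build_align_form(language: str = "en", text: Optional[str] = None) -> Dict[str, str]:
    """Build the non-file form fields for POST /align."""
    form = {"language": language}
    # Only send `text` when there is something to align against.
    if text is not None and text.strip():
        form["text"] = text.strip()
    return form


form = build_align_form(text="Hello world")
```

The resulting dictionary can be passed as `data=form` to `requests.post`, alongside `files={'audio_file': ...}`.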

	## 🎨 Built for Studify

	This service supplies the word timestamps that drive word-by-word highlighting, synchronized with TTS audio, in the Studify educational app.
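
As an illustration of how a client might use the timestamps for highlighting, the sketch below finds the word being spoken at a given playback time with a binary search. This is not Studify's actual implementation. `active_word_index` is a hypothetical helper, and it assumes segments are sorted by start time and do not overlap.

```python
from bisect import bisect_right
from typing import List, Optional


def active_word_index(starts: List[float], ends: List[float], t: float) -> Optional[int]:
    """Return the index of the word spoken at time t (seconds), or None during silence."""
    # Last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < ends[i]:
        return i
    return None


# Word segments in the shape returned by /align.
segments = [
    {"word": "Hello", "start": 0.11, "end": 0.48},
    {"word": "world", "start": 0.48, "end": 0.90},
]
starts = [s["start"] for s in segments]
ends = [s["end"] for s in segments]

current = active_word_index(starts, ends, 0.30)  # index of "Hello"
```

A player would call this on each time update and move the highlight whenever the returned index changes.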

	## 📚 Technical Details

	- Model: WhisperX large-v2
	- Framework: FastAPI + Uvicorn
	- Device: CUDA GPU (when available) or CPU fallback
	- Precision: FP16 (GPU) or INT8 (CPU)
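
The device and precision fallback listed above can be summarized as a small selection function. This is a sketch of the stated policy, not the code in `app.py`. `pick_runtime` is a hypothetical name, and the strings `"float16"` and `"int8"` are the compute-type names WhisperX accepts for FP16 and INT8.

```python
from typing import Tuple


def pick_runtime(cuda_available: bool) -> Tuple[str, str]:
    """Map GPU availability to a (device, compute_type) pair."""
    if cuda_available:
        return "cuda", "float16"  # FP16 on GPU
    return "cpu", "int8"          # INT8 on CPU


# Assumption: the service feeds something like torch.cuda.is_available() into
# this decision and then loads the model roughly as
#   whisperx.load_model("large-v2", device, compute_type=compute_type)
device, compute_type = pick_runtime(cuda_available=False)
```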

	## 🔗 Integration

	See the [Studify TTS Alignment Architecture](https://github.com/Hexa-Innovate/Studify) for the full integration guide.

	## 📄 License

	MIT License - Free for commercial and personal use