metadata

title: Audio Dataset Explorer for TTS
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit

🎙️ Audio Dataset Explorer for TTS

Interactive tool for exploring audio datasets, analyzing speakers, and selecting training data for TTS models.

Features

📊 Overview Statistics - Analyze all speakers in a dataset
🎯 Speaker Details - Deep dive into individual speaker statistics
📈 Interactive Charts - Duration distributions, word counts, sample distributions
📥 Export Instructions - Step-by-step guide to create your own filtered dataset fork
🔄 Multi-Dataset Support - Works with any HuggingFace audio dataset that has a speaker_id field
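
As a rough illustration of what the overview statistics involve, the sketch below aggregates sample count, total duration, and word count per speaker from plain dictionaries. It is not taken from app.py. The function name `speaker_overview` and the sample rows are hypothetical, and it assumes each row carries `speaker_id`, `duration` (in seconds), and `text` keys.

```python
from collections import defaultdict

def speaker_overview(rows):
    """Aggregate per-speaker sample count, total duration (s), and word count."""
    stats = defaultdict(lambda: {"samples": 0, "duration_s": 0.0, "words": 0})
    for row in rows:
        entry = stats[row["speaker_id"]]
        entry["samples"] += 1
        # duration and text are optional fields, so fall back to empty values
        entry["duration_s"] += float(row.get("duration") or 0.0)
        entry["words"] += len((row.get("text") or "").split())
    return dict(stats)

# Hypothetical in-memory rows standing in for a real dataset split
rows = [
    {"speaker_id": 1, "duration": 2.5, "text": "dzień dobry"},
    {"speaker_id": 1, "duration": 4.0, "text": "to jest test"},
    {"speaker_id": 2, "duration": 3.0, "text": "hello"},
]
overview = speaker_overview(rows)
```

Sorting `overview` by `duration_s` is one simple way to find speakers with enough audio for single-speaker training.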

Usage

Load Dataset: Enter the dataset name and config (e.g., ylacombe/cml-tts with the polish config)
Overview: Check statistics for all speakers
Select Speaker: Choose a speaker from the dropdown
Analyze: View detailed statistics and audio samples
Export: Get instructions to create your own filtered dataset
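
Outside the app, the first steps above could be approximated with the `datasets` library. This is a minimal sketch, not necessarily how app.py loads data. `preview_rows` and `speaker_choices` are hypothetical helper names, and the `train` split is an assumption about the dataset. Streaming avoids downloading the full dataset for a quick look; decoding the audio column may require extra audio dependencies.

```python
from itertools import islice

def preview_rows(name, config, split="train", limit=100):
    """Stream the first `limit` rows of a Hub dataset (requires network access)."""
    from datasets import load_dataset  # lazy import: only needed when called
    stream = load_dataset(name, config, split=split, streaming=True)
    return list(islice(stream, limit))

def speaker_choices(rows):
    """Return the sorted unique speaker IDs, e.g. to fill a dropdown."""
    return sorted({row["speaker_id"] for row in rows})

# Example call (commented out because it needs network access):
# rows = preview_rows("ylacombe/cml-tts", "polish")
# print(speaker_choices(rows))
```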

Supported Datasets

The tool works with any HuggingFace dataset that has:

Audio data
A speaker_id field
duration and text fields (optional but recommended)
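
These requirements can be checked up front from a dataset's column names. The sketch below is illustrative only. `check_columns` is a hypothetical helper, and the column name `audio` is an assumption, since the README only says "Audio data" without naming the field.

```python
REQUIRED = ("audio", "speaker_id")   # "audio" is an assumed column name
RECOMMENDED = ("duration", "text")   # optional but recommended

def check_columns(column_names):
    """Report which required and recommended columns are absent."""
    present = set(column_names)
    return {
        "missing_required": [c for c in REQUIRED if c not in present],
        "missing_recommended": [c for c in RECOMMENDED if c not in present],
    }

# Hypothetical column list for a dataset without a duration field
report = check_columns(["audio", "speaker_id", "text"])
```

A dataset with an empty `missing_required` list should be usable; anything listed under `missing_recommended` would only limit the statistics that can be computed.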

Tested Datasets

ylacombe/cml-tts - Multilingual TTS (Dutch, French, German, Italian, Polish, Portuguese, Spanish)
facebook/voxpopuli - European Parliament speeches
mozilla-foundation/common_voice_* - Community-contributed voices

Why This Tool?

When training TTS models, you often want to:

Select a single speaker for consistency
Understand data distribution before training
Create filtered subsets for experiments
Add custom columns (emotion, quality scores, etc.)

This tool helps you make informed decisions about your training data.

Creating Your Own Dataset Fork

After selecting a speaker, use the "Pobierz & Fork" tab (Polish for "Download & Fork") to get instructions for:

Downloading the full dataset
Filtering to your chosen speaker
Adding custom columns
Pushing to HuggingFace Hub as a new dataset
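
The four steps above might look roughly like this with the `datasets` library. This is a sketch under assumptions, not the exact instructions the tab generates. `fork_repo_id` and `fork_single_speaker` are hypothetical names, the `emotion` column and its `"unlabeled"` placeholder are illustrative, and pushing requires being logged in to the Hub.

```python
def fork_repo_id(namespace, source, speaker_id):
    """Build a hypothetical target repo name, e.g. 'me/cml-tts-speaker-42'."""
    base = source.split("/")[-1]
    return f"{namespace}/{base}-speaker-{speaker_id}"

def fork_single_speaker(source, config, speaker_id, namespace,
                        split="train", push=False):
    """Download, filter to one speaker, add a column, optionally push."""
    from datasets import load_dataset  # lazy import: only needed when called
    # 1. Download the full split
    ds = load_dataset(source, config, split=split)
    # 2. Keep only the chosen speaker; input_columns avoids decoding audio.
    #    speaker_id must have the same type as the column values.
    ds = ds.filter(lambda sid: sid == speaker_id, input_columns=["speaker_id"])
    # 3. Add a custom column with a placeholder value (illustrative)
    ds = ds.add_column("emotion", ["unlabeled"] * len(ds))
    # 4. Push as a new dataset (needs Hub authentication)
    if push:
        ds.push_to_hub(fork_repo_id(namespace, source, speaker_id))
    return ds
```

Leaving `push=False` by default lets you inspect the filtered dataset locally before publishing anything.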

Local Development

# Install dependencies
pip install -r requirements.txt

# Run locally
python app.py

Credits

Built for the TTS training workflow. Designed to work with the HuggingFace Datasets ecosystem.

License

MIT License - Feel free to use and modify for your projects!