audio_tts_explorer / README.md
kruzer

metadata
title: Audio Dataset Explorer for TTS
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit

🎙️ Audio Dataset Explorer for TTS

Interactive tool for exploring audio datasets, analyzing speakers, and selecting training data for TTS models.

Features

  • 📊 Overview Statistics - Analyze all speakers in a dataset
  • 🎯 Speaker Details - Deep dive into individual speaker statistics
  • 📈 Interactive Charts - Duration distributions, word counts, sample distributions
  • 📥 Export Instructions - Step-by-step guide to create your own filtered dataset fork
  • 🔄 Multi-Dataset Support - Works with any Hugging Face audio dataset that has a speaker_id field

Usage

  1. Load Dataset: Enter dataset name and config (e.g., ylacombe/cml-tts + polish)
  2. Overview: Check statistics for all speakers
  3. Select Speaker: Choose a speaker from the dropdown
  4. Analyze: View detailed statistics and audio samples
  5. Export: Get instructions to create your own filtered dataset
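
The kind of per-speaker numbers shown in the Overview step can be sketched as below. This is a minimal illustration, not the code in app.py: it assumes each row is a dict with a speaker_id key and optional duration (in seconds) and text keys, and the sample rows are made up for the example.

```python
from collections import defaultdict


def speaker_overview(rows):
    """Aggregate sample count, total duration (s), and word count per speaker."""
    stats = defaultdict(lambda: {"samples": 0, "duration_s": 0.0, "words": 0})
    for row in rows:
        entry = stats[row["speaker_id"]]
        entry["samples"] += 1
        # duration and text are optional, so missing values count as zero
        entry["duration_s"] += float(row.get("duration") or 0.0)
        entry["words"] += len((row.get("text") or "").split())
    return dict(stats)


# Made-up rows standing in for real dataset records
rows = [
    {"speaker_id": "a", "duration": 2.5, "text": "dzień dobry"},
    {"speaker_id": "a", "duration": 1.5, "text": "tak"},
    {"speaker_id": "b", "duration": 4.0, "text": "one two three"},
]
overview = speaker_overview(rows)
```

Any iterable of dict-like rows should work as input, which is how records from the datasets library are typically iterated.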

Supported Datasets

The tool works with any Hugging Face dataset that has:

  • Audio data
  • speaker_id field
  • duration and text fields (optional but recommended)
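
A quick way to check a dataset against these requirements before loading it into the tool might look like the sketch below. The column name audio is an assumption (the requirement only says "audio data"), and check_columns is a hypothetical helper, not part of the tool.

```python
REQUIRED = ("audio", "speaker_id")   # "audio" as a column name is an assumption
RECOMMENDED = ("duration", "text")   # optional but recommended


def check_columns(column_names):
    """Return (missing_required, missing_recommended) for a list of column names."""
    present = set(column_names)
    missing_required = [c for c in REQUIRED if c not in present]
    missing_recommended = [c for c in RECOMMENDED if c not in present]
    return missing_required, missing_recommended
```

An empty first list means the dataset should be usable; entries in the second list mean some statistics may be unavailable.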

Tested Datasets

  • ylacombe/cml-tts - Multi-lingual TTS (Dutch, French, German, Italian, Polish, Portuguese, Spanish)
  • facebook/voxpopuli - European Parliament speeches
  • mozilla-foundation/common_voice_* - Community-contributed voices

Why This Tool?

When training TTS models, you often want to:

  • Select a single speaker for consistency
  • Understand data distribution before training
  • Create filtered subsets for experiments
  • Add custom columns (emotion, quality scores, etc.)

This tool helps you make informed decisions about your training data.

Creating Your Own Dataset Fork

After selecting a speaker, use the "Pobierz & Fork" ("Download & Fork") tab to get instructions for:

  1. Downloading the full dataset
  2. Filtering to your chosen speaker
  3. Adding custom columns
  4. Pushing to the Hugging Face Hub as a new dataset
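
The four steps above can be sketched with the datasets library roughly as follows. This is an illustration under assumptions, not the exact instructions the tab generates: words_per_second is a hypothetical custom column, the split name train is assumed, speaker IDs are compared as strings because their type varies between datasets, and push_to_hub needs a prior Hub login.

```python
def keep_speaker(target_id):
    """Build a row predicate that keeps only the chosen speaker."""
    def predicate(row):
        # Compare as strings: speaker_id may be an int or a str depending on the dataset
        return str(row["speaker_id"]) == str(target_id)
    return predicate


def add_custom_columns(row):
    """Hypothetical custom column: speaking rate in words per second."""
    duration = float(row.get("duration") or 0.0)
    words = len((row.get("text") or "").split())
    return {"words_per_second": words / duration if duration > 0 else 0.0}


def fork_dataset(source, config, speaker_id, target_repo, split="train"):
    """Download, filter to one speaker, add columns, and push as a new dataset."""
    from datasets import load_dataset  # requires: pip install datasets

    ds = load_dataset(source, config, split=split)  # 1. download
    ds = ds.filter(keep_speaker(speaker_id))        # 2. filter to the chosen speaker
    ds = ds.map(add_custom_columns)                 # 3. add custom columns
    ds.push_to_hub(target_repo)                     # 4. push (needs a Hub login)
    return ds
```

The predicate and the column function are plain Python, so they can be tried on a few dict rows before running the full download.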

Local Development

# Install dependencies
pip install -r requirements.txt

# Run locally
python app.py

Credits

Built for the TTS training workflow. Designed to work with the Hugging Face Datasets ecosystem.

License

MIT License - Feel free to use and modify for your projects!