---
license: apache-2.0
language:
- eu
base_model:
- HiTZ/TTS-eu_maider
pipeline_tag: text-to-speech
tags:
- TTS
- speech-synthesis
- Basque
- piper
datasets:
- itzune/maider-dataset
---
# Basque TTS: Maider (Piper Version)
This repository contains a [Piper](https://github.com/OHF-Voice/piper1-gpl) compatible version of the **Maider** Basque text-to-speech model. The original model was developed by **HiTZ Basque Center for Language Technology - Aholab Signal Processing Laboratory** (University of the Basque Country UPV/EHU).
This version has been exported/trained specifically for use with the Piper TTS engine, a fast, local neural text-to-speech engine.
## Model Details
- **Language:** Basque (eu)
- **Speaker:** Maider (Female)
- **Architecture:** VITS (Optimized for Piper)
- **Original Credits:** HiTZ Center / Aholab (Project ILENIA)
- **Format:** Piper (`.onnx` and `.onnx.json` config)
## Training Details
- **Dataset:** [itzune/maider-dataset](https://huggingface.co/datasets/itzune/maider-dataset)
- **Data Volume:** 99,996 high-quality audio samples (~100k files).
- **Architecture:** VITS
- **Training Engine:** Piper (PyTorch Lightning)
- **Iterations:** 22 epochs (258,750 steps)
- **Sample Rate:** 22050 Hz
- **Phonemization:** espeak-ng (Basque)
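
The epoch and step figures above can be cross-checked with a little arithmetic. This is a minimal sketch: it assumes the `22` in the checkpoint name `epoch=22-step=258750.ckpt` follows PyTorch Lightning's default zero-based epoch counter, which this README does not confirm for this run.

```python
# Figures quoted in this README.
TOTAL_STEPS = 258_750
EPOCH_LABEL = 22  # taken from the checkpoint name epoch=22-step=258750.ckpt


def steps_per_epoch(total_steps: int, epochs: int) -> float:
    """Average number of optimizer steps per epoch."""
    return total_steps / epochs


# Reading the label as a count of completed epochs.
as_count = steps_per_epoch(TOTAL_STEPS, EPOCH_LABEL)
# Reading the label as a zero-based index, i.e. 23 completed epochs (assumption).
as_index = steps_per_epoch(TOTAL_STEPS, EPOCH_LABEL + 1)

print(f"label read as 22 epochs: {as_count:.2f} steps per epoch")
print(f"label read as 23 epochs: {as_index:.2f} steps per epoch")
```

Only the second reading gives a whole number of steps per epoch, which hints that the label may be a zero-based index; treat this as a consistency check rather than a statement about the actual training run.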
## Data Source & Dataset Integration
This model has been fine-tuned/trained using **maider-dataset**, a large-scale Basque speech corpus curated for high-fidelity synthesis.
- **Link to Dataset:** [Maider Dataset on Hugging Face](https://huggingface.co/datasets/itzune/maider-dataset)
- **Dataset structure:** After the Piper training process, the data was packaged with WebDataset (sharded `.tar` files) so that the ~100k samples can be handled efficiently.
- **Content:** Each audio file is paired with its corresponding Basque transcription in a `metadata.csv` file, so every utterance had a matching text throughout training.
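
As a rough illustration of the audio/transcription pairing, the sketch below parses a `metadata.csv` in the pipe-delimited `id|text` layout commonly used for Piper training data. The column layout and the sample IDs are assumptions for illustration; check the dataset card for the actual format.

```python
import csv
import io


def read_metadata(fileobj, delimiter="|"):
    """Return a list of (utterance_id, text) pairs from a metadata file.

    Assumes one utterance per row, with the ID in the first column and the
    transcription in the last. Rows with fewer than two columns are skipped.
    """
    reader = csv.reader(fileobj, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue
        pairs.append((row[0].strip(), row[-1].strip()))
    return pairs


# Hypothetical two-row sample; the IDs are made up for illustration.
sample = io.StringIO(
    "maider_00001|Kaixo, hau Maider da.\n"
    "maider_00002|Gaur egun eguzkitsua dugu.\n"
)
pairs = read_metadata(sample)
print(pairs[0])
```

With a real file, pass an open text handle instead, for example `open("metadata.csv", encoding="utf-8", newline="")`.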
## Files Included
* `eu-maider-medium.onnx`: The exported model for fast inference.
* `eu-maider-medium.onnx.json`: The configuration file (includes phoneme map and synthesis settings).
* `epoch=22-step=258750.ckpt`: The PyTorch Lightning checkpoint saved at epoch 22, step 258,750 (useful for resuming training or fine-tuning).
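
To see what the configuration file contains before running inference, a short script along these lines can help. It is a sketch that assumes the usual Piper config layout (`audio.sample_rate`, `espeak.voice`, `phoneme_id_map`); missing keys are reported as `None` or `0` rather than raising. The function name `summarize_voice_config` is illustrative only.

```python
import json
from pathlib import Path


def summarize_voice_config(config_path):
    """Return a few key settings from a Piper-style .onnx.json config."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    audio = config.get("audio", {})
    espeak = config.get("espeak", {})
    return {
        "sample_rate": audio.get("sample_rate"),
        "espeak_voice": espeak.get("voice"),
        "num_phonemes": len(config.get("phoneme_id_map", {})),
    }


config_file = Path("eu-maider-medium.onnx.json")
if config_file.exists():
    # Only runs when the config has been downloaded next to this script.
    print(summarize_voice_config(config_file))
```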
## Usage
### Using Piper CLI
You can run the model locally using the Piper binary:
```bash
echo "Kaixo, hau Maider da, Piper motorra erabiliz euskaraz hitz egiten." | \
./piper --model eu-maider-medium.onnx --output_file output.wav
```
### Python API
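```python
import wave
from piper.voice import PiperVoice

voice = PiperVoice.load("eu-maider-medium.onnx", "eu-maider-medium.onnx.json")
# synthesize_wav writes through a wave.Wave_write object, so open the file with wave.open
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize_wav("Gaur egun eguzkitsua dugu.", wav_file)
```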
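
To confirm that a generated file matches the 22050 Hz sample rate listed under Training Details, the standard-library `wave` module is enough. The helper below is a small sketch; `wav_info` is a name chosen for illustration and is not part of Piper.

```python
import os
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / rate
    return rate, channels, duration


if os.path.exists("output.wav"):
    # Only runs when a synthesized file is present in the working directory.
    rate, channels, duration = wav_info("output.wav")
    print(f"{rate} Hz, {channels} channel(s), {duration:.2f} s")
```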
## Original Model & Data Source
The base model belongs to the Aholab TTS collection. All voices in this collection are based on the VITS architecture proposed by Kim et al. (2021).
- **Maider & Antton:** Developed by HiTZ with funding from Project ILENIA.
- **License:** Creative Commons Attribution 4.0 (for the voice resource) and Apache License 2.0 (for the code/model).
## Authors & Credits
The original Maider model was created by:
HiTZ Basque Center for Language Technology - Aholab Signal Processing Laboratory, University of the Basque Country EHU.