	---
	license: apache-2.0
	language:
	- eu
	base_model:
	- HiTZ/TTS-eu_maider
	pipeline_tag: text-to-speech
	tags:
	- TTS
	- speech-synthesis
	- Basque
	- piper
	datasets:
	- itzune/maider-dataset
	---

	# Basque TTS: Maider (Piper Version)

	This repository contains a [Piper](https://github.com/OHF-Voice/piper1-gpl)-compatible version of the Maider Basque text-to-speech model. The original model was developed by the HiTZ Basque Center for Language Technology - Aholab Signal Processing Laboratory (University of the Basque Country UPV/EHU).

	This version was trained and exported for use with Piper, a fast, local neural text-to-speech engine.

	## Model Details

	- Language: Basque (eu)
	- Speaker: Maider (Female)
	- Architecture: VITS (Optimized for Piper)
	- Original Credits: HiTZ Center / Aholab (Project ILENIA)
	- Format: Piper (`.onnx` and `.onnx.json` config)

	## Training Details

	- Dataset: [itzune/maider-dataset](https://huggingface.co/datasets/itzune/maider-dataset)
	- Data Volume: 99,996 high-quality audio samples (~100k files).
	- Architecture: VITS
	- Training Engine: Piper (PyTorch Lightning)
	- Iterations: 22 epochs (258,750 steps)
	- Sample Rate: 22050 Hz
	- Phonemization: espeak-ng (Basque)
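
The step and epoch figures above give a rough sense of the size of the run. The following is a minimal sketch of that arithmetic, using only the numbers quoted in this card. The 23-epoch case is an assumption: PyTorch Lightning's default checkpoint file names typically count epochs from zero, so a checkpoint labelled `epoch=22` may correspond to 23 completed epochs. The step counter may also include steps inherited from a base checkpoint, so the result is indicative only.

```python
# Figure quoted in the model card.
TOTAL_STEPS = 258_750


def steps_per_epoch(total_steps: int, epochs: int) -> float:
    """Average number of optimizer steps per epoch."""
    return total_steps / epochs


if __name__ == "__main__":
    # 22 is the epoch count stated in the card; 23 is the zero-based
    # reading of the checkpoint name (an assumption, not a stated fact).
    for epochs in (22, 23):
        print(f"{epochs} epochs: {steps_per_epoch(TOTAL_STEPS, epochs):,.2f} steps per epoch")
```

Only the 23-epoch reading divides the step count evenly, which is one hint about how the checkpoint was numbered; it does not change how the model is used.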

	## Data Source & Dataset Integration

	This model was trained on [itzune/maider-dataset](https://huggingface.co/datasets/itzune/maider-dataset), a large-scale Basque speech corpus curated for high-fidelity synthesis.

	- Link to Dataset: [Maider Dataset on Hugging Face](https://huggingface.co/datasets/itzune/maider-dataset)
	- Dataset structure: The data was repackaged as WebDataset shards (sharded `.tar` files) after the Piper training run, so that the ~100k samples can be handled efficiently.
	- Content: Each audio file is paired with its corresponding Basque transcription in a `metadata.csv` file, ensuring precise alignment during the 22 epochs of training.
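
For readers who want to inspect the transcriptions, here is a minimal sketch of reading a `metadata.csv` file. It assumes the pipe-delimited `id|text` layout commonly used for Piper training data; the actual column layout of this dataset is not documented here, so check a few lines of the file first. The function name `read_metadata` is hypothetical.

```python
import csv
from pathlib import Path


def read_metadata(path):
    """Return (utterance_id, text) pairs from a pipe-delimited metadata file.

    Assumes one `id|text` record per line (an assumption about the layout).
    """
    pairs = []
    with Path(path).open(newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quotation marks inside transcriptions untouched.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            pairs.append((row[0].strip(), row[1].strip()))
    return pairs
```

Calling `read_metadata("metadata.csv")` returns a list of tuples, which makes it easy to spot-check that each audio ID has a non-empty transcription.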

	## Files Included
	* `eu-maider-medium.onnx`: The exported model for fast inference.
	* `eu-maider-medium.onnx.json`: The configuration file (includes phoneme map and synthesis settings).
	* `epoch=22-step=258750.ckpt`: The PyTorch Lightning checkpoint saved at epoch 22, step 258,750 (useful for further training or fine-tuning).
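
Before running inference, it can help to confirm that the configuration file matches the details listed in this card. The sketch below reads the `.onnx.json` file and pulls out a few fields. The key names (`audio.sample_rate`, `espeak.voice`, `phoneme_id_map`) are assumptions based on the usual layout of Piper voice configs, and `summarize_config` is a hypothetical helper, not part of Piper.

```python
import json
from pathlib import Path


def summarize_config(config_path):
    """Extract a few fields from a Piper-style voice config.

    Missing keys yield None (or 0 for the phoneme count) instead of raising,
    because the exact config layout is assumed rather than guaranteed.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return {
        "sample_rate": config.get("audio", {}).get("sample_rate"),
        "espeak_voice": config.get("espeak", {}).get("voice"),
        "num_phonemes": len(config.get("phoneme_id_map", {})),
    }
```

For this model, `summarize_config("eu-maider-medium.onnx.json")` would be expected to report the 22050 Hz sample rate given under Training Details if the assumed keys are present.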

	## Usage

	### Using Piper CLI
	You can run the model locally using the Piper binary:

	```bash
	echo "Kaixo, hau Maider da, Piper motorra erabiliz euskaraz hitz egiten." | \
	./piper --model eu-maider-medium.onnx --output_file output.wav
	```
	### Python API

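	```python
	import wave

	from piper.voice import PiperVoice

	# Load the exported model together with its JSON config.
	voice = PiperVoice.load("eu-maider-medium.onnx", "eu-maider-medium.onnx.json")

	# synthesize_wav expects a wave.Wave_write object rather than a plain file handle.
	with wave.open("output.wav", "wb") as wav_file:
	    voice.synthesize_wav("Gaur egun eguzkitsua dugu.", wav_file)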
	```
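
After synthesis, the generated file can be checked against the 22050 Hz sample rate listed under Training Details. This is a minimal sketch using only the Python standard library; `wav_info` is a hypothetical helper name, and `output.wav` is the file name used in the examples above.

```python
import wave


def wav_info(path):
    """Return sample rate, channel count, and duration of a WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }
```

A call such as `wav_info("output.wav")` should report a sample rate of 22050 for audio produced by this voice; a different value suggests the audio was resampled or produced with another model.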

	## Original Model & Data Source

	The base model belongs to the Aholab TTS collection. All voices in this collection are based on the VITS architecture proposed by Kim et al. (2021).

	Maider & Antton: Developed by HiTZ with funding from Project ILENIA.

	License: Public Creative Commons Attribution 4.0 (for the voice resource) and Apache License 2.0 (for the code/model).

	## Authors & Credits

	The original Maider model was created by:
	HiTZ Basque Center for Language Technology - Aholab Signal Processing Laboratory, University of the Basque Country EHU.