---
license: mit
language:
- lb
tags:
- text-to-speech
- tts
- vits
- coqui
- luxembourgish
library_name: coqui
pipeline_tag: text-to-speech
---

# Coqui TTS - Maxine (Luxembourgish Female Voice)

A VITS-based text-to-speech model for Luxembourgish, featuring a synthetic female voice.

## Model Description

This model was trained using the [Coqui TTS](https://github.com/coqui-ai/TTS) framework on Luxembourgish speech data from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu) example sentences.

"Maxine" is a synthetic female Luxembourgish voice created by modulating the original LOD recordings to produce a distinct female voice character.

### Model Details

- **Architecture:** VITS
- **Language:** Luxembourgish (lb)
- **Speaker:** Single speaker (female, synthetic)
- **Sample Rate:** 22050 Hz
- **Checkpoint:** ~90,000 steps
- **License:** MIT

## Usage

**Note:** Text should be lowercased before synthesis. Additional text normalization may be required.

```python
import numpy as np
import scipy.io.wavfile as wavfile
import torch
from TTS.utils.synthesizer import Synthesizer

# Load the model
synthesizer = Synthesizer(
    tts_checkpoint="path/to/coqui-tts-maxine.pth",
    tts_config_path="path/to/config.json",
    use_cuda=torch.cuda.is_available(),
)

# Generate speech (the input text is already lowercased)
wav = synthesizer.tts("moien, wéi geet et dir?")

# Save to file; convert to a NumPy array first, since scipy cannot write a plain Python list
wavfile.write("output.wav", 22050, np.asarray(wav, dtype=np.float32))
```

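The lowercasing requirement noted above can be wrapped in a small helper. The following is a minimal sketch, not part of the model's released code: the function name `normalize_text` is hypothetical, and the Unicode NFC and whitespace steps are assumptions about what "additional text normalization" might include. Expanding numbers or abbreviations is not covered.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Lowercase and tidy text before synthesis (hypothetical helper)."""
    # Assumption: the model expects precomposed characters, so "e" followed by
    # a combining diaeresis is composed into a single "ë".
    text = unicodedata.normalize("NFC", text)
    # Lowercase, as the usage note requires.
    text = text.lower()
    # Collapse runs of whitespace into single spaces and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Moien,   wéi geet et Dir? "))
```

The result can then be passed to `synthesizer.tts(...)` in place of a hand-lowercased string.
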
## Technical Specifications

| Parameter | Value |
|-----------|-------|
| Hidden Channels | 192 |
| Text Encoder Layers | 6 |
| Posterior Encoder Layers | 16 |
| Flow Layers | 4 |
| Mel Channels | 80 |
| FFT Size | 1024 |

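To confirm that a local `config.json` matches the values in this table, one option is a small check like the one below. This is a sketch under assumptions: the key names (`hidden_channels`, `num_layers_text_encoder`, `num_layers_posterior_encoder` and `num_layers_flow` under `model_args`; `num_mels` and `fft_size` under `audio`) reflect how Coqui TTS VITS configs are commonly laid out and have not been verified against this model's file. `EXPECTED`, `check_config`, and `check_config_file` are hypothetical names.

```python
import json

# Values from the table, keyed by the (section, key) location where they are
# assumed to live in config.json.
EXPECTED = {
    ("model_args", "hidden_channels"): 192,
    ("model_args", "num_layers_text_encoder"): 6,
    ("model_args", "num_layers_posterior_encoder"): 16,
    ("model_args", "num_layers_flow"): 4,
    ("audio", "num_mels"): 80,
    ("audio", "fft_size"): 1024,
}


def check_config(config: dict) -> list:
    """Return (section, key, expected, found) for every value that differs."""
    mismatches = []
    for (section, key), expected in EXPECTED.items():
        # A missing section or key is reported with found=None.
        found = config.get(section, {}).get(key)
        if found != expected:
            mismatches.append((section, key, expected, found))
    return mismatches


def check_config_file(path: str) -> list:
    """Load a JSON config from disk and compare it with the table values."""
    with open(path, encoding="utf-8") as f:
        return check_config(json.load(f))
```

An empty list from `check_config_file("path/to/config.json")` would mean every listed value matches. A mismatch with `found=None` more likely points to a different key layout than to a different model.
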
## Citation

If you use this model, please cite:

```bibtex
@misc{zls2025coquimaxine,
  title={Coqui TTS Maxine - Luxembourgish Female Voice},
  author={Zenter fir d'Lëtzebuerger Sprooch},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/ZLSCompLing/CoquiTTS-Maxine}
}
```

## Acknowledgments

Developed by [Zenter fir d'Lëtzebuerger Sprooch](https://zls.lu).

Voice data sourced from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu). The original audio files are available via the [LOD linguistic data on data.public.lu](https://data.public.lu/en/datasets/letzebuerger-online-dictionnaire-lod-linguistesch-daten/), which provides an XML file containing example sentence IDs. Audio files can be accessed at:

```
https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a
```

where `{folder}` is the first 2 characters of `{id}`.

This model is used in [Sproochmaschinn](https://sproochmaschinn.lu), a Luxembourgish speech processing platform.
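
The audio URL pattern described in this section can be expressed as a short helper. This is a minimal sketch: `lod_audio_url` and `URL_TEMPLATE` are hypothetical names, the ID in the example is made up for illustration, and the code assumes the ID is used exactly as it appears in the XML file, with no case conversion.

```python
URL_TEMPLATE = "https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a"


def lod_audio_url(example_id: str) -> str:
    """Build the audio URL for one LOD example sentence ID."""
    if len(example_id) < 2:
        raise ValueError("example ID must have at least 2 characters")
    # The folder is the first two characters of the ID.
    return URL_TEMPLATE.format(folder=example_id[:2], id=example_id)


# "ab1234" is a made-up ID used only to show the shape of the result.
print(lod_audio_url("ab1234"))
```
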