---
license: mit
language:
- lb
tags:
- text-to-speech
- tts
- vits
- coqui
- luxembourgish
library_name: coqui
pipeline_tag: text-to-speech
---
# Coqui TTS - Maxine (Luxembourgish Female Voice)
A VITS-based text-to-speech model for Luxembourgish, featuring a synthetic female voice.
## Model Description
This model was trained with the [Coqui TTS](https://github.com/coqui-ai/TTS) framework on Luxembourgish speech recordings of the example sentences from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu).
"Maxine" is a synthetic female Luxembourgish voice created by modulating the original LOD recordings to produce a distinct female voice character.
### Model Details
- **Architecture:** VITS
- **Language:** Luxembourgish (lb)
- **Speaker:** Single speaker (female, synthetic)
- **Sample Rate:** 22050 Hz
- **Checkpoint:** ~90,000 training steps
- **License:** MIT
## Usage
**Note:** Input text should be lowercased before synthesis. Additional text normalization (for example, spelling out numbers) may be required.
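```python
import numpy as np
import torch
import scipy.io.wavfile as wavfile
from TTS.utils.synthesizer import Synthesizer

# Load the model checkpoint and its config
synthesizer = Synthesizer(
    tts_checkpoint="path/to/coqui-tts-maxine.pth",
    tts_config_path="path/to/config.json",
    use_cuda=torch.cuda.is_available(),
)

# Generate speech (the model expects lowercased input)
text = "Moien, wéi geet et dir?".lower()
wav = synthesizer.tts(text)

# Synthesizer.tts returns a list of float samples; scipy needs a NumPy array,
# so convert before writing a 32-bit float WAV at 22050 Hz
wavfile.write("output.wav", 22050, np.asarray(wav, dtype=np.float32))
```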
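The model card only states that input must be lowercased. Below is a minimal pre-processing sketch; `normalize_for_maxine` is a hypothetical helper, and its Unicode NFC composition and whitespace collapsing are assumptions (not documented for this model) that should be harmless for typical Luxembourgish text with letters such as "ë" and "é".

```python
import re
import unicodedata


def normalize_for_maxine(text: str) -> str:
    """Hypothetical pre-processing for Maxine input text.

    Only lowercasing is documented for this model; the NFC step and the
    whitespace collapsing are assumptions, not part of an official pipeline.
    """
    # Compose e.g. "e" + combining diaeresis into the single letter "ë"
    # (assumption: the model's character set uses precomposed letters)
    text = unicodedata.normalize("NFC", text)
    # Lowercase, as the model card requires
    text = text.lower()
    # Collapse runs of whitespace (including newlines) into single spaces
    return re.sub(r"\s+", " ", text).strip()


print(normalize_for_maxine("  Moien,\nWÉI geet et dir?  "))
# → moien, wéi geet et dir?
```

The result can be passed straight to `synthesizer.tts(...)`. The helper does not expand numbers, abbreviations, or symbols, so those may still need separate handling.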
## Technical Specifications
| Parameter | Value |
|-----------|-------|
| Hidden Channels | 192 |
| Text Encoder Layers | 6 |
| Posterior Encoder Layers | 16 |
| Flow Layers | 4 |
| Mel Channels | 80 |
| FFT Size | 1024 |
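To check a downloaded `config.json` against this table, a small sketch like the following can help. It assumes the usual Coqui VITS layout, with architecture settings under `model_args` (`hidden_channels`, `num_layers_text_encoder`, `num_layers_posterior_encoder`, `num_layers_flow`) and audio settings under `audio` (`num_mels`, `fft_size`, `sample_rate`); `summarize_vits_config` is a hypothetical name.

```python
from typing import Any, Dict


def summarize_vits_config(config: Dict[str, Any]) -> Dict[str, int]:
    """Collect the Technical Specifications values from a parsed config.json.

    Assumes Coqui's VITS layout: "model_args" for architecture, "audio" for
    signal-processing settings.
    """
    model_args = config["model_args"]
    audio = config["audio"]
    return {
        "hidden_channels": model_args["hidden_channels"],
        "text_encoder_layers": model_args["num_layers_text_encoder"],
        "posterior_encoder_layers": model_args["num_layers_posterior_encoder"],
        "flow_layers": model_args["num_layers_flow"],
        "mel_channels": audio["num_mels"],
        "fft_size": audio["fft_size"],
        # Frequency bins of a one-sided (real) FFT of this size; VITS feeds a
        # linear spectrogram with this many bins to its posterior encoder
        "linear_spec_bins": audio["fft_size"] // 2 + 1,
        "sample_rate": audio["sample_rate"],
    }
```

For example, `summarize_vits_config(json.load(open("path/to/config.json")))` should report 192 hidden channels and, with an FFT size of 1024, 513 linear-spectrogram bins.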
## Citation
If you use this model, please cite:
```bibtex
@misc{zls2025coquimaxine,
title={Coqui TTS Maxine - Luxembourgish Female Voice},
  author={{Zenter fir d'Lëtzebuerger Sprooch}},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/ZLSCompLing/CoquiTTS-Maxine}
}
```
## Acknowledgments
Developed by [Zenter fir d'Lëtzebuerger Sprooch](https://zls.lu).
Voice data is sourced from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu). The original audio files can be located through the [LOD linguistic data on data.public.lu](https://data.public.lu/en/datasets/letzebuerger-online-dictionnaire-lod-linguistesch-daten/), which provides an XML file containing the example sentence IDs. Each audio file can then be accessed at:
```
https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a
```
where `{folder}` is the first 2 characters of `{id}`.
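The URL pattern above can be expressed as a small helper. This is a sketch: `lod_audio_url` is a hypothetical name, and rejecting IDs shorter than two characters is a design choice, since the folder would otherwise be undefined.

```python
def lod_audio_url(example_id: str) -> str:
    """Build the LOD download URL for an example sentence's audio file.

    Follows the documented pattern: the folder is the first two characters
    of the example ID.
    """
    if len(example_id) < 2:
        raise ValueError("example_id must have at least two characters")
    folder = example_id[:2]
    return f"https://lod.lu/uploads/examples/AAC/{folder}/{example_id}.m4a"
```

With a made-up ID, `lod_audio_url("ABC123")` returns `https://lod.lu/uploads/examples/AAC/AB/ABC123.m4a`; real IDs come from the XML file on data.public.lu.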
This model is used in [Sproochmaschinn](https://sproochmaschinn.lu), a Luxembourgish speech processing platform.