---
license: mit
language:
- lb
tags:
- text-to-speech
- tts
- vits2
- luxembourgish
pipeline_tag: text-to-speech
---

# VITS2 - Claude (Luxembourgish Gender-Neutral Voice)

A VITS2-based text-to-speech model for Luxembourgish, featuring a synthetic gender-neutral voice.

## Model Description

This model was trained using the VITS2 architecture on Luxembourgish speech data from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu) example sentences.

"Claude" is a synthetic gender-neutral Luxembourgish voice created by modulating the original LOD recordings.

### Model Details

- **Architecture:** VITS2 with duration discriminator and transformer flows
- **Language:** Luxembourgish (lb)
- **Speaker:** Single speaker (gender-neutral, synthetic)
- **Sample Rate:** 24000 Hz
- **Checkpoint:** G_57000 (57,000 steps)
- **License:** MIT

## Usage

**Note:** Text should be lowercased before synthesis. Additional text normalization may be required.

This model requires the included Python source files for inference.

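The lowercasing requirement in the note above can be handled with a small pre-processing helper. The following is a minimal sketch, not the normalization used during training: the function name `normalize_text`, the Unicode NFC step, and the whitespace collapsing are assumptions added for illustration. Numbers, abbreviations, and symbols are not expanded here and may still need separate handling.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Minimal pre-synthesis cleanup (illustrative, not the training pipeline)."""
    # Assumption: compose characters such as "e" + combining diaeresis into "ë"
    # so that the same letter is always represented the same way.
    text = unicodedata.normalize("NFC", text)
    # Required by the model card: lowercase the input.
    text = text.lower()
    # Assumption: collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Moien,   wéi geet et Dir? "))
```

The returned string can then be passed to `engine.tts(...)` or to `inference.py` in place of the raw input.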
### Basic Usage

```python
import torch
import scipy.io.wavfile as wavfile
from vits2_engine import VITS2Engine

# Load the model
engine = VITS2Engine(model_dir="path/to/vits2-claude")

# Generate speech
wav = engine.tts("moien, wéi geet et dir?")

# Save to file
wavfile.write("output.wav", engine.sample_rate, wav)
```

### Command Line

```bash
python inference.py "moien, wéi geet et dir?"

# With custom parameters
python inference.py "Text" --noise_scale 0.5 --length_scale 1.1 -o output.wav
```

### Parameters

- `noise_scale`: Controls voice variation (default: 0.667, lower = more consistent)
- `noise_scale_w`: Controls duration variation (default: 0.8)
- `length_scale`: Controls speech speed (default: 1.0, higher = slower)

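To compare speaking rates, it can help to render the same sentence at several `length_scale` values. The sketch below only builds the command lines and prints them; it uses the `--noise_scale`, `--length_scale`, and `-o` options shown in the command-line example. The helper name `build_commands` and the output file naming scheme are hypothetical.

```python
import shlex


def build_commands(text, length_scales, noise_scale=0.667):
    """Build one inference.py command line per length_scale value."""
    commands = []
    for scale in length_scales:
        # Hypothetical naming scheme, e.g. output_ls1.10.wav
        out_name = f"output_ls{scale:.2f}.wav"
        argv = [
            "python", "inference.py", text,
            "--noise_scale", str(noise_scale),
            "--length_scale", str(scale),
            "-o", out_name,
        ]
        # shlex.join (Python 3.8+) quotes arguments that contain spaces.
        commands.append(shlex.join(argv))
    return commands


for cmd in build_commands("moien, wéi geet et dir?", [0.9, 1.0, 1.1]):
    print(cmd)
```

Each printed line can be pasted into a shell. Values above 1.0 should produce slower speech, values below 1.0 faster speech.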
## Technical Specifications

| Parameter | Value |
|-----------|-------|
| Hidden Channels | 192 |
| Filter Channels | 768 |
| Attention Heads | 2 |
| Encoder Layers | 6 |
| Mel Channels | 80 |
| FFT Size | 1024 |
| Hop Length | 256 |

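The hop length and FFT size are given in samples, so their duration depends on the 24000 Hz sample rate listed under Model Details. As a rough worked example, the sketch below converts them to milliseconds and frames per second. The function names are illustrative, and the frame count is an approximation because the exact number depends on how the implementation pads the signal.

```python
SAMPLE_RATE_HZ = 24000  # from Model Details
HOP_LENGTH = 256        # from the table above
FFT_SIZE = 1024         # from the table above


def frame_timing(sample_rate_hz, hop_length, fft_size):
    """Return (hop_ms, fft_span_ms, frames_per_second)."""
    hop_ms = 1000.0 * hop_length / sample_rate_hz
    fft_span_ms = 1000.0 * fft_size / sample_rate_hz
    frames_per_second = sample_rate_hz / hop_length
    return hop_ms, fft_span_ms, frames_per_second


def samples_to_frames(num_samples, hop_length=HOP_LENGTH):
    """Approximate frame count for a waveform (ceiling division, padding ignored)."""
    return -(-num_samples // hop_length)


hop_ms, fft_span_ms, fps = frame_timing(SAMPLE_RATE_HZ, HOP_LENGTH, FFT_SIZE)
print(f"{hop_ms:.2f} ms per hop, {fft_span_ms:.2f} ms per FFT, {fps:.2f} frames/s")
```

With these values each spectrogram frame advances by about 10.67 ms, and one second of audio corresponds to roughly 94 frames.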
## Requirements

- Python 3.8+
- PyTorch
- scipy
- numpy
- Cython (for monotonic_align)

## Citation

If you use this model, please cite:

```bibtex
@misc{zls2025vits2claude,
  title={VITS2 Claude - Luxembourgish Gender-Neutral Voice},
  author={Zenter fir d'Lëtzebuerger Sprooch},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/ZLSCompLing/VITS2-Claude}
}
```

## Acknowledgments

Developed by [Zenter fir d'Lëtzebuerger Sprooch](https://zls.lu).

Voice data sourced from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu). The original audio files are available via the [LOD linguistic data on data.public.lu](https://data.public.lu/en/datasets/letzebuerger-online-dictionnaire-lod-linguistesch-daten/), which provides an XML file containing example sentence IDs. Audio files can be accessed at:

```
https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a
```

where `{folder}` is the first 2 characters of `{id}`.

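The URL pattern above can be filled in programmatically once the example sentence IDs have been read from the XML file. This is a minimal sketch: the function name `lod_audio_url` is illustrative, and the ID used in the example call is a made-up placeholder rather than a real LOD identifier.

```python
LOD_AUDIO_URL = "https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a"


def lod_audio_url(example_id: str) -> str:
    """Build the audio URL for one example sentence ID."""
    if len(example_id) < 2:
        raise ValueError("example_id must have at least two characters")
    # The folder is the first two characters of the ID.
    folder = example_id[:2]
    return LOD_AUDIO_URL.format(folder=folder, id=example_id)


# "ab12345" is a placeholder ID used only to show the resulting URL shape.
print(lod_audio_url("ab12345"))
```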
This model is used in [Sproochmaschinn](https://sproochmaschinn.lu), a Luxembourgish speech processing platform.