---
license: mit
language:
  - lb
tags:
  - text-to-speech
  - tts
  - vits2
  - luxembourgish
pipeline_tag: text-to-speech
---

# VITS2 - Claude (Luxembourgish Gender-Neutral Voice)

A VITS2-based text-to-speech model for Luxembourgish, featuring a synthetic gender-neutral voice.

## Model Description

This model was trained using the VITS2 architecture on Luxembourgish speech data from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu) example sentences.

"Claude" is a synthetic gender-neutral Luxembourgish voice created by modulating the original LOD recordings.

### Model Details

- **Architecture:** VITS2 with duration discriminator and transformer flows
- **Language:** Luxembourgish (lb)
- **Speaker:** Single speaker (gender-neutral, synthetic)
- **Sample Rate:** 24000 Hz
- **Checkpoint:** G_57000 (57,000 steps)
- **License:** MIT

## Usage

**Note:** Lowercase the input text before synthesis. Further text normalization may also be required, depending on the input.
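The lowercasing step can be sketched as a small helper. This is a minimal illustration, not part of the included source files: `normalize_text` is a hypothetical name, and the Unicode NFC step and whitespace collapsing are assumptions about what is useful for this model rather than documented requirements.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Illustrative pre-processing helper (hypothetical, not shipped with the model)."""
    # Assumption: compose characters such as "e" + combining diaeresis into "ë"
    text = unicodedata.normalize("NFC", text)
    # Lowercase, as the note above asks
    text = text.lower()
    # Assumption: collapse runs of whitespace into single spaces
    return " ".join(text.split())


print(normalize_text("Moien,  wéi geet et Dir?"))  # moien, wéi geet et dir?
```

The result can then be passed to the synthesis call in place of the raw input string.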

This model requires the included Python source files for inference.

### Basic Usage

```python
import torch
import scipy.io.wavfile as wavfile
from vits2_engine import VITS2Engine  # module included in this repository

# Load the model
engine = VITS2Engine(model_dir="path/to/vits2-claude")

# Generate speech
wav = engine.tts("moien, wéi geet et dir?")

# Save to file
wavfile.write("output.wav", engine.sample_rate, wav)
```

### Command Line

```bash
python inference.py "moien, wéi geet et dir?"

# With custom parameters
python inference.py "text" --noise_scale 0.5 --length_scale 1.1 -o output.wav
```

### Parameters

- `noise_scale`: Controls voice variation (default: 0.667, lower = more consistent)
- `noise_scale_w`: Controls duration variation (default: 0.8)
- `length_scale`: Controls speech speed (default: 1.0, higher = slower)
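To give a feel for why a higher `length_scale` slows speech down: in the reference VITS inference code, the predicted per-symbol log-durations are exponentiated, multiplied by `length_scale`, and rounded up to whole frames. The sketch below assumes this model follows the same convention. The function name and the log-duration values are made up for illustration.

```python
import math


def frame_counts(log_durations, length_scale=1.0):
    """Frames per input symbol, assuming the reference VITS convention:
    ceil(exp(log_duration) * length_scale)."""
    return [math.ceil(math.exp(lw) * length_scale) for lw in log_durations]


# Hypothetical log-durations for three input symbols
log_durations = [1.0, 1.5, 0.5]

normal = frame_counts(log_durations, length_scale=1.0)
slower = frame_counts(log_durations, length_scale=1.5)

# A larger length_scale yields more frames, hence slower speech
print(sum(normal), sum(slower))
```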

## Technical Specifications

| Parameter | Value |
|-----------|-------|
| Hidden Channels | 192 |
| Filter Channels | 768 |
| Attention Heads | 2 |
| Encoder Layers | 6 |
| Mel Channels | 80 |
| FFT Size | 1024 |
| Hop Length | 256 |
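
Combined with the 24000 Hz sample rate, the FFT size and hop length translate into time units as follows. This is plain arithmetic on the values listed in this card. It assumes the analysis window is as long as the FFT size, which the table does not state.

```python
SAMPLE_RATE = 24000  # Hz, from Model Details
FFT_SIZE = 1024      # samples, from the table above
HOP_LENGTH = 256     # samples, from the table above

hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE      # time step between frames
window_ms = 1000 * FFT_SIZE / SAMPLE_RATE     # assumes window length == FFT size
frames_per_second = SAMPLE_RATE / HOP_LENGTH  # spectrogram frame rate
freq_bins = FFT_SIZE // 2 + 1                 # bins of a one-sided spectrum

print(f"{hop_ms:.2f} ms per hop")           # 10.67 ms per hop
print(f"{window_ms:.2f} ms per window")     # 42.67 ms per window
print(f"{frames_per_second} frames/s")      # 93.75 frames/s
print(f"{freq_bins} frequency bins")        # 513 frequency bins
```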

## Requirements

- Python 3.8+
- PyTorch
- scipy
- numpy
- Cython (for monotonic_align)

## Citation

If you use this model, please cite:

```bibtex
@misc{zls2025vits2claude,
  title={VITS2 Claude - Luxembourgish Gender-Neutral Voice},
  author={Zenter fir d'Lëtzebuerger Sprooch},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/ZLSCompLing/VITS2-Claude}
}
```

## Acknowledgments

Developed by [Zenter fir d'Lëtzebuerger Sprooch](https://zls.lu).

Voice data sourced from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu). The [LOD linguistic data on data.public.lu](https://data.public.lu/en/datasets/letzebuerger-online-dictionnaire-lod-linguistesch-daten/) provides an XML file containing the example sentence IDs. The audio file for each ID can be accessed at:

```
https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a
```

where `{folder}` is the first two characters of `{id}`.
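
The URL pattern can be filled in with a few lines of Python. This is a minimal sketch: `lod_audio_url` is a hypothetical helper name, and the ID in the usage line is a made-up placeholder, not a real LOD example ID.

```python
LOD_AUDIO_URL = "https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a"


def lod_audio_url(example_id: str) -> str:
    """Build the audio URL for an example sentence ID (illustrative helper)."""
    if len(example_id) < 2:
        raise ValueError("example ID must have at least two characters")
    # The folder is the first two characters of the ID
    return LOD_AUDIO_URL.format(folder=example_id[:2], id=example_id)


# "xy0000" is a placeholder, not a real ID from the XML file
print(lod_audio_url("xy0000"))
```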

This model is used in [Sproochmaschinn](https://sproochmaschinn.lu), a Luxembourgish speech processing platform.