---
license: cc-by-nc-4.0
language:
  - lb
tags:
  - text-to-speech
  - tts
  - vits
  - coqui
  - luxembourgish
library_name: coqui
pipeline_tag: text-to-speech
---

# Coqui TTS - Max (Luxembourgish Male Voice)

A VITS-based text-to-speech model for Luxembourgish, featuring a natural male voice.

## Model Description

This model was trained using the [Coqui TTS](https://github.com/coqui-ai/TTS) framework on Luxembourgish speech recordings of example sentences from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu).

"Max" is a male Luxembourgish voice based on recordings from a real speaker.

### Model Details

- **Architecture:** VITS
- **Language:** Luxembourgish (lb)
- **Speaker:** Single speaker (male)
- **Sample Rate:** 22050 Hz
- **Checkpoint:** 50,000 steps
- **License:** CC BY-NC 4.0 (Non-commercial use only)

## License Notice

**This model is for non-commercial use only.** Commercial use of any kind is prohibited. The voice is derived from recordings of a real speaker, and the data may be used freely only for non-commercial purposes.

## Usage

**Note:** Text should be lowercased before synthesis. Additional text normalization may be required.

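```python
import numpy as np
import scipy.io.wavfile as wavfile
import torch
from TTS.utils.synthesizer import Synthesizer

# Load the model checkpoint and its config
synthesizer = Synthesizer(
    tts_checkpoint="path/to/coqui-tts-max.pth",
    tts_config_path="path/to/config.json",
    use_cuda=torch.cuda.is_available(),
)

# The model expects lowercased input text
text = "Moien, wéi geet et dir?".lower()

# Generate speech (Synthesizer.tts returns the samples as a Python list)
wav = synthesizer.tts(text)

# Save to file: scipy needs a NumPy array, and the sample rate is taken from the config
wavfile.write("output.wav", synthesizer.output_sample_rate, np.asarray(wav, dtype=np.float32))
```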
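The note above asks for lowercased input and warns that further normalization may be needed. A minimal pre-processing helper could look like the sketch below. Only the lowercasing step comes from this model card; the Unicode NFC composition and whitespace collapsing are assumptions, not the official training pipeline, and `normalize_text` is a hypothetical name.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Minimal pre-processing before synthesis (a sketch, not the official pipeline)."""
    # Assumption: compose accented characters (e.g. "e" + combining acute -> "é")
    # so that they match precomposed characters in the training text.
    text = unicodedata.normalize("NFC", text)
    # Required by the model card: lowercase the input.
    text = text.lower()
    # Assumption: collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Moien,   Wéi geet et DIR? "))  # prints "moien, wéi geet et dir?"
```

The returned string can then be passed to `synthesizer.tts(...)`. Numbers, abbreviations, and similar tokens are not handled here and may still need to be spelled out.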

## Technical Specifications

| Parameter | Value |
|-----------|-------|
| Hidden Channels | 192 |
| Text Encoder Layers | 6 |
| Posterior Encoder Layers | 16 |
| Flow Layers | 4 |
| Mel Channels | 80 |
| FFT Size | 1024 |
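These parameters should correspond to fields in the model's `config.json`. The sketch below reads them back from a parsed config. It assumes the standard Coqui VITS layout, with architecture settings under `model_args` (as in Coqui's `VitsArgs`) and audio settings under `audio` (as in `VitsAudioConfig`); check the key names against the actual file. The helper names `read_vits_specs` and `load_specs` are hypothetical.

```python
import json


def read_vits_specs(config: dict) -> dict:
    """Extract the table's parameters from a Coqui VITS config (assumed key layout)."""
    model_args = config["model_args"]  # assumed: architecture hyperparameters
    audio = config["audio"]            # assumed: feature-extraction settings
    return {
        "hidden_channels": model_args["hidden_channels"],
        "text_encoder_layers": model_args["num_layers_text_encoder"],
        "posterior_encoder_layers": model_args["num_layers_posterior_encoder"],
        "flow_layers": model_args["num_layers_flow"],
        "mel_channels": audio["num_mels"],
        "fft_size": audio["fft_size"],
        "sample_rate": audio["sample_rate"],
    }


def load_specs(path: str) -> dict:
    """Read a config.json from disk and return the extracted parameters."""
    with open(path, encoding="utf-8") as f:
        return read_vits_specs(json.load(f))
```

For example, `load_specs("path/to/config.json")` would return a dictionary that can be compared against the table above.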

## Citation

If you use this model, please cite:

```bibtex
@misc{zls2025coquimax,
  title={Coqui TTS Max - Luxembourgish Male Voice},
  author={Zenter fir d'Lëtzebuerger Sprooch},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/ZLSCompLing/CoquiTTS-Max}
}
```

## Acknowledgments

Originally trained by [Marco Barnig](https://huggingface.co/mbarnig). Now developed and maintained by [Zenter fir d'Lëtzebuerger Sprooch](https://zls.lu).

Voice data sourced from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu). The original audio files are available via the [LOD linguistic data on data.public.lu](https://data.public.lu/en/datasets/letzebuerger-online-dictionnaire-lod-linguistesch-daten/), which provides an XML file containing example sentence IDs. Audio files can be accessed at:

```
https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a
```

where `{folder}` is the first 2 characters of `{id}`.
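The URL pattern above can be turned into a small helper. This is a sketch: `lod_audio_url` is a hypothetical name, and the example ID `"ABC123"` is made up for illustration, since the real ID format is defined by the LOD XML file rather than by this model card.

```python
LOD_AUDIO_BASE = "https://lod.lu/uploads/examples/AAC"


def lod_audio_url(example_id: str) -> str:
    """Build the audio URL for a LOD example sentence ID."""
    # The folder name is the first two characters of the ID.
    folder = example_id[:2]
    return f"{LOD_AUDIO_BASE}/{folder}/{example_id}.m4a"


# Hypothetical ID, for illustration only.
print(lod_audio_url("ABC123"))  # prints https://lod.lu/uploads/examples/AAC/AB/ABC123.m4a
```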

This model is used in [Sproochmaschinn](https://sproochmaschinn.lu), a Luxembourgish speech processing platform.