---
license: cc-by-nc-4.0
language:
- lb
tags:
- text-to-speech
- tts
- vits
- coqui
- luxembourgish
library_name: coqui
pipeline_tag: text-to-speech
---

# Coqui TTS - Max (Luxembourgish Male Voice)

A VITS-based text-to-speech model for Luxembourgish, featuring a natural male voice.

## Model Description

This model was trained using the [Coqui TTS](https://github.com/coqui-ai/TTS) framework on Luxembourgish speech data from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu) example sentences.

"Max" is a male Luxembourgish voice based on recordings from a real speaker.

### Model Details

- **Architecture:** VITS
- **Language:** Luxembourgish (lb)
- **Speaker:** Single speaker (male)
- **Sample Rate:** 22050 Hz
- **Checkpoint:** 50,000 steps
- **License:** CC BY-NC 4.0 (Non-commercial use only)

## License Notice

**This model is for non-commercial use only.** All commercial uses are prohibited. The voice data is derived from recordings of a real speaker and may only be used freely for non-commercial purposes.

## Usage

**Note:** Text should be lowercased before synthesis. Additional text normalization may be required.

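
The lowercasing step mentioned in the note can be sketched as a small preprocessing helper. This is a minimal sketch, not the preprocessing used in training: the function name `normalize_text` is hypothetical, and the Unicode NFC normalization and whitespace collapsing are assumptions about what "additional text normalization" might include.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Hypothetical helper: prepare raw text for synthesis."""
    # Assumption: compose accented characters (e.g. "e" + combining accent -> "é")
    # so they match the precomposed form typically found in text corpora.
    text = unicodedata.normalize("NFC", text)
    # Required by the model card: lowercase the input.
    text = text.lower()
    # Assumption: collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Moien,   wéi geet et Dir? "))
```

The result can then be passed to the synthesizer in place of a hand-lowercased string. Expanding numbers, abbreviations, or symbols is not covered here and may still be needed.
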
```python
import numpy as np
import scipy.io.wavfile as wavfile
import torch
from TTS.utils.synthesizer import Synthesizer

# Load the model (uses the GPU only when CUDA is available)
synthesizer = Synthesizer(
    tts_checkpoint="path/to/coqui-tts-max.pth",
    tts_config_path="path/to/config.json",
    use_cuda=torch.cuda.is_available(),
)

# Generate speech (the input text is already lowercased)
wav = synthesizer.tts("moien, wéi geet et dir?")

# synthesizer.tts() returns a plain list of samples, while scipy expects
# a NumPy array, so convert before saving at the model's 22050 Hz sample rate
wavfile.write("output.wav", 22050, np.asarray(wav, dtype=np.float32))
```

## Technical Specifications

| Parameter | Value |
|-----------|-------|
| Hidden Channels | 192 |
| Text Encoder Layers | 6 |
| Posterior Encoder Layers | 16 |
| Flow Layers | 4 |
| Mel Channels | 80 |
| FFT Size | 1024 |

## Citation

If you use this model, please cite:

```bibtex
@misc{zls2025coquimax,
  title={Coqui TTS Max - Luxembourgish Male Voice},
  author={Zenter fir d'Lëtzebuerger Sprooch},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/ZLSCompLing/CoquiTTS-Max}
}
```

## Acknowledgments

Originally trained by [Marco Barnig](https://huggingface.co/mbarnig). Now developed and maintained by [Zenter fir d'Lëtzebuerger Sprooch](https://zls.lu).

Voice data sourced from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu).

The original audio files are available via the [LOD linguistic data on data.public.lu](https://data.public.lu/en/datasets/letzebuerger-online-dictionnaire-lod-linguistesch-daten/), which provides an XML file containing example sentence IDs. Audio files can be accessed at:

```
https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a
```

where `{folder}` is the first 2 characters of `{id}`.

This model is used in [Sproochmaschinn](https://sproochmaschinn.lu), a Luxembourgish speech processing platform.
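
The LOD audio URL pattern described in the acknowledgments can be sketched in Python as follows. This is an illustrative sketch only: the function name `lod_audio_url` and the example ID `ab1234` are hypothetical placeholders, not real LOD identifiers. Actual IDs come from the XML file in the LOD linguistic data.

```python
# URL template as given in the acknowledgments.
LOD_AUDIO_URL = "https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a"


def lod_audio_url(example_id: str) -> str:
    """Hypothetical helper: build the audio URL for one example sentence ID."""
    if len(example_id) < 2:
        raise ValueError("example ID must have at least 2 characters")
    # {folder} is the first 2 characters of {id}.
    folder = example_id[:2]
    return LOD_AUDIO_URL.format(folder=folder, id=example_id)


# "ab1234" is a made-up ID used only to show the pattern.
print(lod_audio_url("ab1234"))
```
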