---
license: mit
language:
- lb
tags:
- text-to-speech
- tts
- vits2
- luxembourgish
pipeline_tag: text-to-speech
---

# VITS2 - Claude (Luxembourgish Gender-Neutral Voice)

A VITS2-based text-to-speech model for Luxembourgish, featuring a synthetic gender-neutral voice.

## Model Description

This model was trained using the VITS2 architecture on Luxembourgish speech data from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu) example sentences. "Claude" is a synthetic gender-neutral Luxembourgish voice created by modulating the original LOD recordings.

### Model Details

- **Architecture:** VITS2 with duration discriminator and transformer flows
- **Language:** Luxembourgish (lb)
- **Speaker:** Single speaker (gender-neutral, synthetic)
- **Sample Rate:** 24000 Hz
- **Checkpoint:** G_57000 (57,000 steps)
- **License:** MIT

## Usage

**Note:** Text should be lowercased before synthesis. Additional text normalization may be required.

This model requires the included Python source files for inference.

### Basic Usage

```python
import torch
import scipy.io.wavfile as wavfile
from vits2_engine import VITS2Engine

# Load the model
engine = VITS2Engine(model_dir="path/to/vits2-claude")

# Generate speech
wav = engine.tts("moien, wéi geet et dir?")

# Save to file
wavfile.write("output.wav", engine.sample_rate, wav)
```

### Command Line

```bash
python inference.py "moien, wéi geet et dir?"

# With custom parameters
python inference.py "Text" --noise_scale 0.5 --length_scale 1.1 -o output.wav
```

### Parameters

- `noise_scale`: Controls voice variation (default: 0.667, lower = more consistent)
- `noise_scale_w`: Controls duration variation (default: 0.8)
- `length_scale`: Controls speech speed (default: 1.0, higher = slower)

## Technical Specifications

| Parameter | Value |
|-----------|-------|
| Hidden Channels | 192 |
| Filter Channels | 768 |
| Attention Heads | 2 |
| Encoder Layers | 6 |
| Mel Channels | 80 |
| FFT Size | 1024 |
| Hop Length | 256 |

## Requirements

- Python 3.8+
- PyTorch
- scipy
- numpy
- Cython (for monotonic_align)

## Citation

If you use this model, please cite:

```bibtex
@misc{zls2025vits2claude,
  title={VITS2 Claude - Luxembourgish Gender-Neutral Voice},
  author={Zenter fir d'Lëtzebuerger Sprooch},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/ZLSCompLing/VITS2-Claude}
}
```

## Acknowledgments

Developed by [Zenter fir d'Lëtzebuerger Sprooch](https://zls.lu).

Voice data sourced from the [Lëtzebuerger Online Dictionnaire (LOD)](https://lod.lu). The original audio files are available via the [LOD linguistic data on data.public.lu](https://data.public.lu/en/datasets/letzebuerger-online-dictionnaire-lod-linguistesch-daten/), which provides an XML file containing example sentence IDs. Audio files can be accessed at:

```
https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a
```

where `{folder}` is the first 2 characters of `{id}`.

This model is used in [Sproochmaschinn](https://sproochmaschinn.lu), a Luxembourgish speech processing platform.
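
The audio URL pattern given in the Acknowledgments can be sketched in Python as follows. This is a minimal illustration only: the helper name `lod_audio_url` and the sample ID are assumptions made for this example and are not part of the model's source files or the LOD data.

```python
def lod_audio_url(example_id: str) -> str:
    """Build the audio URL for a LOD example sentence ID.

    Hypothetical helper: it only applies the pattern described above,
    where the folder is the first 2 characters of the ID.
    """
    if len(example_id) < 2:
        raise ValueError("example ID must have at least 2 characters")
    folder = example_id[:2]
    return f"https://lod.lu/uploads/examples/AAC/{folder}/{example_id}.m4a"


# "ab123" is a made-up ID for illustration, not a real LOD entry.
url = lod_audio_url("ab123")
print(url)
```

Real IDs would come from the XML file published on data.public.lu; the helper does not check whether a given file actually exists.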