cxlrd/luganda-tts-v3

+---
+language: lg
+license: apache-2.0
+tags: [text-to-speech, tts, nemo, luganda, african-languages]
+datasets: [Sunbird/salt]
+---
+# Luganda TTS v3
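+A two-stage text-to-speech (TTS) system for Luganda built with NVIDIA NeMo: FastPitch generates a mel spectrogram from text, and HiFi-GAN converts it to audio.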
+## Models
+| Model | Description | Size |
+|-------|-------------|------|
+| `luganda_fastpitch.nemo` | FastPitch spectrogram generator | 187 MB |
+| `luganda_hifigan.nemo` | HiFi-GAN neural vocoder | 339 MB |
+## Training
+- **Dataset**: Sunbird/salt (~2,380 samples, 2.69 hours of audio)
+- **FastPitch**: 20,000 steps
+- **HiFi-GAN**: 20,000 steps
+- **Sample Rate**: 22,050 Hz
+## Usage
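+```python
+import torch
+from nemo.collections.tts.models import FastPitchModel, HifiGanModel
+
+# Load both checkpoints and switch them to inference mode
+fastpitch = FastPitchModel.restore_from("luganda_fastpitch.nemo").eval()
+hifigan = HifiGanModel.restore_from("luganda_hifigan.nemo").eval()
+
+text = "Oli otya?"  # "How are you?"
+with torch.no_grad():  # no gradients needed for synthesis
+    # Text -> token IDs -> mel spectrogram -> waveform
+    spec = fastpitch.generate_spectrogram(tokens=fastpitch.parse(text))
+    audio = hifigan.convert_spectrogram_to_audio(spec=spec)
+```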
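
The usage snippet stops at an in-memory `audio` tensor. The sketch below shows one possible way to write such a waveform to disk at the 22,050 Hz sample rate listed under Training, using only the Python standard library. The helper `write_wav`, the output file name, and the sine tone standing in for model output are all illustrative assumptions, not part of this repository.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # matches the sample rate listed under Training


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the duration in seconds.
    """
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm) / sample_rate


# Stand-in for model output: one second of a 440 Hz tone.
# With the models loaded as in the usage snippet, the samples would instead
# come from something like audio.squeeze().detach().cpu().tolist()
# (assuming the vocoder returns a tensor of shape [1, num_samples]).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
duration = write_wav("oli_otya.wav", tone)
```

Writing 16-bit PCM keeps the file playable almost anywhere. If `soundfile` or `torchaudio` is already installed, either can write the tensor directly and may be more convenient.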