Kikuyu TTS (Fine-tuned MMS-TTS)

A fine-tuned Text-to-Speech model for Kikuyu (Gĩkũyũ).

Model Details

Base Model: facebook/mms-tts-kik
Architecture: VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
Language: Kikuyu (kik), a tonal Bantu language
Parameters: ~36M
Training Data: 48,500 audio samples from the ANV and WAXAL datasets
Training: GAN-based adversarial fine-tuning on an NVIDIA A100 80GB GPU
Developer: C-elo Labs

Training Data

This model was fine-tuned on real human Kikuyu speech recordings from two sources:

Dataset | Samples | Description
ANV (African Next Voices) | ~46,900 | Scripted Kikuyu speech across 5 dialects
WAXAL (Google) | ~1,600 | Studio-quality TTS recordings

Usage

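from transformers import VitsModel, AutoTokenizer
import torch
import scipy.io.wavfile as wavfile

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model = VitsModel.from_pretrained("gateremark/kikuyu-tts-v1")
tokenizer = AutoTokenizer.from_pretrained("gateremark/kikuyu-tts-v1")

# Keep the diacritics (ũ, ĩ) in the input text
text = "ũhoro waku"
inputs = tokenizer(text=text, return_tensors="pt")

# Inference only, so skip gradient tracking
with torch.no_grad():
    output = model(**inputs)

# output.waveform has shape (batch, samples); take the first item and save it
waveform = output.waveform[0].numpy()
wavfile.write("output.wav", model.config.sampling_rate, waveform)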
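The snippet above passes a float32 array to scipy, which writes a floating-point WAV file. Some players and telephony tools only accept 16-bit PCM, so a conversion step can help. The following is a minimal sketch using only the Python standard library; the helper name write_pcm16_wav is hypothetical and not part of this model's repository.

```python
import array
import sys
import wave


def write_pcm16_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for sample in samples:
        # Clip out-of-range values instead of letting them wrap around
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))
    # array uses native byte order; WAV data must be little-endian
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
```

With the variables from the Usage snippet, this would be called as write_pcm16_wav("output_pcm16.wav", waveform, model.config.sampling_rate).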

Training Details

  • Method: Fine-tuned using the ylacombe VITS training recipe with GAN-based adversarial loss
  • Training steps: ~70,000
  • Optimizer: AdamW
  • Loss: Mel reconstruction + KL divergence + discriminator adversarial loss
  • Hardware: NVIDIA A100 80GB (Modal)
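To illustrate how the three loss terms listed above can fit together, here is a minimal sketch in plain Python. It assumes the least-squares GAN formulation used in the original VITS paper; this card does not state which formulation or which loss weights the fine-tuning recipe used, so the function names and the default weights of 1.0 are placeholders rather than the actual training configuration.

```python
def mel_reconstruction_loss(mel_real, mel_fake):
    """Mean absolute error between two flattened mel-spectrograms."""
    return sum(abs(r - f) for r, f in zip(mel_real, mel_fake)) / len(mel_real)


def generator_adversarial_loss(scores_fake):
    """Least-squares GAN generator loss: push D(generated) towards 1."""
    return sum((1.0 - s) ** 2 for s in scores_fake) / len(scores_fake)


def discriminator_loss(scores_real, scores_fake):
    """Least-squares GAN discriminator loss: real towards 1, generated towards 0."""
    real_term = sum((1.0 - s) ** 2 for s in scores_real) / len(scores_real)
    fake_term = sum(s ** 2 for s in scores_fake) / len(scores_fake)
    return real_term + fake_term


def generator_total_loss(mel_loss, kl_loss, adv_loss,
                         w_mel=1.0, w_kl=1.0, w_adv=1.0):
    """Weighted sum of the three terms; the weights are placeholder values."""
    return w_mel * mel_loss + w_kl * kl_loss + w_adv * adv_loss
```

The generator and the discriminator are updated in alternation: the discriminator minimizes discriminator_loss, while the generator minimizes generator_total_loss. The KL term is passed in as a precomputed number here because computing it requires the model's prior and posterior distributions.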

Limitations

  • Optimized for standard Kikuyu text; may struggle with code-switched (Kikuyu-English) input
  • Pronunciation accuracy depends on correct diacritical marks in the input text (e.g., ũ and u are distinct vowels)
  • Single-speaker output
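Because the diacritics matter, it can help to normalize input text before tokenizing. The same visible character ũ can be encoded either as one code point (U+0169) or as u followed by a combining tilde (U+0303). The sketch below converts text to the composed (NFC) form; whether this model's tokenizer vocabulary expects the composed form is an assumption, and the helper name normalize_kikuyu_text is hypothetical.

```python
import unicodedata


def normalize_kikuyu_text(text):
    """Return text in Unicode NFC form, merging base letters with combining marks."""
    return unicodedata.normalize("NFC", text)


# 'u' followed by U+0303 COMBINING TILDE renders as ũ but is two code points
decomposed = "u\u0303horo waku"
composed = normalize_kikuyu_text(decomposed)

# After normalization the first character is the single code point U+0169
print(composed == "\u0169horo waku")
```

The normalized string can then be passed to the tokenizer in place of the raw input.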

About C-elo Labs

C-elo Labs is an AI research organization pioneering high-fidelity language models for underserved African languages. We build text and voice AI systems for Kikuyu, Kamba, and other African languages.

Citation

If you use this model, please cite:

@misc{gatere2026kikuyutts,
  title={Fine-tuned Kikuyu TTS based on MMS-TTS},
  author={Mark Gatere},
  year={2026},
  publisher={C-elo Labs},
  url={https://huggingface.co/gateremark/kikuyu-tts-v1}
}