
Kinyarwanda Text-to-Speech (TTS) Model used with the agricultural chatbot

In Rwanda, many farmers struggle to access timely, personalized agricultural information. Traditional channels - radio, TV, and online sources - offer limited reach and interactivity, while extension services and the national call center, staffed by only two agents for over two million farmers, face severe capacity constraints. To address these gaps, we developed a 24/7 AI-enabled Interactive Voice Response (IVR) tool. Accessible via a Kinyarwanda-speaking hotline, it provides advisories on topics such as pest and disease diagnosis and agro-climatic practices, as well as information on MINAGRI's support programs for farmers, e.g. crop insurance. By combining AI and IVR technology, the project makes agricultural advisories more accessible, timely, and responsive to farmers' needs. For more information, please reach out to C4IR.

Implemented by: C4IR Rwanda & KiNLP; Supported by: GIZ; Financed by: BMZ.

Technical Documentation

The pre-trained model is a multi-speaker TTS model based on the MB-iSTFT-VITS2 architecture. The code for training and evaluating the model is available in the DeepKIN-AgAI package. The model was trained on the agricultural TTS dataset available at: https://huggingface.co/datasets/C4IR-RW/kinya-ag-tts
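The inference code below uses DeepKIN's `intersperse` helper to prepare the token sequence. In VITS-family implementations, this helper interleaves a blank token (here, id 0) between and around the input ids, which the model's duration predictor relies on. A minimal reference sketch of that behavior (this is an assumption about the helper's semantics, not the DeepKIN implementation itself):

```python
from typing import List


def intersperse(seq: List[int], item: int) -> List[int]:
    """Interleave `item` between and around the elements of `seq`.

    Example: intersperse([1, 2, 3], 0) -> [0, 1, 0, 2, 0, 3, 0]
    """
    # Allocate a list of blanks twice as long as the input, plus one,
    # then place the original ids at the odd positions.
    result = [item] * (len(seq) * 2 + 1)
    result[1::2] = seq
    return result
```

Interleaving blanks in this way doubles the sequence length plus one, which is why the tokenized text passed to the model is roughly twice as long as the raw id sequence.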

Python code for running inference:


# PyTorch and TorchAudio (version 2.7.1 recommended)
# Example install command:
# pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
import torch
import torchaudio

# DeepKIN imports
# See: https://github.com/c4ir-rw/ac-ai-models/tree/main/DeepKIN-AgAI
from deepkin.data.kinya_norm import text_to_sequence
from deepkin.models.flex_tts import FlexKinyaTTS
from deepkin.modules.tts_commons import intersperse

# Define inference device
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')

# Load TTS model (HF: C4IR-RW/kinya-flex-tts)
kinya_tts = FlexKinyaTTS.from_pretrained(device, '/path/to/kinya_flex_tts_base_trained.pt')
kinya_tts.eval()

# Example Kinyarwanda input text (Article 1 of the Universal Declaration of Human Rights:
# "All human beings are born free and equal in dignity and rights. They are endowed with
# reason and conscience and should act towards one another in a spirit of brotherhood.")
text = "Ikiremwamuntu cyose kivukana umudendezo kandi kingana mu cyubahiro n'uburenganzira. Gifite ubushobozi bwo gutekereza n'umutimanama kandi kigomba gukorera bagenzi bacyo mu mwuka wa kivandimwe."

# Normalize and tokenize input text
text_id_sequence = intersperse(text_to_sequence(text, norm=True), 0)

# Select voice (speaker id).
# Available speaker ids:
# 0 - Female 1
# 1 - Female 2
# 2 - Male
speaker_id = 0

# Run inference: generate audio samples (24 kHz sampling rate by default)
audio_data = kinya_tts(text_id_sequence, speaker_id)

# Save audio file (24 kHz sampling rate by default)
sampling_rate = 24000
torchaudio.save("/path/to/example_output.wav", audio_data, sampling_rate)

License

This model is licensed under the Creative Commons Attribution 4.0 International License (CC-BY 4.0).

Attribution: Please attribute this work to C4IR Rwanda and KiNLP.
