WAXAL MMS-TTS - Oromo (orm)

Fine-tuning-ready checkpoint for Oromo (orm), built from facebook/mms-tts-orm for use with the WAXAL TTS dataset and the finetune-hf-vits pipeline.

| | |
|---|---|
| WAXAL config | `google/WaxalNLP`, subset `orm_tts` |
| Data provider | Digital Umuganda |
| Base model | `facebook/mms-tts-orm` |
| License | CC-BY-NC 4.0 (inherited from the base model) |

What this repository adds

`facebook/mms-tts-{iso}` checkpoints are inference-only releases: their configs are missing fields that `run_vits_finetuning.py` requires, so training crashes. This repository applies three patches:

| File | Change |
|---|---|
| `config.json` | `pad_token_id` set to `0` (was `null`) |
| `tokenizer_config.json` | `pad_token` entry added |
| `preprocessor_config.json` | Added: `VitsFeatureExtractor` config from `ylacombe/mms-tts-eng-train` |

Model weights are not stored here. `_name_or_path` in `config.json` points to `facebook/mms-tts-orm`, so `run_vits_finetuning.py` loads the weights from the original Facebook checkpoint at training time.
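The three patches amount to small JSON edits on a local copy of the checkpoint. A minimal sketch (the directory name `mms-tts-orm-local` and the `"<pad>"` token string are assumptions for illustration; MMS-TTS reserves id 0 for padding):

```python
import json
from pathlib import Path

# Hypothetical local clone of the facebook/mms-tts-orm checkpoint files.
ckpt = Path("mms-tts-orm-local")
ckpt.mkdir(exist_ok=True)

# Patch 1: config.json -- pad_token_id must be an int, not null.
cfg_path = ckpt / "config.json"
cfg = json.loads(cfg_path.read_text()) if cfg_path.exists() else {}
cfg["pad_token_id"] = 0
cfg_path.write_text(json.dumps(cfg, indent=2))

# Patch 2: tokenizer_config.json -- add an explicit pad_token entry.
tok_path = ckpt / "tokenizer_config.json"
tok = json.loads(tok_path.read_text()) if tok_path.exists() else {}
tok["pad_token"] = "<pad>"  # assumed token string
tok_path.write_text(json.dumps(tok, indent=2))

# Patch 3: preprocessor_config.json -- the shared VITS feature-extractor constants.
pre = {
    "feature_extractor_type": "VitsFeatureExtractor",
    "feature_size": 80,
    "hop_length": 256,
    "max_wav_value": 32768.0,
    "n_fft": 1024,
    "padding_side": "right",
    "padding_value": 0.0,
    "return_attention_mask": False,
    "sampling_rate": 16000,
    "spec_gain": 1,
}
(ckpt / "preprocessor_config.json").write_text(json.dumps(pre, indent=2))
```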

preprocessor_config.json

Downloaded verbatim from ylacombe/mms-tts-eng-train. Values are VITS architecture constants shared by all MMS-TTS languages.

| Field | Value |
|---|---|
| `feature_extractor_type` | `VitsFeatureExtractor` |
| `feature_size` | 80 |
| `hop_length` | 256 |
| `max_wav_value` | 32768.0 |
| `n_fft` | 1024 |
| `padding_side` | `right` |
| `padding_value` | 0.0 |
| `return_attention_mask` | `False` |
| `sampling_rate` | 16000 |
| `spec_gain` | 1 |
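These constants fix the spectrogram geometry. As a sanity check (assuming the usual centered-STFT convention, so this is an illustration of the arithmetic rather than the extractor's exact code), one second of 16 kHz audio gives:

```python
sampling_rate = 16000
hop_length = 256
n_fft = 1024

samples = sampling_rate * 1          # one second of audio
frames = samples // hop_length + 1   # centered STFT: one frame per hop, plus one
freq_bins = n_fft // 2 + 1           # one-sided spectrum, before the 80-bin mel projection

print(frames, freq_bins)             # 63 frames, 513 frequency bins
```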

Usage in finetune-hf-vits

```json
{
  "model_name_or_path":     "rnjema-unima/mms-tts-orm-train",
  "feature_extractor_name": "rnjema-unima/mms-tts-orm-train",
  "dataset_name":           "google/WaxalNLP",
  "dataset_config_name":    "orm_tts",
  "audio_column_name":      "audio",
  "text_column_name":       "text",
  "train_split_name":       "train",
  "eval_split_name":        "validation"
}
```
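Saved as a JSON file, this config is passed straight to the training script. A sketch of the launch (the file name `finetune_orm.json` is an assumption, and real runs need the training hyperparameters documented in the finetune-hf-vits README, omitted here):

```shell
# Write the minimal config shown above (training hyperparameters omitted).
cat > finetune_orm.json <<'EOF'
{
  "model_name_or_path":     "rnjema-unima/mms-tts-orm-train",
  "feature_extractor_name": "rnjema-unima/mms-tts-orm-train",
  "dataset_name":           "google/WaxalNLP",
  "dataset_config_name":    "orm_tts",
  "audio_column_name":      "audio",
  "text_column_name":       "text",
  "train_split_name":       "train",
  "eval_split_name":        "validation"
}
EOF

# From a clone of the finetune-hf-vits repository:
# accelerate launch run_vits_finetuning.py ./finetune_orm.json
```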

Inference (after fine-tuning)

```python
from transformers import VitsModel, VitsTokenizer
import torch
import scipy.io.wavfile  # "import scipy" alone does not expose scipy.io.wavfile

model     = VitsModel.from_pretrained("your-org/your-finetuned-model")
tokenizer = VitsTokenizer.from_pretrained("your-org/your-finetuned-model")

inputs = tokenizer("Your text in Oromo.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

scipy.io.wavfile.write("output.wav", model.config.sampling_rate,
                       out.waveform.squeeze().numpy())
```

Technical details

| Field | Value |
|---|---|
| Architecture | VITS (end-to-end; no separate vocoder) |
| `pad_token_id` | 0 |
| `vocab_size` | 29 |
| `is_uroman` | `false` |
| `sampling_rate` | 16000 Hz |
