This model is part of the GAIA suite for African and low-resource languages, adapted to Ewe (ʋegbe).
This is a fine-tuned version of OuteAI/Llama-OuteTTS-1.0-1B specifically trained to synthesize speech in the Ewe language. The model was fine-tuned using the Unsloth library with 16-bit LoRA adapters (Rank 64) for memory-efficient and fast training.
- Language: Ewe (ee)
- Base model: OuteAI/Llama-OuteTTS-1.0-1B
- Dataset: google/WaxalNLP (Ewe TTS subset)
- Libraries: transformers, trl, unsloth

This model is intended for generating Ewe speech from text.
If you use this model in your research, applications, or projects, you must cite and attribute Junior Adenyo.
Numbers in the input must be written out as words (e.g., convert 240 to its Ewe word equivalent) before feeding the text to the model.

```python
import torch
import re
from unsloth import FastModel

# Load the fine-tuned model
model, tokenizer = FastModel.from_pretrained(
    model_name="analist/oute_ewe_r64_16bit",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=False,
)
FastModel.for_inference(model)

# Prepare your Ewe text
input_text = "Ya ʋuduʋudu si ƒe kpekpeme anɔ abe agbadroƒe blaatɔ̄ le gaƒoƒo ɖeka me ene la, aƒo."
formatted_text = "<|text_start|>" + input_text + "<|text_end|>"
prompt = "\n".join([
    "<|im_start|>",
    formatted_text,
    "<|audio_start|><|global_features_start|>",
])
model_inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

# Generate audio tokens
with torch.inference_mode():
    with torch.amp.autocast("cuda", dtype=model.dtype):
        generated_ids = model.generate(
            **model_inputs,
            temperature=0.1,
            top_k=40,
            top_p=0.9,
            repetition_penalty=1.0,
            min_p=0.05,
            max_new_tokens=4096,
        )

# Extract the two codebook streams from the decoded token string
decoded_output = tokenizer.batch_decode(generated_ids, skip_special_tokens=False)[0]
c1 = list(map(int, re.findall(r"<\|c1_(\d+)\|>", decoded_output)))
c2 = list(map(int, re.findall(r"<\|c2_(\d+)\|>", decoded_output)))
t = min(len(c1), len(c2))  # align both streams to the same length
audio_tokens = [c1[:t], c2[:t]]

# Note: to decode the generated tokens into a waveform,
# you will need the DAC (Descript Audio Codec) interface from the OuteTTS library.
# from outetts.dac.interface import DacInterface
# dac = DacInterface()
# audio_waveform = dac.decode(torch.tensor([audio_tokens], dtype=torch.int64).to(dac.device))
```
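To make the codebook-extraction step concrete, here is a minimal, self-contained illustration on a toy decoded string (made-up token indices, not real model output):

```python
import re

# Toy string imitating the model's output format: interleaved codebook-1
# and codebook-2 tokens, with one trailing unpaired c1 token.
decoded = "<|c1_12|><|c2_345|><|c1_7|><|c2_89|><|c1_500|>"

c1 = list(map(int, re.findall(r"<\|c1_(\d+)\|>", decoded)))
c2 = list(map(int, re.findall(r"<\|c2_(\d+)\|>", decoded)))

# Truncate both streams to the same length so every frame has a full pair;
# the dangling c1_500 token is dropped.
t = min(len(c1), len(c2))
audio_tokens = [c1[:t], c2[:t]]

print(audio_tokens)  # [[12, 7], [345, 89]]
```

Each column of `audio_tokens` is one audio frame, which is why the shorter stream sets the usable length.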
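Because the model expects digits already written out as Ewe number words (a conversion this card does not provide), a small guard can catch unnormalized input before synthesis. This is a generic sketch, not part of the OuteTTS or Unsloth APIs:

```python
import re

def check_no_digits(text: str) -> str:
    """Raise if the input still contains Arabic digits that should have
    been converted to Ewe number words before synthesis."""
    leftover = re.findall(r"\d+", text)
    if leftover:
        raise ValueError(f"Convert these numbers to Ewe words first: {leftover}")
    return text

check_no_digits("Ya ʋuduʋudu si ƒe kpekpeme anɔ abe agbadroƒe blaatɔ̄ le gaƒoƒo ɖeka me ene la, aƒo.")  # passes
# check_no_digits("agbadroƒe 240")  # would raise ValueError
```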