Oolel-Voice is a high-quality text-to-speech model purpose-built for Wolof. It supports voice cloning with expressive, modular control over tone, pace, and emotion — making it far more than a standard speech synthesizer. This fine-grained expressivity makes Oolel-Voice suitable for a broad range of use cases: from conversational AI and virtual assistants to professional content creation such as dubbing, audiobook narration, podcast production, and e-learning.

Usage

Install

pip install "librosa>=0.10.2" transformers==4.46.3 diffusers==0.29.0 conformer==0.3.2 "numpy>=2.0.0" torchcodec s3tokenizer num2words
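For reproducible installs, the same version pins can also be kept in a requirements file (a hypothetical `requirements.txt` mirroring the command above) and installed with `pip install -r requirements.txt`:

```text
librosa>=0.10.2
transformers==4.46.3
diffusers==0.29.0
conformer==0.3.2
numpy>=2.0.0
torchcodec
s3tokenizer
num2words
```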

Load the model

from huggingface_hub import snapshot_download
from transformers import AutoModel
path = snapshot_download(repo_id="soynade-research/Oolel-Voices")
model = AutoModel.from_pretrained(path, trust_remote_code=True)
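The snippet above leaves the model on the CPU. Whether GPU inference is supported is an assumption here, but the standard PyTorch pattern of mapping an availability flag to a device string applies; `pick_device` below is a hypothetical helper, not part of the Oolel-Voices API:

```python
def pick_device(cuda_available: bool) -> str:
    """Map a CUDA-availability flag to a torch device string."""
    # Hypothetical helper: prefer the GPU when one is available.
    return "cuda" if cuda_available else "cpu"
```

Assuming the custom model class behaves like a regular `nn.Module`, it can then be moved with `model = model.to(pick_device(torch.cuda.is_available()))` before generation.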

Generate speech

import torchaudio
text = """Su ko nit ñi laajee: Ñan ngay defal say filme? day tontu naan: Europe mooy sama marché, waaye ñimay seetaan, Afrique lañu nekk. Fii ci réewum Europe laa wara ñëw ngir wut xaalis bi may defare samay filme, parce que fi la dooley koom-koom bi nekk. Waaye ñi ma jublu, maanaam ñi ma bëgg jàppale ci seen yewwute jaarale ko ci samay filme, Afrique lañu nekk. Du Sénégal kese, waaye ci Afrique yépp."""

wav = model.generate(text,
                     audio_prompt_path="https://huggingface.co/spaces/soynade-research/Oolel-Voices-Demo/resolve/main/8_1_c.wav",
                     cfg_weight=0.5,
                     exaggeration=0.2,
                     temperature=0.3
                     )

torchaudio.save("oolel_voices_speech.wav", wav, model.sr)
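The card does not state a maximum input length, so for long passages like the one above it may be safer to synthesize sentence-sized chunks and concatenate the waveforms. A minimal sketch of such pre-chunking (`chunk_text` is a hypothetical helper, not part of the model's API):

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text on sentence-ending punctuation, then pack
    sentences into chunks of at most max_chars characters."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding the sentence would exceed the budget.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `model.generate(...)` with the same audio prompt and the resulting tensors joined with `torch.cat` along the time axis before saving.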