F5-TTS v1 - Nigerian Multilingual

A text-to-speech model for Nigerian languages based on F5-TTS, fine-tuned for natural speech synthesis in Igbo, Hausa, Yoruba, Nigerian Pidgin, and Nigerian English.

Supported Languages

| Code | Language | Native Name |
| --- | --- | --- |
| ig | Igbo | Asụsụ Igbo |
| ha | Hausa | Harshen Hausa |
| yo | Yoruba | Èdè Yorùbá |
| pcm | Nigerian Pidgin | Naija |
| en | Nigerian English | - |

Quick Start

Installation

pip install f5-tts

Inference

from f5_tts.api import F5TTS

# Initialize with Nigerian model
tts = F5TTS(
    ckpt_file="path/to/model.pt",
    vocab_file="path/to/vocab.txt"
)

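# Generate speech. In upstream f5-tts, infer() takes the reference clip as
# `ref_file` and returns (waveform, sample_rate, spectrogram); the 24 kHz
# output rate is fixed by the model rather than passed as an argument.
wav, sr, _ = tts.infer(
    ref_file="reference.wav",       # 5-15 second reference audio
    ref_text="Reference transcript",
    gen_text="Text to synthesize",
    file_wave="output.wav",         # optional: also write the result to disk
)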

Using with Hugging Face

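from f5_tts.api import F5TTS
from huggingface_hub import hf_hub_download

# Download the checkpoint and vocabulary. The repository is gated, so accept
# the conditions on the model page and log in (e.g. `huggingface-cli login`) first.
model_path = hf_hub_download(repo_id="babs/f5tts-v1", filename="model.pt")
vocab_path = hf_hub_download(repo_id="babs/f5tts-v1", filename="vocab.txt")

# Load the downloaded files with F5-TTS
tts = F5TTS(ckpt_file=model_path, vocab_file=vocab_path)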

Model Details

Architecture

  • Base: F5-TTS DiT (Diffusion Transformer)
  • Vocoder: Vocos (24kHz)
  • Tokenizer: Character-level (2,624 tokens)
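
To illustrate what character-level tokenization means in practice, here is a minimal sketch. It assumes the vocabulary file holds one token per line and that a token's index is its line number, which is how upstream F5-TTS vocabulary files are generally laid out; this card does not confirm the format. The toy vocabulary and the helpers `load_vocab`, `encode`, and `missing_chars` are hypothetical, and mapping unknown characters to index 0 is an illustrative choice.

```python
def load_vocab(lines):
    """Map each token to its line index (assumes one token per line)."""
    return {line.rstrip("\n"): i for i, line in enumerate(lines)}


def encode(text, vocab, unk_index=0):
    """Convert text to token indices, one index per character."""
    return [vocab.get(ch, unk_index) for ch in text]


def missing_chars(text, vocab):
    """Return the sorted characters of `text` that the vocabulary lacks."""
    return sorted({ch for ch in text if ch not in vocab})


# Hypothetical toy vocabulary; the real vocab.txt has 2,624 entries.
# \u1eb9 and \u1ecd are the precomposed letters e and o with a dot below.
toy_lines = [" \n", "a\n", "b\n", "\u1eb9\n", "\u1ecd\n"]
vocab = load_vocab(toy_lines)

ids = encode("ab \u1ecd", vocab)
unknown = missing_chars("abz", vocab)
```

Running `missing_chars` over input text before synthesis is one way to spot characters, such as unexpected diacritic encodings, that the model would not have seen during training.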

Training

| Parameter | Value |
| --- | --- |
| Base Model | F5TTS_v1_Base (1.25M steps) |
| Fine-tuning Steps | 151,540 |
| Final Loss | 0.731 |
| Hardware | 2x NVIDIA A100-SXM4-80GB |
| Precision | bf16 |
| Learning Rate | 5e-5 |
| Batch Size | 51,200 frames/GPU |
| Epochs | 20 |
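
As a rough sanity check on these figures, the arithmetic can be worked through as below. Two things are assumptions rather than facts from this card: a mel hop length of 256 samples (the upstream F5-TTS default) and the idea that the 151,540 fine-tuning steps are spread evenly over the 20 epochs with no gradient accumulation. The remaining numbers come from the training parameters and the 24 kHz vocoder rate.

```python
SAMPLE_RATE = 24_000      # Hz, from the Vocos vocoder entry
HOP_LENGTH = 256          # samples per mel frame; ASSUMED, not stated in this card
FRAMES_PER_GPU = 51_200   # batch size per GPU
NUM_GPUS = 2
TOTAL_STEPS = 151_540     # fine-tuning steps
EPOCHS = 20

# Duration of audio covered by one mel frame
seconds_per_frame = HOP_LENGTH / SAMPLE_RATE

# Upper bound on audio seen per step, reached only if every batch is full
audio_seconds_per_step = FRAMES_PER_GPU * NUM_GPUS * seconds_per_frame

# ASSUMES the steps divide evenly across epochs
steps_per_epoch = TOTAL_STEPS / EPOCHS

print(f"{audio_seconds_per_step:.1f} s of audio per step (upper bound)")
print(f"{steps_per_epoch:.0f} steps per epoch")
```

Under these assumptions each step covers at most about 18 minutes of audio across both GPUs.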

Dataset

Trained on an internal Nigerian-language speech corpus covering:

  • Multiple speakers per language
  • Diverse recording conditions
  • Natural conversational and read speech

Usage Tips

  1. Reference Audio: Use 5-15 seconds of clear speech from the target speaker
  2. Reference Text: Provide an accurate transcription of the reference audio
  3. Language Mixing: The model handles code-switching between the supported languages
  4. Punctuation: Include punctuation in the input text to get natural prosody
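
The first two tips can be checked before calling the model. The sketch below uses only the Python standard library and assumes the reference clip is an uncompressed PCM WAV file, since the `wave` module reads nothing else. The helper names `wav_duration_seconds` and `check_reference` are hypothetical and not part of F5-TTS.

```python
import wave

MIN_SECONDS = 5.0   # lower bound from the reference audio tip
MAX_SECONDS = 15.0  # upper bound from the reference audio tip


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference(path, ref_text):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    duration = wav_duration_seconds(path)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.1f}s is outside the 5-15s range")
    if not ref_text.strip():
        problems.append("reference transcript is empty")
    return problems
```

A call such as `check_reference("reference.wav", "Reference transcript")` returns an empty list when both checks pass. It cannot tell whether the transcript actually matches the audio, which still needs a manual listen.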

Limitations

  • Works best when the reference audio closely matches the voice characteristics wanted in the output
  • May struggle with very long sentences (>50 words)
  • Benefits from tone marking for tonal languages (Igbo, Yoruba, Hausa); text without tone marks may yield lower quality
  • Not designed for real-time streaming (use a dedicated streaming model if you need <100 ms latency)
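
One way to work around the long-sentence limitation is to split the input into shorter chunks and synthesize each one separately. The sketch below is an illustration under stated assumptions, not the model's own preprocessing: the helper `split_text` is hypothetical, it treats `.`, `!`, and `?` as sentence boundaries, and the 50-word threshold is taken from the limitation above. Upstream F5-TTS may also chunk text internally.

```python
import re

MAX_WORDS = 50  # threshold taken from the long-sentence limitation


def split_text(text, max_words=MAX_WORDS):
    """Split text into chunks of at most max_words words.

    Sentences are kept whole where possible; a single sentence longer than
    max_words is cut into fixed-size word windows.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # Cut an over-long sentence into windows of max_words words
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence would overflow the current one
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be passed as the text to synthesize, with the resulting waveforms concatenated afterwards. Cutting in the middle of a sentence may hurt prosody, so the fixed-size window is only a fallback.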

Files

| File | Description | Size |
| --- | --- | --- |
| model.pt | Model weights | 5.1 GB |
| vocab.txt | Character vocabulary | 12 KB |
| config.json | Model configuration | 1 KB |
| samples/ | Audio samples from training | ~2 MB |

Citation

If you use this model, please cite:

@misc{f5tts-nigerian-2026,
  title={F5-TTS Nigerian Multilingual},
  author={Spitch AI},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/babs/f5tts-v1}
}

License

MIT License - See the base F5-TTS repository for details.

Contact

For questions or issues, contact the Spitch AI team.
