F5-TTS v1 - Nigerian Multilingual

A text-to-speech model for Nigerian languages based on F5-TTS, fine-tuned for natural speech synthesis in Igbo, Hausa, Yoruba, Nigerian Pidgin, and Nigerian English.

Supported Languages

Code	Language	Native Name
`ig`	Igbo	Asụsụ Igbo
`ha`	Hausa	Harshen Hausa
`yo`	Yoruba	Èdè Yorùbá
`pcm`	Nigerian Pidgin	Naija
`en`	Nigerian English	-

Quick Start

Installation

pip install f5-tts

Inference

from f5_tts.api import F5TTS

# Initialize with Nigerian model
tts = F5TTS(
    ckpt_file="path/to/model.pt",
    vocab_file="path/to/vocab.txt"
)

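# Generate speech (keyword names follow the upstream f5_tts.api.F5TTS.infer signature)
# infer returns the waveform, its sample rate, and the mel spectrogram
wav, sr, spec = tts.infer(
    ref_file="reference.wav",       # 5-15 second reference audio
    ref_text="Reference transcript",
    gen_text="Text to synthesize"
)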

Using with Hugging Face

from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(repo_id="babs/f5tts-v1", filename="model.pt")
vocab_path = hf_hub_download(repo_id="babs/f5tts-v1", filename="vocab.txt")

# Use with F5-TTS
from f5_tts.api import F5TTS
tts = F5TTS(ckpt_file=model_path, vocab_file=vocab_path)

Model Details

Architecture

Base: F5-TTS DiT (Diffusion Transformer)
Vocoder: Vocos (24kHz)
Tokenizer: Character-level (2,624 tokens)
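
Because the tokenizer is character-level, a character that is missing from `vocab.txt` cannot be represented faithfully. The following is a minimal sketch for checking input text before synthesis. It assumes `vocab.txt` stores one token per line, which is the layout of upstream F5-TTS character vocabularies but is not confirmed for this file. The helper names `load_vocab` and `unknown_chars` are hypothetical and not part of F5-TTS.

```python
from pathlib import Path


def load_vocab(vocab_path):
    """Read a vocabulary file, assuming one token per line (assumption)."""
    text = Path(vocab_path).read_text(encoding="utf-8")
    # Split on "\n" only, so a token that is itself a space survives.
    tokens = text.split("\n")
    # Drop the empty string produced by a trailing newline.
    if tokens and tokens[-1] == "":
        tokens.pop()
    return {tok: idx for idx, tok in enumerate(tokens)}


def unknown_chars(text, vocab):
    """Return the sorted list of characters in `text` that are not in `vocab`."""
    return sorted({ch for ch in text if ch not in vocab})
```

If `load_vocab` returns 2,624 entries, the file matches the token count listed above. For text with tone marks, applying `unicodedata.normalize("NFC", text)` before the check may reduce false positives if the vocabulary stores precomposed characters; this card does not state which normalization form the vocabulary uses.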

Training

Parameter	Value
Base Model	F5TTS_v1_Base (1.25M steps)
Fine-tuning Steps	151,540
Final Loss	0.731
Hardware	2x NVIDIA A100-SXM4-80GB
Precision	bf16
Learning Rate	5e-5
Batch Size	51,200 frames/GPU
Epochs	20
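
For intuition, the frame-based batch size can be converted into seconds of audio. The sketch below assumes a mel hop length of 256 samples at 24 kHz (the upstream F5-TTS default, not stated in this card) and no gradient accumulation. Both are assumptions, so the results are rough upper bounds rather than reported figures.

```python
SAMPLE_RATE = 24_000        # Hz, matches the 24 kHz vocoder listed above
HOP_LENGTH = 256            # samples per mel frame (assumed upstream default)
FRAMES_PER_GPU = 51_200     # from the table
NUM_GPUS = 2                # from the table
TOTAL_STEPS = 151_540       # from the table
EPOCHS = 20                 # from the table

frames_per_second = SAMPLE_RATE / HOP_LENGTH
seconds_per_gpu_batch = FRAMES_PER_GPU / frames_per_second
seconds_per_step = seconds_per_gpu_batch * NUM_GPUS
steps_per_epoch = TOTAL_STEPS / EPOCHS

print(f"{frames_per_second:.2f} mel frames per second")
print(f"{seconds_per_gpu_batch:.1f} s of audio per GPU batch (at most)")
print(f"{seconds_per_step:.1f} s of audio per step across GPUs (at most)")
print(f"{steps_per_epoch:.0f} steps per epoch")
```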

Dataset

Trained on an internal Nigerian-language speech corpus covering:

Multiple speakers per language
Diverse recording conditions
Natural conversational and read speech

Usage Tips

Reference Audio: Use 5-15 seconds of clear speech from your target speaker
Reference Text: Provide an accurate transcription of the reference audio
Language Mixing: The model handles code-switching between supported languages
Punctuation: Include punctuation for natural prosody
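
The 5-15 second guideline for reference audio can be checked before inference. This is a minimal sketch using only the Python standard library; it assumes the reference is an uncompressed PCM WAV file, since the `wave` module does not read other formats. The function names are hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference(path, min_s=5.0, max_s=15.0):
    """Return (ok, duration) for the 5-15 second reference guideline."""
    duration = wav_duration_seconds(path)
    return min_s <= duration <= max_s, duration
```

A typical use is `ok, duration = check_reference("reference.wav")`, trimming or re-recording the clip when `ok` is false.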

Limitations

Results are best when the reference audio closely matches the voice characteristics you want in the output
May struggle with very long sentences (>50 words)
Quality drops when text in tonal languages (Igbo, Yoruba, Hausa) lacks tone marks; include them where possible
Not designed for real-time streaming (use dedicated streaming models for <100ms latency)
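
One way to work around the long-sentence limitation is to pre-split the input into chunks of at most 50 words and synthesize each chunk separately. The sketch below is an illustration under that assumption, independent of any chunking F5-TTS may perform internally. The function name `split_for_tts` is hypothetical, and the sentence splitter is a simple punctuation-based heuristic.

```python
import re


def split_for_tts(text, max_words=50):
    """Group sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` is split on word boundaries.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # Hard-split sentences that exceed the limit on their own.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk when this sentence would overflow the current one.
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be passed as `gen_text` in turn and the resulting waveforms concatenated. Splitting at sentence boundaries keeps punctuation intact, which the usage tips note is important for prosody.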

Files

File	Description	Size
`model.pt`	Model weights	5.1 GB
`vocab.txt`	Character vocabulary	12 KB
`config.json`	Model configuration	1 KB
`samples/`	Audio samples from training	~2 MB

Citation

If you use this model, please cite:

@misc{f5tts-nigerian-2026,
  title={F5-TTS Nigerian Multilingual},
  author={Spitch AI},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/babs/f5tts-v1}
}

License

MIT License - See the base F5-TTS repository for details.

Contact

For questions or issues, contact the Spitch AI team.
