F5-TTS Fine-tuned for Dhivehi (ދިވެހި)

Fine-tuned F5-TTS model for Dhivehi (Maldivian) text-to-speech with zero-shot voice cloning.

Model Details

  • Architecture: DiT (dim=1024, depth=22, heads=16)
  • Base Model: F5-TTS v1 Base
  • Vocoder: Vocos (24kHz)
  • Tokenizer: Custom character-level (Thaana + Latin + punctuation)
  • Vocab size: 2604 characters (59 Thaana chars added to base vocab)
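The vocabulary figures above can be sanity-checked against a local copy of vocab.txt. The following is a minimal sketch, assuming the file stores one token per line and counting only entries that fall in the Unicode Thaana block (U+0780 to U+07BF). The helpers `is_thaana`, `count_thaana`, and `load_vocab` are hypothetical names for illustration and are not part of F5-TTS. The count on the real file may differ from 59 if some of the added characters lie outside the Thaana block.

```python
def is_thaana(token: str) -> bool:
    """True if the token is a single character in the Unicode Thaana block."""
    return len(token) == 1 and 0x0780 <= ord(token) <= 0x07BF


def count_thaana(tokens) -> int:
    """Count vocabulary entries that are single Thaana characters."""
    return sum(1 for t in tokens if is_thaana(t))


def load_vocab(path: str) -> list:
    """Read one token per line (assumed layout), preserving a lone space token."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Tiny in-memory example; use load_vocab("vocab.txt") for the real file.
# \u078b, \u07a8 and \u0788 are Thaana code points.
sample_tokens = ["a", "b", " ", "\u078b", "\u07a8", "\u0788", "!"]
print(len(sample_tokens), count_thaana(sample_tokens))
```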

Usage

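from f5_tts.api import F5TTS

# Load the fine-tuned checkpoint together with its extended vocabulary
tts = F5TTS(
    model="F5TTS_v1_Base",   # base architecture config
    ckpt_file="model.pt",    # fine-tuned weights from this repository
    vocab_file="vocab.txt",  # extended Thaana + base vocabulary
)

# Clone the voice in reference.wav; ref_text should match what is spoken in that clip
wav, sr, _ = tts.infer(
    ref_file="reference.wav",
    ref_text="reference text in Dhivehi",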
    gen_text="ދިވެހިރާއްޖެއަކީ ވަރަކް ރީތި ޔައުމެކެވެ",
)
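The call above returns the waveform and its sample rate but does not write anything to disk. One way to save the result using only the Python standard library is sketched below. It assumes `wav` is a one-dimensional sequence of floats roughly in [-1, 1] and `sr` is an integer sample rate. The helper `write_pcm16` is a hypothetical name, and a short sine tone stands in for the model output so the snippet runs without the checkpoint.

```python
import math
import struct
import wave


def write_pcm16(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for x in samples:
        clipped = max(-1.0, min(1.0, float(x)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))


# Stand-in for the model output: 0.1 s of a 440 Hz tone at 24 kHz.
sr = 24000
wav = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 10)]
write_pcm16("output.wav", wav, sr)
```

With the real model, pass the `wav` and `sr` values returned by `tts.infer` in place of the synthetic tone.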

Training Data

Dataset                                  Samples
Serialtechlab/dhivehi-mms-v5-combined    ~9,660
Serialtechlab/dv-presidential-speech     ~1,660
alakxender/dv-audio-syn-lg               ~50,000 (synthetic)
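To put the mix in perspective, the listed counts can be totalled and the synthetic share computed. This is a rough sketch that treats the approximate figures above as exact, so the results are estimates only.

```python
# Approximate sample counts as listed in the table above.
datasets = {
    "Serialtechlab/dhivehi-mms-v5-combined": 9_660,
    "Serialtechlab/dv-presidential-speech": 1_660,
    "alakxender/dv-audio-syn-lg": 50_000,  # synthetic
}
synthetic = {"alakxender/dv-audio-syn-lg"}

total = sum(datasets.values())
synthetic_share = sum(datasets[name] for name in synthetic) / total
print(f"total ~{total:,} samples, synthetic share ~{synthetic_share:.1%}")
# prints: total ~61,320 samples, synthetic share ~81.5%
```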

Training Config

  • Learning rate: 1e-05
  • Batch size: 19200 frames
  • Epochs: 100
  • Mixed precision: bf16
  • GPU: NVIDIA A100 40GB
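Because the batch size is expressed in mel frames rather than utterances, it can help to convert it to seconds of audio. The sketch below assumes a 24 kHz sample rate (matching the Vocos vocoder listed under Model Details) and a mel hop length of 256 samples, which is the usual F5-TTS default but is not stated in this card. `frames_to_seconds` is a hypothetical helper.

```python
def frames_to_seconds(frames: int, sample_rate: int = 24_000, hop_length: int = 256) -> float:
    """Convert a count of mel frames to seconds of audio.

    Each frame advances hop_length samples (assumed 256 here).
    """
    return frames * hop_length / sample_rate


batch_seconds = frames_to_seconds(19_200)
print(batch_seconds)  # 204.8
```

Under these assumptions, one batch of 19200 frames corresponds to roughly 205 seconds (about 3.4 minutes) of audio.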

Files

  • model.pt - Fine-tuned F5-TTS weights
  • vocab.txt - Extended character vocabulary (Thaana + base)
  • config.json - Training configuration