F5-TTS Fine-tuned for Dhivehi (ދިވެހި)
Fine-tuned F5-TTS model for Dhivehi (Maldivian) text-to-speech with zero-shot voice cloning.
Model Details
- Architecture: DiT (dim=1024, depth=22, heads=16)
- Base Model: F5-TTS v1 Base
- Vocoder: Vocos (24kHz)
- Tokenizer: Custom character-level (Thaana + Latin + punctuation)
- Vocab size: 2604 characters (59 Thaana chars added to base vocab)
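The vocabulary figures above imply a base vocabulary of 2,545 entries (2,604 minus the 59 added Thaana characters). The card does not say how the extension was performed. The following is a minimal sketch of one plausible approach, assuming new characters are appended after the base entries so that existing token indices stay unchanged. The `extend_vocab` helper and the toy vocabulary are hypothetical and are not taken from the F5-TTS codebase.

```python
def extend_vocab(base_tokens, new_chars):
    """Append characters missing from base_tokens, keeping existing order and indices."""
    seen = set(base_tokens)
    extended = list(base_tokens)
    for ch in new_chars:
        if ch not in seen:
            extended.append(ch)
            seen.add(ch)
    return extended


# Hypothetical toy base vocabulary (the real base vocab is far larger).
base = [" ", "a", "b", ".", ","]

# Five Thaana letters, U+0780 through U+0784, used here only as an illustration.
thaana = [chr(cp) for cp in range(0x0780, 0x0785)]

# "a" is already present, so it is not added a second time.
vocab = extend_vocab(base, thaana + ["a"])
added = len(vocab) - len(base)
print(f"base={len(base)} added={added} total={len(vocab)}")
```

Appending rather than inserting matters when fine-tuning: the pretrained text embeddings are indexed by position in the vocabulary, so reordering base entries would misalign them.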
Usage
from f5_tts.api import F5TTS

tts = F5TTS(
    model="F5TTS_v1_Base",
    ckpt_file="model.pt",
    vocab_file="vocab.txt",
)

wav, sr, _ = tts.infer(
    ref_file="reference.wav",
    ref_text="reference text in Dhivehi",
    gen_text="ދިވެހިރާއްޖެއަކީ ވަރަކް ރީތި ޔައުމެކެވެ",
)
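Because the tokenizer is character-level, any character in `gen_text` or `ref_text` that is absent from `vocab.txt` has no token of its own. A quick pre-flight check can catch this before synthesis. The sketch below assumes the vocabulary file stores one token per line. The helper names `read_vocab` and `missing_chars` are hypothetical, and the toy vocabulary stands in for the real file.

```python
import io


def read_vocab(lines):
    """Build a set of tokens from an iterable of lines (one token per line)."""
    # Strip only the newline so that a line holding a single space survives.
    return {line.rstrip("\n") for line in lines}


def missing_chars(text, vocab):
    """Return the sorted list of characters in text that are not in vocab."""
    return sorted(set(text) - vocab)


# Toy vocabulary: space plus the Thaana characters needed to spell "Dhivehi".
toy_file = io.StringIO(" \n\u078b\n\u07a8\n\u0788\n\u07ac\n\u0780\n")
toy_vocab = read_vocab(toy_file)

# "\u078b\u07a8\u0788\u07ac\u0780\u07a8" spells the word "Dhivehi" in Thaana.
text = "\u078b\u07a8\u0788\u07ac\u0780\u07a8 x"
print(missing_chars(text, toy_vocab))
```

With the real file, the same check would read `read_vocab(open("vocab.txt", encoding="utf-8"))`. An empty result means every character in the text is covered.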
Training Data
| Dataset | Samples |
|---|---|
| Serialtechlab/dhivehi-mms-v5-combined | ~9,660 |
| Serialtechlab/dv-presidential-speech | ~1,660 |
| alakxender/dv-audio-syn-lg | ~50,000 (synthetic) |
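To see the balance between recorded and synthetic speech, the approximate counts in the table can be summed directly. This is only arithmetic on the rounded figures above, so the results are approximate as well.

```python
# Approximate sample counts copied from the table above.
samples = {
    "Serialtechlab/dhivehi-mms-v5-combined": 9_660,
    "Serialtechlab/dv-presidential-speech": 1_660,
    "alakxender/dv-audio-syn-lg": 50_000,  # synthetic
}

total = sum(samples.values())
synthetic_share = samples["alakxender/dv-audio-syn-lg"] / total

print(f"total samples: ~{total:,}")
print(f"synthetic share: ~{synthetic_share:.1%}")
```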
Training Config
- Learning rate: 1e-05
- Batch size: 19200 frames
- Epochs: 100
- Mixed precision: bf16
- GPU: NVIDIA A100 40GB
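The batch size is given in mel-spectrogram frames rather than utterances. To get a feel for how much audio that represents, the frame count can be converted to seconds. The sketch below assumes a hop length of 256 samples at the 24 kHz sample rate. The hop length is not stated in this card, so treat it as an assumption to verify against `config.json`.

```python
SAMPLE_RATE = 24_000  # Hz, matches the 24 kHz vocoder listed above
HOP_LENGTH = 256      # samples per mel frame; ASSUMED, not stated in this card


def frames_to_seconds(frames, sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Convert a number of mel frames to a duration in seconds."""
    return frames * hop_length / sample_rate


batch_seconds = frames_to_seconds(19_200)
print(f"{batch_seconds:.1f} s of audio per batch")
```

Under that assumption, each batch holds roughly 3.4 minutes of audio, regardless of how many utterances it takes to fill it.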
Files
- model.pt: Fine-tuned F5-TTS weights
- vocab.txt: Extended character vocabulary (Thaana + base)
- config.json: Training configuration
Model tree for Serialtechlab/f5-tts-dhivehi
Base model
SWivid/F5-TTS