Serialtechlab

---
language:
  - dv
license: cc-by-nc-4.0
tags:
  - tts
  - text-to-speech
  - f5-tts
  - flow-matching
  - dhivehi
  - maldivian
  - thaana
  - voice-cloning
  - zero-shot-tts
datasets:
  - Serialtechlab/dhivehi-mms-v5-combined
  - Serialtechlab/dv-presidential-speech
  - alakxender/dv-audio-syn-lg
base_model: SWivid/F5-TTS
pipeline_tag: text-to-speech
---

F5-TTS Fine-tuned for Dhivehi (ދިވެހި)

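This is an F5-TTS model fine-tuned for Dhivehi (Maldivian) text-to-speech. It supports zero-shot voice cloning: given a reference audio clip and its transcript, it synthesizes new Dhivehi text in the reference speaker's voice.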

Model Details

Architecture: DiT (dim=1024, depth=22, heads=16)
Base Model: F5-TTS v1 Base
Vocoder: Vocos (24 kHz)
Tokenizer: Custom character-level (Thaana + Latin + punctuation)
Vocab size: 2604 characters (59 Thaana characters added to the base vocabulary)
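
Because the tokenizer is character-level, any character that is absent from the vocabulary cannot be represented as its own token. The sketch below shows one way to check a piece of input text against a vocabulary before synthesis. It assumes the vocabulary file holds one entry per line, with the line number serving as the index; that layout is an assumption about `vocab.txt`, and `load_vocab`, `missing_characters`, and the toy vocabulary are hypothetical names introduced only for illustration.

```python
def load_vocab(lines):
    """Map each vocabulary entry to its index (assumes one entry per line)."""
    vocab = {}
    for index, line in enumerate(lines):
        token = line.rstrip("\n")  # strip only the newline so a lone space entry survives
        vocab[token] = index
    return vocab


def missing_characters(text, vocab):
    """Return characters of `text` with no vocabulary entry, in first-seen order."""
    missing = []
    for ch in text:
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical miniature vocabulary standing in for vocab.txt:
# a space, two Latin letters, and two Thaana code points (U+078B, U+07A8).
toy_vocab = load_vocab([" \n", "a\n", "b\n", "\u078b\n", "\u07a8\n"])

# U+0788 is a Thaana letter that the toy vocabulary does not contain.
unknown = missing_characters("ab \u078b\u07a8\u0788", toy_vocab)
print([f"U+{ord(ch):04X}" for ch in unknown])
```

With the real file, the same function could be fed an open file handle, for example `load_vocab(open("vocab.txt", encoding="utf-8"))`, provided the one-entry-per-line assumption holds.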

Usage

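from f5_tts.api import F5TTS

# Load the F5-TTS v1 Base architecture with the fine-tuned Dhivehi checkpoint
# and the extended vocabulary shipped in this repository.
tts = F5TTS(
    model="F5TTS_v1_Base",
    ckpt_file="model.pt",    # fine-tuned weights
    vocab_file="vocab.txt",  # extended character vocabulary (Thaana + base)
)

# Clone the voice in the reference clip and synthesize new Dhivehi text.
# infer() returns the waveform, its sample rate, and a third value unused here.
wav, sr, _ = tts.infer(
    ref_file="reference.wav",              # clip of the target speaker
    ref_text="reference text in Dhivehi",  # transcript of reference.wav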
    gen_text="ދިވެހިރާއްޖެއަކީ ވަރަކް ރީތި ޔައުމެކެވެ",
)
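
The snippet above leaves the result in memory as `wav` and `sr`. As a dependency-free sketch of how such a result could be saved, the helper below writes mono float samples to a 16-bit PCM WAV file using only the Python standard library. It assumes the waveform is a one-dimensional sequence of floats in the range [-1.0, 1.0]; `write_wav` is a hypothetical helper, not part of the F5-TTS API, and the sine tone merely stands in for real model output.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


# Stand-in for model output: one second of a 440 Hz tone at 24 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
out_path = os.path.join(tempfile.gettempdir(), "f5_tts_demo_tone.wav")
write_wav(out_path, tone, 24000)
```

With real output, the call would look like `write_wav("output.wav", wav, sr)`, under the same assumption about the shape and range of `wav`.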

Training Data

| Dataset | Samples |
| --- | --- |
| Serialtechlab/dhivehi-mms-v5-combined | ~9,660 |
| Serialtechlab/dv-presidential-speech | ~1,660 |
| alakxender/dv-audio-syn-lg | ~50,000 (synthetic) |
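
To make the balance of the training mix explicit, the short calculation below sums the approximate counts from the table above and prints each dataset's share. The counts are the rounded figures listed in this card, so the results are approximate as well.

```python
# Approximate sample counts copied from the Training Data table.
samples = {
    "Serialtechlab/dhivehi-mms-v5-combined": 9_660,
    "Serialtechlab/dv-presidential-speech": 1_660,
    "alakxender/dv-audio-syn-lg": 50_000,  # synthetic speech
}

total = sum(samples.values())
shares = {name: count / total for name, count in samples.items()}

print(f"total: ~{total:,} samples")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

On these figures the mix comes to about 61,320 samples, of which roughly 82% are synthetic.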

Training Config

Learning rate: 1e-05
Batch size: 19200 frames
Epochs: 100
Mixed precision: bf16
GPU: NVIDIA A100 40GB
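
The batch size is given in frames rather than utterances, so it can help to convert it to a duration of audio. The sketch below does that under two assumptions that this card does not state: that a frame is one mel-spectrogram frame, and that the hop length is 256 samples. The 24 kHz sample rate is taken from the vocoder entry under Model Details.

```python
SAMPLE_RATE_HZ = 24_000  # from the vocoder entry in Model Details
HOP_LENGTH = 256         # ASSUMPTION: audio samples per mel frame, not stated in this card
BATCH_FRAMES = 19_200    # batch size from the training config

# Number of mel frames that cover one second of audio.
frames_per_second = SAMPLE_RATE_HZ / HOP_LENGTH

# Approximate audio duration covered by one batch.
batch_seconds = BATCH_FRAMES / frames_per_second

print(f"{frames_per_second} frames/s, about {batch_seconds:.1f} s of audio per batch")
```

If the actual hop length differs, the duration scales in proportion: doubling the hop length doubles the audio covered by the same number of frames.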

Files

model.pt - Fine-tuned F5-TTS weights
vocab.txt - Extended character vocabulary (Thaana + base)
config.json - Training configuration
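
Before loading the model, it may be worth confirming that all three files are present in the local directory. The following is a minimal sketch using only the standard library; `missing_files` and `EXPECTED_FILES` are hypothetical names, and the file list simply mirrors the one above.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = ("model.pt", "vocab.txt", "config.json")


def missing_files(model_dir):
    """Return the expected file names that are not present in `model_dir`."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Check the current directory and report anything that is absent.
absent = missing_files(".")
if absent:
    print("Missing:", ", ".join(absent))
else:
    print("All expected files are present.")
```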