OronTTS — F5-TTS for Mongolian & Kazakh

A non-autoregressive text-to-speech model for Mongolian (Khalkha, Cyrillic script) and Kazakh (Cyrillic script). It is based on F5-TTS, which pairs a flow-matching objective with a Diffusion Transformer (DiT) backbone.
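To illustrate the flow-matching objective, the sketch below builds one optimal-transport conditional flow matching (OT-CFM) training pair. It uses the linear path x_t = (1 - (1 - sigma_min) t) x0 + t x1 between Gaussian noise x0 and a target frame x1, together with the velocity target u_t = x1 - (1 - sigma_min) x0 that the network regresses. It then integrates a velocity field with Euler steps, which is how sampling moves from noise to data. This is a toy under stated assumptions and is not OronTTS code. Plain Python lists stand in for mel tensors. The text conditioning and masked-mel infilling used by F5-TTS are omitted. The function names, the default sigma_min = 0, and the "oracle" velocity (a stand-in for the trained DiT) are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def ot_cfm_pair(x0: Vector, x1: Vector, t: float,
                sigma_min: float = 0.0) -> Tuple[Vector, Vector]:
    """Return (x_t, u_t) on the OT path from noise x0 to data x1, t in [0, 1].

    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1 - sigma_min) * x0   (constant along the straight path)
    """
    a = 1.0 - (1.0 - sigma_min) * t
    x_t = [a * n + t * d for n, d in zip(x0, x1)]
    u_t = [d - (1.0 - sigma_min) * n for n, d in zip(x0, x1)]
    return x_t, u_t


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error: the velocity-regression loss."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int = 32) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 (noise) to t = 1 (data)."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt                      # t stays below 1, so 1 - t is never 0
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    random.seed(0)
    x1 = [0.5, -1.0, 2.0]                       # stand-in for one target mel frame
    x0 = [random.gauss(0.0, 1.0) for _ in x1]   # Gaussian noise sample
    x_t, u_t = ot_cfm_pair(x0, x1, t=0.3)
    print("loss of a perfect prediction:", mse(u_t, u_t))

    def oracle(x: Vector, t: float) -> Vector:
        # Exact velocity for this single (x0, x1) pair when sigma_min = 0;
        # a trained DiT would predict this from x, t and the conditioning.
        return [(d - xi) / (1.0 - t) for xi, d in zip(x, x1)]

    print("sample:", euler_sample(oracle, x0), "target:", x1)
```

Because the OT path is a straight line with constant velocity, Euler integration with the exact velocity lands on x1 (up to rounding) for any step count. With a learned velocity, more steps trade speed for accuracy.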

Model Details

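Parameter     Value                          Meaning
------------  -----------------------------  ------------------------------------------------------------------------
Architecture  F5-TTS (OT-CFM + DiT + Vocos)  Optimal-transport conditional flow matching, DiT backbone, Vocos vocoder
dim           1024                           Transformer hidden size
depth         22                             Number of DiT blocks
heads         16                             Attention heads per block
vocab_size    65                             Size of the text-token vocabulary
sample_rate   24000 Hz                       Output audio sample rate
mel_bins      100                            Mel-spectrogram channels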
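The table's values can be collected into a small config object, which makes some derived quantities easy to check. The sketch below is hypothetical and is not the repository's config class. One assumption is loud: `hop_length = 256` is not stated in this card. It is the upstream F5-TTS mel setting for 24 kHz audio and is used here only to show how clip length maps to mel frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class F5TTSConfig:
    """Hypothetical container for the Model Details values."""
    dim: int = 1024
    depth: int = 22
    heads: int = 16
    vocab_size: int = 65
    sample_rate: int = 24_000
    mel_bins: int = 100
    hop_length: int = 256   # ASSUMPTION: upstream F5-TTS default, not confirmed for OronTTS

    @property
    def head_dim(self) -> int:
        """Per-head width of the attention projections."""
        if self.dim % self.heads:
            raise ValueError("dim must be divisible by heads")
        return self.dim // self.heads

    @property
    def frames_per_second(self) -> float:
        """Mel frames per second of audio under the assumed hop length."""
        return self.sample_rate / self.hop_length

    def num_frames(self, seconds: float) -> int:
        """Approximate mel frame count (each frame has mel_bins values)."""
        return int(seconds * self.sample_rate) // self.hop_length


if __name__ == "__main__":
    cfg = F5TTSConfig()
    print(cfg.head_dim, cfg.frames_per_second, cfg.num_frames(10.0))
```

Under these assumptions, each attention head is 64 wide, and a 10-second clip corresponds to roughly 937 mel frames of 100 bins each. The exact count depends on the STFT padding the real feature extractor uses.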

Usage

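from src.models.f5tts import F5TTS
from src.utils.checkpoint import CheckpointManager

# `config` must already hold the model configuration (dim=1024, depth=22,
# heads=16, ...); this card does not show how it is loaded.
model = F5TTS.from_config(config)

# Load the trained weights (f5tts_best.pt) with the repository's CheckpointManager.
cm = CheckpointManager("checkpoints")
cm.load(model, path="f5tts_best.pt", device="cuda")

# Synthesize Mongolian speech ("Сайн байна уу" is a common greeting),
# using ref.wav as the reference (voice prompt) audio.
wav = model.synthesize(
    text="Сайн байна уу",
    lang="mn",
    ref_audio_path="ref.wav",
)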
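The card does not say what `synthesize` returns. Assuming it is a 1-D sequence of mono float samples in [-1, 1] at the card's 24 kHz rate (a list, NumPy array, or 1-D tensor), the sketch below writes it to a 16-bit PCM WAV file using only the standard library. The helper name `save_wav` is hypothetical.

```python
import array
import sys
import wave
from typing import Iterable


def save_wav(path: str, samples: Iterable[float], sample_rate: int = 24_000) -> int:
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    ASSUMPTION: the synthesizer output is 1-D mono float audio at 24 kHz.
    Returns the number of frames written.
    """
    pcm = array.array("h")                    # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))     # clip so scaling cannot overflow int16
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":                # WAV sample data is little-endian
        pcm.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)                    # mono
        wf.setsampwidth(2)                    # 2 bytes per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)
```

With the snippet above, this would be called as `save_wav("out.wav", wav)`. If the model returns a batched or multi-channel array, flatten or select a channel first.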

Training

Trained on btsee/mbspeech_mn (3,846 Mongolian speech samples).

License

MIT
