F5-TTS Hebrew v2

Hebrew text-to-speech model fine-tuned from F5-TTS on about 158 hours of Hebrew speech.

Model Details

  • Base: F5-TTS v1 Base (non-autoregressive)
  • Training steps: 320,000
  • Data: 68,569 segments (~158 h) from SASPEECH Gold/Auto, FLEURS, and Hebrew Speech Campus
  • All transcripts re-vocalized with Phonikud G2P (stress marks included)
  • Hebrew-filtered: non-Hebrew segments removed
  • Output: 24 kHz mono WAV
  • 58 built-in voices with voice cloning support
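F5-TTS generates mel-spectrogram frames rather than audio samples, so lengths are usually handled in frames. In F5-TTS's CFM.sample(), the duration argument is, as far as we know, a total frame count that includes the reference clip. The sketch below converts between samples, frames and seconds. It assumes the F5-TTS v1 Base mel settings (24 kHz, hop length 256) and the length heuristic used by F5-TTS's inference helpers: target length scales with the UTF-8 byte length of the text relative to the reference clip. The function names are hypothetical.

```python
SAMPLE_RATE = 24_000  # output rate stated on this card
HOP_LENGTH = 256      # assumed F5-TTS v1 Base mel hop length (samples per frame)


def samples_to_frames(num_samples):
    """Number of whole mel frames covered by num_samples of 24 kHz audio."""
    return num_samples // HOP_LENGTH


def frames_to_seconds(frames):
    """Duration in seconds of a given number of mel frames."""
    return frames * HOP_LENGTH / SAMPLE_RATE


def estimate_total_frames(ref_frames, ref_text, gen_text, speed=1.0):
    """Estimate total frames (reference + generated) for a sample() call.

    Assumed heuristic: reuse the reference clip's frames-per-byte rate,
    measuring text length in UTF-8 bytes, and divide by the speed factor.
    """
    ref_len = len(ref_text.encode("utf-8"))
    gen_len = len(gen_text.encode("utf-8"))
    if ref_len == 0:
        raise ValueError("ref_text must be non-empty")
    return ref_frames + int(ref_frames / ref_len * gen_len / speed)
```

For example, a 5-second reference clip (120,000 samples) spans 468 frames. Generating text twice as long (in bytes) as the reference transcript would request 1,404 frames in total, or about 15 seconds of audio.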

Critical Usage Notes

  1. Use model_state_dict, NOT ema_model_state_dict, when loading from .pt checkpoints.
  2. Override text_num_embeds to match vocab_size (the default of 256 is wrong for this model).
  3. Use the included vocab.txt; the char-to-index mapping must match exactly.
  4. Call model.sample() directly; the F5TTS API and CLI are broken for fine-tuned models.
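Notes 1 to 3 can be sketched in plain Python with no ML dependencies. This is a minimal sketch under stated assumptions, not the hebAudio project's loader. It assumes vocab.txt holds one token per line, with the line number as the token's index (the way F5-TTS's "custom" tokenizer reads a vocab file). It also assumes the .pt checkpoint has already been loaded into a dict, for example with torch.load(path, map_location="cpu"). The helper names load_vocab, text_to_ids, build_model_cfg and select_weights are hypothetical.

```python
def load_vocab(path):
    """Read vocab.txt into a {char: index} map, one token per line.

    Assumption: index = line number, token = line minus its trailing newline.
    Do NOT use .strip(): the space character is itself a token.
    """
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            token = line[:-1] if line.endswith("\n") else line
            vocab[token] = i
    return vocab


def text_to_ids(text, vocab):
    """Map each character of (already vocalized) text to its vocab index.

    Raises on characters missing from vocab.txt instead of silently
    substituting a fallback index.
    """
    unknown = sorted({c for c in text if c not in vocab})
    if unknown:
        raise ValueError(f"characters missing from vocab.txt: {unknown!r}")
    return [vocab[c] for c in text]


def build_model_cfg(base_cfg, vocab_size):
    """Copy an architecture config and set text_num_embeds = vocab_size (note 2)."""
    cfg = dict(base_cfg)
    cfg["text_num_embeds"] = vocab_size
    return cfg


def select_weights(ckpt):
    """Return the non-EMA weights from a loaded checkpoint dict (note 1)."""
    if "model_state_dict" not in ckpt:
        raise KeyError(f"no 'model_state_dict' in checkpoint; keys: {sorted(ckpt)}")
    return ckpt["model_state_dict"]
```

Raising on unknown characters is a deliberate choice. As far as we can tell, F5-TTS's own text-to-index step maps unknown characters to index 0, so a mismatched vocab.txt would otherwise go unnoticed.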

Full Project

github.com/yzamari/hebAudio: the complete Hebrew TTS system, with a Web UI, the 58 voices, a preprocessing pipeline, and a training guide.


Datasets used to train Yzamari/f5tts-hebrew-v2