Yzamari/f5tts-hebrew-v2

speech-synthesis


F5-TTS Hebrew v2

Hebrew text-to-speech model fine-tuned from F5-TTS on 158 hours of Hebrew speech.

Model Details

Base: F5-TTS v1 Base (non-autoregressive)
Steps: 320,000
Data: 68,569 segments (~158h) from SASPEECH Gold/Auto, FLEURS, Hebrew Speech Campus
All data re-vocalized with Phonikud G2P (stress marks included)
Hebrew-filtered: non-Hebrew segments removed
Output: 24 kHz mono WAV
58 built-in voices with voice cloning support
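
As a rough illustration of the output format listed above, the sketch below writes a 24 kHz mono WAV file using only the Python standard library. The helper name `write_mono_wav`, the 16-bit PCM sample width, and the sine tone standing in for model output are all assumptions made for the example; the card only states the sample rate and channel count.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24_000  # from the model card: 24 kHz output


def write_mono_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] as a 16-bit PCM mono WAV file.

    The 16-bit sample width is an assumption; the card does not state it.
    """
    # Clip each sample, then scale it to the signed 16-bit integer range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for generated audio: half a second of a 440 Hz tone.
tone = [
    0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
out_path = os.path.join(tempfile.mkdtemp(), "example_24k.wav")
write_mono_wav(out_path, tone)
```

Audio produced by the model would replace `tone` here, after conversion to a flat sequence of floats.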

Critical Usage Notes

Use model_state_dict, NOT ema_model_state_dict when loading from .pt checkpoints
Override text_num_embeds so that it matches vocab_size (the default of 256 is wrong for this model)
Use the included vocab.txt — char-to-index mapping must match exactly
Call model.sample() directly — F5TTS API/CLI are broken for fine-tuned models
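
The first three notes can be sketched with plain Python, without committing to any particular F5-TTS call. The helpers below (`load_vocab`, `unknown_chars`, `pick_weights`) are hypothetical names introduced for this example, and the sketch assumes vocab.txt holds one token per line with the line number as the index. The toy vocabulary is illustrative only and is not the model's real vocab.txt.

```python
import os
import tempfile


def load_vocab(vocab_path):
    """Build the char-to-index map, assuming one token per line."""
    with open(vocab_path, encoding="utf-8") as f:
        # Strip only the newline so that a space token survives.
        return {line.rstrip("\n"): index for index, line in enumerate(f)}


def unknown_chars(text, vocab):
    """Return the characters in `text` that the vocab cannot encode."""
    return sorted({ch for ch in text if ch not in vocab})


def pick_weights(checkpoint):
    """Select model_state_dict, never ema_model_state_dict.

    `checkpoint` is assumed to be the dict obtained by loading the .pt file
    (for example with torch.load), which is not done here.
    """
    if "model_state_dict" not in checkpoint:
        raise KeyError("checkpoint has no 'model_state_dict' entry")
    return checkpoint["model_state_dict"]


# Toy vocab file for demonstration; use the repository's vocab.txt in practice.
demo_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write(" \nש\nל\nו\nם\n")

vocab = load_vocab(demo_path)
text_num_embeds = len(vocab)            # pass this instead of the default 256
missing = unknown_chars("שלום x", vocab)  # characters the model cannot encode
weights = pick_weights(
    {"model_state_dict": {"w": 1}, "ema_model_state_dict": {"w": 2}}
)
```

Checking for unknown characters before synthesis is a cheap way to catch a vocab mismatch early, since a wrong index silently maps text to the wrong embedding rather than raising an error.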

Full Project

github.com/yzamari/hebAudio — complete Hebrew TTS system with Web UI, 58 voices, preprocessing pipeline, and training guide.


Safetensors: 0.3B params, tensor type F32
