Mixer-TTS Hebrew/English IPA Fine-Tune

This repository contains a Hebrew/English mixed fine-tune of Mixer-TTS, a non-autoregressive compact text-to-speech model.

The model was fine-tuned from the PyTorch implementation at nipponjo/mixer-tts-pytorch on a mixed Hebrew and English IPA-phoneme dataset of about 24 hours of speech.

Files

  • synthetic_ft_80_best_vocos_int8.onnx - best validation checkpoint exported as a single embedded-Vocos ONNX model with dynamic int8 quantization.
  • synthetic_ft_80_latest_vocos_fp32.onnx - latest checkpoint exported as a single fp32 ONNX model with embedded Vocos vocoder. Outputs waveform audio directly.
  • synthetic_ft_80_latest_hifigan_fp32.onnx - latest checkpoint exported as a single fp32 ONNX model with embedded official LJ HiFi-GAN V1 vocoder. Outputs waveform audio directly.
  • best.pth - PyTorch training checkpoint with the best validation loss.

Inference and Training Code

See the fork and ONNX wrapper here:

Basic embedded-ONNX usage:

from mixer_tts_onnx import MixerTTS

tts = MixerTTS("synthetic_ft_80_latest_vocos_fp32.onnx")
tts.create(
    "sˈimu lˈev nosʔˈim jekaʁˈim.",
    is_phonemes=True,
    output_path="sample.wav",
)

The current ONNX wrapper assumes IPA phonemes when is_phonemes=True. For plain English text it can phonemize with eSpeak; Hebrew should be passed as IPA phonemes.
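Because Hebrew has to arrive as IPA, it can help to catch raw Hebrew script before it reaches the model. The sketch below is a hypothetical pre-check, not part of the wrapper: the function names are invented for illustration, and it only tests the main Unicode Hebrew block (U+0590 to U+05FF).

```python
def contains_hebrew_script(text: str) -> bool:
    """Return True if any character lies in the Unicode Hebrew block (U+0590-U+05FF)."""
    return any("\u0590" <= ch <= "\u05ff" for ch in text)


def validate_tts_input(text: str, is_phonemes: bool) -> str:
    """Hypothetical guard: reject raw Hebrew script that was not converted to IPA.

    Returns the text unchanged when it looks safe to pass on.
    """
    if contains_hebrew_script(text):
        # Hebrew letters are not IPA symbols, so this input is wrong either way:
        # with is_phonemes=True it is not IPA, and with is_phonemes=False it
        # would be sent to an English phonemizer.
        raise ValueError("Hebrew script found; convert Hebrew to IPA phonemes first")
    return text


# IPA input (as in the usage example above) passes through unchanged.
ipa = validate_tts_input("sˈimu lˈev nosʔˈim jekaʁˈim.", is_phonemes=True)
```

A check like this would run before calling tts.create; how the Hebrew text is converted to IPA in the first place is outside its scope.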

Training Notes

  • Dataset: mixed Hebrew and English speech, IPA phoneme labels.
  • Acoustic dimension: 80 mel bins.
  • Sample rate: 22,050 Hz.
  • Checkpoint selection: best.pth is the checkpoint with the lowest validation loss; the "latest" files are exported from the most recent training checkpoint (last.pth) at upload time.
  • ONNX vocoder options: embedded Vocos or embedded HiFi-GAN.
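
The sample rate above matters when handling raw model output outside the wrapper. As a minimal sketch, assuming the waveform is available as a mono sequence of floats in [-1.0, 1.0] (an assumption; the actual output layout of the ONNX models is not documented here), it could be saved with the Python standard library like this. The helper names are illustrative only.

```python
import struct
import wave

SAMPLE_RATE = 22050  # Hz, from the training notes


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of a clip in seconds for a given number of samples."""
    return num_samples / sample_rate


def write_wav(path: str, samples, sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM WAV."""
    pcm = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
```

Writing the header with the wrong rate would change playback speed and pitch, which is why the constant is pinned to 22,050 Hz.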

Citation

Mixer-TTS paper:

@article{Tatanov2021MixerTTSNF,
  title={Mixer-TTS: Non-Autoregressive, Fast and Compact Text-to-Speech Model Conditioned on Language Model Embeddings},
  author={Oktai Tatanov and Stanislav Beliaev and Boris Ginsburg},
  journal={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2021},
  pages={7482-7486},
}