1.18 GB, 11 files, updated 7 days ago

Name                      Size
vocoder/
.gitattributes            1.52 kB
README.md                 1.46 kB
config.json               697 Bytes
fm_decoder.onnx           478 MB
fm_decoder_int8.onnx      125 MB
model.pt                  491 MB
text_encoder.onnx         17.6 MB
text_encoder_int8.onnx    5.57 MB
tokens.txt                2.57 kB
README.md

LuxTTS


This is the model for LuxTTS, a lightweight ZipVoice-based text-to-speech model designed for high-quality voice cloning and realistic generation at speeds exceeding 150x realtime.

Main features

  • Voice cloning: SOTA voice cloning on par with models 10x larger.
  • Clarity: Clear 48 kHz speech generation, unlike most TTS models, which are limited to 24 kHz.
  • Speed: Reaches 150x realtime on a single GPU and runs faster than realtime on CPUs as well.
  • Efficiency: Fits within 1 GB of VRAM, so it can run on any local GPU.
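
To make the speed and sample-rate figures concrete, here is a minimal sketch of the arithmetic behind them. The function names and the timing values are hypothetical, chosen only for illustration; they are not measurements from LuxTTS and not part of its API.

```python
def realtime_multiple(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster)."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


def sample_count(audio_seconds: float, sample_rate_hz: int) -> int:
    """Number of samples in a mono clip of the given length."""
    return round(audio_seconds * sample_rate_hz)


# Hypothetical timing (assumption): 75 s of speech synthesized in 0.5 s.
speed = realtime_multiple(75.0, 0.5)

# A 48 kHz clip holds twice as many samples as a 24 kHz clip of equal length.
samples_48k = sample_count(75.0, 48_000)
samples_24k = sample_count(75.0, 24_000)
```

A value of `speed` above 1 means generation is faster than realtime; the 150x claim corresponds to `speed >= 150`.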

Details

  • Based on ZipVoice, distilled to 4 steps.
  • Uses a 48 kHz vocoder instead of a 24 kHz vocoder.
  • Implements a higher-quality sampling technique than standard Euler.
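
For context on the last point, the sketch below shows what a standard fixed-step Euler sampler looks like and why few-step sampling leaves room for better integrators. It uses a toy one-dimensional velocity field as an assumption for illustration. It is not the LuxTTS decoder, and the higher-quality sampler that LuxTTS uses is not specified here.

```python
import math
from typing import Callable


def euler_sample(
    velocity: Callable[[float, float], float],
    x0: float,
    num_steps: int = 4,
) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    dt = 1.0 / num_steps
    x, t = x0, 0.0
    for _ in range(num_steps):
        x = x + dt * velocity(x, t)  # one Euler update
        t += dt
    return x


def toy_velocity(x: float, t: float) -> float:
    """Toy field dx/dt = x (assumption); the exact value at t=1 is e * x0."""
    return x


coarse = euler_sample(toy_velocity, 1.0, num_steps=4)   # 4 steps, as in a distilled model
fine = euler_sample(toy_velocity, 1.0, num_steps=64)    # many steps, for comparison

coarse_error = abs(math.e - coarse)
fine_error = abs(math.e - fine)
```

On this toy problem `coarse_error` is noticeably larger than `fine_error`, which is the kind of discretization error a better sampler aims to reduce when only a few steps are available.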

Usage

Please check out the repo for usage: https://github.com/ysharma3501/LuxTTS.git

License

The model and code are released under the Apache-2.0 license.

If you find the model/code helpful, stars or likes would be appreciated. Thank you.

Total size: 1.18 GB
Files: 11
Last updated: Jun 19

Contributors