This is a VAE version of the Descript Audio Codec (DAC) with a continuous latent space. DAC is a high-fidelity, general-purpose neural audio codec introduced in the paper "High-Fidelity Audio Compression with Improved RVQGAN". Most of the code is adapted from the open-source DAC repository.
According to the Semantic-VAE paper, semantic distillation improves the training efficiency and performance of downstream TTS models. In addition, by reducing the latent dimension to 32, this variant makes training for these downstream tasks lighter and faster without sacrificing much audio quality.
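To make "continuous latent space" concrete, here is a minimal sketch of a generic VAE bottleneck with a 32-dimensional latent. It is not this model's code: the function names and the random stand-in values for the encoder output are assumptions for illustration only, and the actual DAC-VAE layers, loss weights, and distillation target are not shown.

```python
import math
import random

LATENT_DIM = 32  # latent size stated in the text above


def sample_latent(mean, log_var, rng):
    """Reparameterization trick: z = mean + std * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """KL(N(mean, var) || N(0, 1)), summed over the latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mean, log_var))


rng = random.Random(0)
# Hypothetical stand-in for one frame of encoder output (not real model values).
mean = [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
log_var = [rng.uniform(-2.0, 0.0) for _ in range(LATENT_DIM)]

z = sample_latent(mean, log_var, rng)       # one continuous latent vector per frame
kl = kl_to_standard_normal(mean, log_var)   # regularizer that keeps latents near N(0, 1)
print(len(z), kl >= 0.0)
```

Unlike the residual vector quantization in the original DAC, a bottleneck of this kind yields real-valued vectors rather than codebook indices, which is the form a downstream TTS model would consume.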
Thanks to
facebook/dacvae-watermarked
Aratako/Semantic-DACVAE-Japanese-32dim