---
license: cc-by-4.0
datasets:
- sarulab-speech/mls_sidon
- mythicinfinity/Libriheavy-HQ
language:
- en
pipeline_tag: audio-to-audio
tags:
- Audio
- Codec
- TTS
---

# LayaCodec

LayaCodec: Rapid, High-Fidelity Audio Compression: Reaching the Pareto Frontier in Neural Audio Codecs

This is a neural audio codec/tokenizer that encodes 16 kHz audio at a rate from 12.5 t/s (0.16 kbps) to 50 t/s (0.65 kbps) using a single codebook of size 8192, and decodes it into 44.1 kHz audio.
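
The bitrates quoted above follow from the token rate and the codebook size: each token drawn from a codebook of 8192 entries carries log2(8192) = 13 bits. The snippet below is a minimal sketch of that arithmetic only; the function name `bitrate_kbps` is made up for illustration and is not part of the LayaCodec repository.

```python
import math


def bitrate_kbps(tokens_per_second: float, codebook_size: int) -> float:
    """Bitrate of a single-codebook token stream, in kilobits per second."""
    # Each token indexes one codebook entry, so it carries log2(size) bits.
    bits_per_token = math.log2(codebook_size)
    return tokens_per_second * bits_per_token / 1000.0


if __name__ == "__main__":
    # Token rates and codebook size are the ones stated in this model card.
    for rate in (12.5, 50.0):
        print(f"{rate} t/s -> {bitrate_kbps(rate, 8192):.4f} kbps")
```

With 13 bits per token, 12.5 t/s works out to 162.5 bit/s and 50 t/s to 650 bit/s, which match the rounded 0.16 kbps and 0.65 kbps figures.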

This allows for much faster and more scalable TTS models than other modern codecs, for several reasons:

1. **Much** lower token rates than other single-pass codecs such as Xcodec2 (50 t/s), Snac (83 t/s), Dac (774 t/s), etc.
2. **Much** smaller codebook size (8192) compared to Xcodec2 (65536), for faster TTS model training.
3. Over 40x faster than most diffusion-based codecs, allowing for **much** simpler and larger-scale TTS models where the codec is not the bottleneck.
4. Decodes audio into 44.1 kHz, which gives much higher quality than the common 24 kHz or 16 kHz sampling rates.
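
To make the token-rate comparison concrete, the sketch below counts how many tokens a TTS model would have to generate for a clip of a given length. It uses only the token rates quoted in the list above; the helper `tokens_for_clip` and the dictionary are illustrative assumptions and do not come from any of the codecs' own code.

```python
import math

# Token rates (tokens per second) as quoted in this model card.
TOKEN_RATES = {
    "LayaCodec (lowest rate)": 12.5,
    "LayaCodec (highest rate)": 50.0,
    "Xcodec2": 50.0,
    "Snac": 83.0,
    "Dac": 774.0,
}


def tokens_for_clip(seconds: float, tokens_per_second: float) -> int:
    """Number of tokens needed to represent a clip, rounded up."""
    return math.ceil(seconds * tokens_per_second)


if __name__ == "__main__":
    clip_seconds = 10.0  # hypothetical clip length, chosen for illustration
    for name, rate in TOKEN_RATES.items():
        print(f"{name}: {tokens_for_clip(clip_seconds, rate)} tokens")
```

Since the sequence a TTS model must predict grows linearly with the token rate, a lower rate directly shortens both training and generation sequences.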

Repo: https://github.com/ysharma3501/LayaCodec

This is still a work in progress: it has only seen a few hundred hours of training data, yet the quality is already surprisingly good. It still needs more training.

Released under the permissive CC-BY-4.0 license, which allows commercial or personal usage provided a citation is given.

Thanks very much to the authors of FocalCodec and Anime-XCodec2.