---
license: cc-by-4.0
language:
- en
pipeline_tag: audio-to-audio
tags:
- audio
- codec
---

## LinaCodec: Highly compressive audio tokenizer for speech models

<p align="center">
<a href="https://huggingface.co/YatharthS/LinaCodec">
<img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-FFD21E" alt="Hugging Face Model">
</a>
</p>

LinaCodec is an audio tokenizer that compresses audio into just 12.5 tokens per second (171 bps) and decodes to 48 kHz audio!

### Key benefits

* Compression: 12.5 tokens/sec (about 60x fewer tokens per second than DAC).
* Audio Quality: 48 kHz output (much clearer than the standard 16 kHz/24 kHz).
* Encoder Speed: 200x realtime.
* Decoder Speed: 400x realtime (even faster with batching).
* Many Tasks: Indirectly supports voice conversion, audio super-resolution, and audio denoising!
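
As a rough back-of-the-envelope check on the figures above, the sketch below turns the stated token rate, bitrate, and realtime factors into per-clip numbers. It is plain arithmetic only: the helper names are hypothetical and are not part of the LinaCodec API, and the speed figures assume the realtime factors quoted in this card.

```python
# Figures quoted in this model card.
TOKENS_PER_SECOND = 12.5
BITRATE_BPS = 171
ENCODER_REALTIME_FACTOR = 200
DECODER_REALTIME_FACTOR = 400


def bits_per_token(bitrate_bps: float, tokens_per_second: float) -> float:
    """Average number of bits carried by each token."""
    return bitrate_bps / tokens_per_second


def tokens_for_duration(audio_seconds: float,
                        tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Number of tokens needed to represent a clip of the given length."""
    return audio_seconds * tokens_per_second


def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time to process a clip at a given realtime factor."""
    return audio_seconds / realtime_factor


if __name__ == "__main__":
    clip = 60.0  # one minute of audio
    print(bits_per_token(BITRATE_BPS, TOKENS_PER_SECOND))      # 13.68
    print(tokens_for_duration(clip))                           # 750.0
    print(processing_seconds(clip, ENCODER_REALTIME_FACTOR))   # 0.3
    print(processing_seconds(clip, DECODER_REALTIME_FACTOR))   # 0.15
```

Under these assumptions, one minute of audio becomes 750 tokens, takes roughly 0.3 s to encode, and roughly 0.15 s to decode. Actual timings depend on hardware and batch size.
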

### Why is this even useful?

Audio tokenizers directly affect the speed, quality, and capability of TTS/ASR models. LinaCodec massively improves upon previous codecs in these areas.

* Inference Speed: Enables TTS models to run at 800x realtime, 8x faster than [MiraTTS](https://github.com/ysharma3501)!
* Fast Training: High-quality TTS models can be trained in less than 1 day.
* Versatile: Works for both Text-to-Speech and Speech-to-Text, unlike most other codecs.

### Comparisons

| Model | Total Tokens/Sec | Sample Rate |
| :--- | :--- | :--- |
| LinaCodec | 12.5 | 48 kHz |
| DAC | 774 | 44.1 kHz |
| EnCodec | 300 | 24 kHz |
| Xcodec2 | 50 | 16 kHz |
| Mimi | 200 | 24 kHz |
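
To make the comparison concrete, the following sketch computes how many times fewer tokens per second LinaCodec emits than each codec in the table. The token rates are copied from the table; the dictionary and function names are hypothetical. Note that this compares token counts only, not bitrate, since codebook sizes differ between codecs.

```python
# Total tokens per second, as listed in the comparison table.
TOKEN_RATES = {
    "LinaCodec": 12.5,
    "DAC": 774,
    "EnCodec": 300,
    "Xcodec2": 50,
    "Mimi": 200,
}


def token_reduction(baseline: str,
                    reference: str = "LinaCodec",
                    rates: dict = TOKEN_RATES) -> float:
    """How many times fewer tokens/sec `reference` uses than `baseline`."""
    return rates[baseline] / rates[reference]


if __name__ == "__main__":
    for name in TOKEN_RATES:
        if name != "LinaCodec":
            print(f"{name}: {token_reduction(name):.2f}x")
    # DAC: 61.92x
    # EnCodec: 24.00x
    # Xcodec2: 4.00x
    # Mimi: 16.00x
```

The DAC ratio of about 62x is where the "about 60x" figure in the key benefits comes from.
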

Please check the repo for usage: https://github.com/ysharma3501/LinaCodec

The license is CC-BY-4.0, meaning you can use the model for any use case, commercial or non-commercial, provided you credit the original creator. Thank you.