---
license: cc-by-4.0
language:
- en
pipeline_tag: audio-to-audio
tags:
- audio
- codec
---
## LinaCodec: a highly compressive audio tokenizer for speech models
<p align="center">
  <a href="https://huggingface.co/YatharthS/LinaCodec">
    <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-FFD21E" alt="Hugging Face Model">
  </a>
</p>

LinaCodec is an audio tokenizer that compresses audio into just 12.5 tokens per second (171 bps) and decodes to 48 kHz audio.
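
To make the two headline numbers concrete, here is a small back-of-the-envelope sketch. It only divides the stated bitrate by the stated token rate; the function names are hypothetical and are not part of the LinaCodec API, and the calculation assumes the 171 bps figure covers the token stream alone.

```python
# Back-of-the-envelope arithmetic from the figures quoted above.
# These helpers are illustrative only; they are NOT part of the LinaCodec API.

TOKENS_PER_SECOND = 12.5   # token rate stated in this card
BITRATE_BPS = 171.0        # bitrate stated in this card


def bits_per_token(bitrate_bps: float, tokens_per_second: float) -> float:
    """Average number of bits carried by each token."""
    return bitrate_bps / tokens_per_second


def tokens_for_duration(seconds: float, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Number of tokens needed to represent `seconds` of audio."""
    return seconds * tokens_per_second


if __name__ == "__main__":
    print(f"{bits_per_token(BITRATE_BPS, TOKENS_PER_SECOND):.2f} bits per token")
    print(f"{tokens_for_duration(10):.0f} tokens for 10 seconds of audio")
```

Under these assumptions, each token carries a little under 14 bits, and a 10-second clip becomes 125 tokens.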

### Key benefits
* Compression: 12.5 tokens/sec (about 60x fewer tokens than DAC).
* Audio quality: 48 kHz output (much clearer than the 16 kHz/24 kHz output that is standard).
* Encoder speed: 200x realtime.
* Decoder speed: 400x realtime (even faster with batching).
* Many tasks: also indirectly supports voice conversion, audio super-resolution, and audio denoising.

### Why is this even useful?
Audio tokenizers directly contribute to the speed, quality, and capability of TTS/ASR models. LinaCodec massively improves upon previous codecs in these areas.
* Inference speed: Enables TTS models to run at 800x realtime, 8x faster than [MiraTTS](https://github.com/ysharma3501).
* Fast training: High-quality TTS models can be trained in less than 1 day.
* Versatile: Works for both text-to-speech and speech-to-text, unlike most other codecs.

### Comparisons
| Model | Total Tokens/Sec | Sample Rate |
| :--- | :--- | :--- |
| LinaCodec | 12.5 | 48 kHz |
| DAC | 774 | 44.1 kHz |
| EnCodec | 300 | 24 kHz |
| Xcodec2 | 50 | 16 kHz |
| Mimi | 200 | 24 kHz |
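
The token rates in the table can be turned into relative compression factors with a few lines of Python. This is a minimal sketch that only copies the numbers from the table above; the dictionary and function names are hypothetical, and the ratio compares token counts only, not bitrate or audio quality.

```python
# Token rates (total tokens per second) copied from the comparison table above.
TOKEN_RATES = {
    "LinaCodec": 12.5,
    "DAC": 774.0,
    "EnCodec": 300.0,
    "Xcodec2": 50.0,
    "Mimi": 200.0,
}


def token_ratio(other: str, baseline: str = "LinaCodec") -> float:
    """How many times more tokens per second `other` emits than `baseline`."""
    return TOKEN_RATES[other] / TOKEN_RATES[baseline]


if __name__ == "__main__":
    for name in TOKEN_RATES:
        if name != "LinaCodec":
            print(f"{name}: {token_ratio(name):.1f}x the tokens of LinaCodec")
```

By this measure DAC emits roughly 62 times as many tokens per second as LinaCodec, which is where the "about 60x" figure comes from.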

Please check the repo for usage: https://github.com/ysharma3501/LinaCodec

The license is CC-BY-4.0, meaning you can use it for any use case (commercial or non-commercial) as long as you credit the original creator. Thank you.