---
license: cc-by-4.0
language:
- en
pipeline_tag: audio-to-audio
tags:
- audio
- codec
---

## LinaCodec: Highly compressive audio tokenizer for speech models

LinaCodec is an audio tokenizer that compresses audio into just 12.5 tokens per second (171 bps) and decodes to 48 kHz audio!

### Key benefits
* Compression: 12.5 tokens/sec (roughly 60x more compressed than DAC by token rate).
* Audio quality: 48 kHz output (much clearer than the 16 kHz/24 kHz output that is standard).
* Encoder speed: 200x realtime.
* Decoder speed: 400x realtime (even faster with batching).
* Many tasks: Indirectly supports voice conversion, audio super-resolution, and audio denoising as well!

### Why is this even useful?
Audio tokenizers directly affect the speed, quality, and capability of TTS/ASR models. LinaCodec massively improves upon previous codecs in these areas.
* Inference speed: Enables TTS models to run at 800x realtime, 8x faster than [MiraTTS](https://github.com/ysharma3501)!
* Fast training: High-quality TTS models can be trained in less than 1 day.
* Versatile: Works for both text-to-speech and speech-to-text, unlike most other codecs.

### Comparisons
| Model | Total Tokens/Sec | Sample Rate |
| :--- | :--- | :--- |
| LinaCodec | 12.5 | 48 kHz |
| DAC | 774 | 44.1 kHz |
| EnCodec | 300 | 24 kHz |
| Xcodec2 | 50 | 16 kHz |
| Mimi | 200 | 24 kHz |

Please check the repo for usage: https://github.com/ysharma3501/LinaCodec

The license is CC-BY-4.0, meaning you can use it for any use case (commercial or non-commercial) as long as you credit the original creator.

Thank you.
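
### Checking the compression numbers

The token rates in the comparison table can be turned into compression ratios with simple arithmetic. The snippet below is a minimal sketch that uses only the figures quoted in this card. The helper names (`compression_ratio`, `tokens_for_clip`, `implied_bits_per_token`) are made up for illustration and are not part of the LinaCodec API. The bits-per-token value is just the quoted bitrate divided by the quoted token rate, not a statement about the actual codebook layout.

```python
# Token rates (tokens per second) copied from the comparison table.
TOKENS_PER_SEC = {
    "LinaCodec": 12.5,
    "DAC": 774,
    "EnCodec": 300,
    "Xcodec2": 50,
    "Mimi": 200,
}


def compression_ratio(other: str, base: str = "LinaCodec") -> float:
    """How many times more tokens per second `other` emits than `base`."""
    return TOKENS_PER_SEC[other] / TOKENS_PER_SEC[base]


def tokens_for_clip(seconds: float, codec: str = "LinaCodec") -> float:
    """Number of tokens needed to represent a clip of the given length."""
    return seconds * TOKENS_PER_SEC[codec]


def implied_bits_per_token(bitrate_bps: float, tokens_per_sec: float) -> float:
    """Average bits carried by one token at the quoted bitrate."""
    return bitrate_bps / tokens_per_sec


if __name__ == "__main__":
    for name in TOKENS_PER_SEC:
        if name != "LinaCodec":
            print(f"{name}: {compression_ratio(name):.1f}x the token rate of LinaCodec")
    print(f"10 s clip: {tokens_for_clip(10):.0f} LinaCodec tokens")
    print(f"Implied bits per token: {implied_bits_per_token(171, 12.5):.2f}")
```

With these numbers, DAC comes out at about 62x the token rate of LinaCodec, which matches the "60x" figure above, and a 10-second clip needs 125 LinaCodec tokens.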