---
license: cc-by-4.0
language:
- en
pipeline_tag: audio-to-audio
tags:
- audio
- codec
---
## LinaCodec: A highly compressive audio tokenizer for speech models
<p align="center">
<a href="https://huggingface.co/YatharthS/LinaCodec">
<img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-FFD21E" alt="Hugging Face Model">
</a>
</p>

LinaCodec is an audio tokenizer that compresses audio to just 12.5 tokens per second (171 bps) and decodes it back to 48 kHz audio!
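As a quick sanity check on these two figures, the sketch below derives the implied bits per token (171 / 12.5 ≈ 13.7) and the token count and raw size of a clip. It uses only the numbers stated above. The helper names are hypothetical, and the bits-per-token value is inferred from the stated rates, not a published codebook size.

```python
# Figures stated in this model card (not read from the LinaCodec package).
TOKENS_PER_SEC = 12.5
BITRATE_BPS = 171


def implied_bits_per_token(bitrate_bps: float = BITRATE_BPS,
                           tokens_per_sec: float = TOKENS_PER_SEC) -> float:
    """Bits carried by each token, inferred from bitrate / token rate."""
    return bitrate_bps / tokens_per_sec


def clip_footprint(duration_sec: float) -> tuple[float, float]:
    """Return (token count, size in bytes) for a clip of the given length."""
    tokens = duration_sec * TOKENS_PER_SEC
    size_bytes = duration_sec * BITRATE_BPS / 8  # 8 bits per byte
    return tokens, size_bytes


print(round(implied_bits_per_token(), 2))  # 13.68
print(clip_footprint(10.0))                # (125.0, 213.75)
```

At this rate, a 10-second utterance becomes a sequence of 125 tokens occupying about 214 bytes.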

### Key benefits
* Compression: 12.5 tokens/sec (about 60x fewer tokens than DAC).
* Audio Quality: 48 kHz output (much clearer than the 16 kHz or 24 kHz that is standard).
* Encoder Speed: 200x real-time.
* Decoder Speed: 400x real-time (even faster with batching).
* Multiple Tasks: Also indirectly supports voice conversion, audio super-resolution, and audio denoising!
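To make the speed figures concrete, the following sketch converts a real-time multiple into wall-clock time (audio duration divided by the real-time factor). It assumes the stated 200x and 400x figures apply unbatched on whatever hardware they were measured on, which the card does not specify; `processing_seconds` is a hypothetical helper, not part of LinaCodec.

```python
# Real-time multiples stated in this model card (hardware unspecified).
ENCODER_RTF = 200
DECODER_RTF = 400


def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to process audio at a given real-time multiple."""
    return audio_seconds / realtime_factor


one_hour = 3600.0
print(processing_seconds(one_hour, ENCODER_RTF))  # 18.0
print(processing_seconds(one_hour, DECODER_RTF))  # 9.0
```

Under these assumptions, a full encode/decode round trip of one hour of audio would take roughly 27 seconds, before any batching speed-up.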

### Why is this useful?
Audio tokenizers directly affect the speed, quality, and capabilities of TTS/ASR models, and LinaCodec improves substantially on previous codecs in all three areas.
* Inference Speed: Enables TTS models to run at 800x real-time, 8x faster than [MiraTTS](https://github.com/ysharma3501)!
* Fast Training: High-quality TTS models can be trained in less than 1 day.
* Versatile: Works for both text-to-speech and speech-to-text, unlike most other codecs.

### Comparisons
| Model | Total Tokens/Sec | Sample Rate |
| :--- | :--- | :--- |
| LinaCodec | 12.5 | 48 kHz |
| DAC | 774 | 44.1 kHz |
| EnCodec | 300 | 24 kHz |
| Xcodec2 | 50 | 16 kHz |
| Mimi | 200 | 24 kHz |
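The compression ratios can be checked directly against the table. This sketch uses only the tokens-per-second column to compute how many tokens a 10-second clip produces under each codec and how that compares with LinaCodec. The function names are illustrative only.

```python
# Total tokens per second, copied from the comparison table.
CODECS = {
    "LinaCodec": 12.5,
    "DAC": 774,
    "EnCodec": 300,
    "Xcodec2": 50,
    "Mimi": 200,
}


def sequence_length(codec: str, duration_sec: float) -> float:
    """Number of tokens a clip of the given duration produces."""
    return CODECS[codec] * duration_sec


def ratio_vs_linacodec(codec: str) -> float:
    """How many times more tokens per second than LinaCodec."""
    return CODECS[codec] / CODECS["LinaCodec"]


for name in CODECS:
    tokens = sequence_length(name, 10)
    print(f"{name:10s} {tokens:7.1f} tokens  {ratio_vs_linacodec(name):5.2f}x")
```

For DAC the ratio works out to 774 / 12.5 ≈ 61.9, which matches the roughly 60x compression claim; EnCodec, Mimi, and Xcodec2 come out at 24x, 16x, and 4x respectively. Fewer tokens per second means fewer autoregressive steps for a model that generates these tokens.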

Please see the GitHub repository for usage instructions: https://github.com/ysharma3501/LinaCodec

The license is CC-BY-4.0, meaning you can use it for any use case (commercial or non-commercial) provided you credit the original creator. Thank you.