YatharthS
/

LayaCodec

	---
	license: cc-by-4.0
	datasets:
	- sarulab-speech/mls_sidon
	- mythicinfinity/Libriheavy-HQ
	language:
	- en
	pipeline_tag: audio-to-audio
	tags:
	- Audio
	- Codec
	- TTS
	---
	# LayaCodec

	LayaCodec: Rapid, High-Fidelity Audio Compression: Reaching the Pareto Frontier in Neural Audio Codecs


	This is a neural audio codec/tokenizer that encodes 16 kHz audio at rates from 12.5 t/s (0.16 kbps) to 50 t/s (0.65 kbps) using a single codebook of size 8192, and decodes it into 44.1 kHz audio.
	This allows for much faster and more scalable TTS models compared to other modern codecs, for several reasons:
	1. Much lower token rates (down to 12.5 t/s) than other single-pass codecs such as Xcodec2 (50 t/s), Snac (83 t/s), Dac (774 t/s), etc.
	2. A much smaller codebook (8192 entries) compared to Xcodec2 (65536), for faster TTS model training.
	3. Over 40x faster than most diffusion-based codecs, allowing for much simpler and larger-scale TTS models where the codec is not the bottleneck.
	4. Decodes audio into 44.1 kHz, which is much higher quality than the common 24 kHz or 16 kHz sampling rates.
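
The bitrates quoted above follow from the token rate and codebook size: a single codebook with 8192 entries carries log2(8192) = 13 bits per token. The sketch below only reproduces that arithmetic and the resulting sequence lengths. It does not use the LayaCodec API; the helper names `bitrate_kbps` and `tokens_for_clip` and the 10-second clip length are illustrative assumptions.

```python
import math


def bitrate_kbps(tokens_per_second: float, codebook_size: int) -> float:
    """Bitrate of a single-codebook codec: tokens/s times bits per token."""
    bits_per_token = math.log2(codebook_size)
    return tokens_per_second * bits_per_token / 1000.0


def tokens_for_clip(tokens_per_second: float, seconds: float) -> int:
    """Number of tokens a TTS model has to generate for a clip of this length."""
    return math.ceil(tokens_per_second * seconds)


CLIP_SECONDS = 10.0  # hypothetical clip length, for illustration only

# (label, token rate, codebook size), using the figures quoted in this card
configs = [
    ("LayaCodec, lowest rate", 12.5, 8192),
    ("LayaCodec, highest rate", 50.0, 8192),
    ("Xcodec2", 50.0, 65536),
]

for label, rate, codebook in configs:
    kbps = bitrate_kbps(rate, codebook)
    n_tokens = tokens_for_clip(rate, CLIP_SECONDS)
    print(f"{label}: {kbps:.4f} kbps, {n_tokens} tokens per {CLIP_SECONDS:.0f} s")
```

For the two LayaCodec rates this gives 0.1625 kbps and 0.65 kbps, matching the rounded figures above. At 12.5 t/s a 10-second clip is 125 tokens, versus 500 tokens at 50 t/s.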

	Repo: https://github.com/ysharma3501/LayaCodec

	This is still a work in progress. The model has only seen a few hundred hours of training data, yet the quality is already surprisingly good. It still needs more training.

	Released under the permissive CC-BY-4.0 license, which allows commercial or personal use provided a citation is given.
	Thanks very much to the authors of FocalCodec and Anime-XCodec2.