	---
	license: mit
	pipeline_tag: audio-to-audio
	library_name: sq_codec
	---

	# SQCodec

	This repository contains the implementation of SQCodec, a lightweight audio codec based on a single quantizer, introduced in the paper titled "One Quantizer is Enough: Toward a Lightweight Audio Codec".

	[Paper](https://arxiv.org/abs/2504.04949)

	[Code](https://github.com/zhai-lw/SQCodec)

	## Install

	```
	pip install sq_codec
	```

	### Demo

	First, install the librosa package, which the demo uses to load the example audio file:

	```
	pip install librosa
	```

	The following code loads a sample audio file, encodes it with an SQCodec model, and decodes it back to audio.
	It also computes the mean squared error (MSE) between the original and reconstructed audio. As written, the example runs the model on a CUDA GPU.

	```python
	import librosa
	import torch
	import sq_codec

	all_models = sq_codec.list_models()
	print(f"Available models: {all_models}")

	MODEL_USED = '6kbps'
	codec = sq_codec.get_model(MODEL_USED)
	print(f"loaded codec({MODEL_USED}) and codec sample rate: {codec.config.sample_rate}")

	sample_audio, sample_rate = librosa.load(librosa.example("libri1"))
	sample_audio = sample_audio[None, :]
	print(f"loaded sample audio and audio sample_rate :{sample_rate}")

	sample_audio = librosa.resample(sample_audio, orig_sr=sample_rate, target_sr=codec.config.sample_rate)

	codec.network.cuda()
	codec.network.eval()
	with torch.inference_mode():
	    audio_in = torch.tensor(sample_audio, dtype=torch.float32, device='cuda')
	    _, audio_length = audio_in.shape
	    print(f"{audio_in.shape=}")
	    q_feature, indices = codec.encode_audio(audio_in)
	    audio_out = codec.decode_audio(q_feature)  # or
	    # audio_out = codec.decode_audio(indices=indices)
	    generated_audio = audio_out[:, :audio_length].detach().cpu().numpy()

	mse = ((sample_audio - generated_audio) ** 2).mean().item()
	print(f"codec({MODEL_USED}) mse: {mse}")
	```
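
The MSE step can be tried without a GPU or the codec itself. The sketch below is a minimal, standalone illustration using NumPy only. The helper name `reconstruction_metrics` is hypothetical and not part of `sq_codec`, and the signal-to-noise ratio (SNR) is an extra metric added here for illustration; the demo above only reports MSE.

```python
import numpy as np


def reconstruction_metrics(reference, estimate):
    """Return (mse, snr_db) for two waveforms shaped (channels, samples).

    Hypothetical helper for illustration; not part of sq_codec.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Compare only the overlapping samples, mirroring the
    # audio_out[:, :audio_length] trim in the demo.
    n = min(reference.shape[-1], estimate.shape[-1])
    reference, estimate = reference[..., :n], estimate[..., :n]
    mse = float(np.mean((reference - estimate) ** 2))
    signal_power = float(np.mean(reference ** 2))
    # SNR in dB: signal power relative to the power of the error.
    snr_db = float("inf") if mse == 0 else float(10.0 * np.log10(signal_power / mse))
    return mse, snr_db


# Toy check: a 440 Hz sine at 16 kHz with a small constant offset as "distortion".
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 440.0 * t)[None, :]
distorted = clean + 0.01
mse, snr_db = reconstruction_metrics(clean, distorted)
print(f"mse={mse:.6f}, snr={snr_db:.2f} dB")
```

With real codec output, `reference` would be the resampled `sample_audio` and `estimate` the `generated_audio` array from the demo.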

	### Available models

	| config_name  | Sample rate (Hz) | Tokens/s | Codebook size | Bitrate (bps) |
	|--------------|------------------|----------|---------------|---------------|
	| 0k75bps      | 16,000           | 44.44    | 117,649       | 748.6         |
	| 1k5bps       | 16,000           | 88.89    | 117,649       | 1497.3        |
	| 3kbps        | 16,000           | 177.78   | 117,649       | 2994.5        |
	| 6kbps        | 16,000           | 355.56   | 117,649       | 5989.0        |
	| 12kbps       | 16,000           | 666.67   | 250,047       | 11954.6       |
	| 12kbps_24khz | 24,000           | 666.67   | 250,047       | 11954.6       |
	| 24kbps_24khz | 24,000           | 1333.33  | 250,047       | 23909.1       |
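
The bitrate column appears to follow from the other two: with a single quantizer, each token carries log2(codebook size) bits, so bitrate ≈ tokens/s × log2(codebook size). The sketch below checks this against the table values. It is an illustration under that assumption, not code from the `sq_codec` package, and `bitrate_bps` is a hypothetical helper name.

```python
import math

# (config_name, tokens_per_second, codebook_size, listed_bitrate_bps),
# copied from the table above.
MODELS = [
    ("0k75bps", 44.44, 117_649, 748.6),
    ("1k5bps", 88.89, 117_649, 1497.3),
    ("3kbps", 177.78, 117_649, 2994.5),
    ("6kbps", 355.56, 117_649, 5989.0),
    ("12kbps", 666.67, 250_047, 11954.6),
    ("12kbps_24khz", 666.67, 250_047, 11954.6),
    ("24kbps_24khz", 1333.33, 250_047, 23909.1),
]


def bitrate_bps(tokens_per_second, codebook_size):
    """Bits per second, assuming one codebook index per token."""
    return tokens_per_second * math.log2(codebook_size)


for name, tps, size, listed in MODELS:
    computed = bitrate_bps(tps, size)
    print(f"{name:>13}: computed {computed:9.1f} bps, table {listed} bps")
```

Small differences in the last digit come from the rounded tokens/s values in the table.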