---
license: mit
pipeline_tag: audio-to-audio
---

# SQCodec

This repository contains the implementation of SQCodec, a lightweight audio codec based on a single quantizer, introduced in the paper titled "One Quantizer is Enough: Toward a Lightweight Audio Codec".

[Paper](https://arxiv.org/abs/2504.04949)

[Code](https://github.com/zhai-lw/SQCodec)
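
SQCodec is described above as relying on a single quantizer. As a rough, hypothetical illustration of that idea (not SQCodec's actual quantizer, which may work differently; see the paper for the real design), the sketch below implements a generic single-codebook vector quantizer in NumPy: each encoder frame is replaced by its nearest codebook entry, so one index per frame is enough to recover the quantized features. This loosely mirrors the demo's choice of decoding from either `q_feature` or `indices`. The function names `quantize`/`dequantize`, the codebook size, and the feature dimension are assumptions made for illustration only.

```python
from typing import Tuple

import numpy as np


def quantize(features: np.ndarray, codebook: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical single-codebook quantizer (illustration only, not SQCodec's).

    features: (num_frames, dim) encoder outputs
    codebook: (codebook_size, dim) code vectors
    Returns (quantized_features, indices), one index per frame.
    """
    # Squared Euclidean distance from every frame to every code vector.
    distances = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = distances.argmin(axis=1)
    return codebook[indices], indices


def dequantize(indices: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Recover the quantized features from the indices alone via a table lookup."""
    return codebook[indices]


rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 8))  # assumed: 1024 entries of dimension 8
frames = rng.normal(size=(50, 8))      # assumed: 50 encoder frames
q_feature, indices = quantize(frames, codebook)
# Decoding from the indices reproduces exactly the quantizer's output features.
assert np.array_equal(dequantize(indices, codebook), q_feature)
print(indices.shape, np.log2(len(codebook)))  # (50,) 10.0
```

With one codebook of K entries, each frame costs log2(K) bits, so the bitrate is the frame rate times log2(K). The 1024-entry codebook here is an assumed size, not SQCodec's configuration.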

## Installation

```bash
pip install sq_codec
```

## Demo

First, install the librosa package, which the demo uses to load and resample the example audio file:

```bash
pip install librosa
```

The following code loads a sample audio file, encodes it with an SQCodec model, decodes it back to audio, and computes the mean squared error (MSE) between the original and reconstructed audio.

```python
import librosa
import torch

import sq_codec

# List the pretrained models available in the package.
all_models = sq_codec.list_models()
print(f"Available models: {all_models}")

MODEL_USED = '6kbps'
codec = sq_codec.get_model(MODEL_USED)
print(f"loaded codec({MODEL_USED}), codec sample rate: {codec.config.sample_rate}")

# Load the librosa example clip (mono by default) and add a channel axis: (1, samples).
sample_audio, sample_rate = librosa.load(librosa.example("libri1"))
sample_audio = sample_audio[None, :]
print(f"loaded sample audio, audio sample rate: {sample_rate}")

# Resample to the sample rate the codec expects.
sample_audio = librosa.resample(sample_audio, orig_sr=sample_rate, target_sr=codec.config.sample_rate)

# Use the GPU if one is available; otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
codec.network.to(device)
codec.network.eval()
with torch.inference_mode():
    audio_in = torch.tensor(sample_audio, dtype=torch.float32, device=device)
    _, audio_length = audio_in.shape
    print(f"{audio_in.shape=}")
    # Encode to quantized features and their indices.
    q_feature, indices = codec.encode_audio(audio_in)
    # Decode from the quantized features, or alternatively from the indices:
    audio_out = codec.decode_audio(q_feature)
    # audio_out = codec.decode_audio(indices=indices)
    # Trim the output to the input length before comparing.
    generated_audio = audio_out[:, :audio_length].detach().cpu().numpy()

mse = ((sample_audio - generated_audio) ** 2).mean().item()
print(f"codec({MODEL_USED}) mse: {mse}")
```
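
If you evaluate several models or clips, the trim-then-compare step at the end of the demo can be wrapped in a small helper. The sketch below is a hypothetical `reconstruction_mse` function (not part of `sq_codec`) that assumes both inputs are NumPy arrays shaped `(channels, samples)` and that the codec output is at least as long as the input, as the demo's `audio_out[:, :audio_length]` slice implies.

```python
import numpy as np


def reconstruction_mse(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Mean squared error between an input waveform and its codec reconstruction.

    Hypothetical helper, not part of sq_codec. Both arrays are (channels, samples);
    the reconstruction is trimmed to the input length before comparing.
    """
    length = original.shape[-1]
    if reconstructed.shape[-1] < length:
        raise ValueError("reconstruction is shorter than the original audio")
    trimmed = reconstructed[..., :length]
    return float(((original - trimmed) ** 2).mean())


# Toy example: the extra trailing sample in the reconstruction is ignored.
original = np.zeros((1, 4))
reconstructed = np.array([[1.0, 1.0, 0.0, 0.0, 9.0]])
print(reconstruction_mse(original, reconstructed))  # 0.5
```

In the demo, `reconstruction_mse(sample_audio, audio_out.detach().cpu().numpy())` computes the same value as the inline MSE calculation.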