zhai-lw committed on
Commit
01f5d9b
·
verified ·
1 Parent(s): 5c45635

Update README.md

  1. README.md +62 -3
README.md CHANGED
@@ -1,3 +1,62 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: audio-to-audio
+ ---
+ # SQCodec
+
+ This repository contains the implementation of SQCodec, a lightweight audio codec built on a single quantizer, introduced in the paper "One Quantizer is Enough: Toward a Lightweight Audio Codec".
+
+ [Paper](https://arxiv.org/abs/2504.04949)
+
+ [Code](https://github.com/zhai-lw/SQCodec)
+
+
+ ## Install
+
+ ```
+ pip install sq_codec
+ ```
+
+ ### Demo
+
+ First, install the librosa package, which the demo uses to load the example audio file:
+
+ ```
+ pip install librosa
+ ```
+
+ The following code loads a sample audio file, resamples it to the codec's sample rate, encodes it with the SQCodec model, and decodes it back
+ to audio. It then computes the mean squared error (MSE) between the original and the generated audio. As written, it requires a CUDA-capable GPU.
+
+ ```python
+ import librosa
+ import torch
+ import sq_codec
+
+ all_models = sq_codec.list_models()
+ print(f"Available models: {all_models}")
+
+ MODEL_USED = '6kbps'
+ codec = sq_codec.get_model(MODEL_USED)
+ print(f"loaded codec({MODEL_USED}) and codec sample rate: {codec.config.sample_rate}")
+
+ sample_audio, sample_rate = librosa.load(librosa.example("libri1"))
+ sample_audio = sample_audio[None, :]  # add a leading axis: (1, num_samples)
+ print(f"loaded sample audio and audio sample_rate: {sample_rate}")
+
+ sample_audio = librosa.resample(sample_audio, orig_sr=sample_rate, target_sr=codec.config.sample_rate)
+
+ codec.network.cuda()  # the demo as written needs a CUDA-capable GPU
+ codec.network.eval()
+ with torch.inference_mode():
+     audio_in = torch.tensor(sample_audio, dtype=torch.float32, device='cuda')
+     _, audio_length = audio_in.shape
+     print(f"{audio_in.shape=}")
+     q_feature, indices = codec.encode_audio(audio_in)
+     audio_out = codec.decode_audio(q_feature)  # or
+     # audio_out = codec.decode_audio(indices=indices)
+     generated_audio = audio_out[:, :audio_length].detach().cpu().numpy()  # trim to the input length
+
+ mse = ((sample_audio - generated_audio) ** 2).mean().item()
+ print(f"codec({MODEL_USED}) mse: {mse}")
+ ```
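The MSE step at the end of the demo can be tried without the model or a GPU. The sketch below is not part of the committed README: `reconstruction_mse` is a hypothetical helper, the 16 kHz rate is an arbitrary value chosen for illustration, and the "decoded" signal is synthetic. It only mirrors the trim-then-compare logic of `audio_out[:, :audio_length]`, assuming arrays shaped `(1, num_samples)`.

```python
import numpy as np


def reconstruction_mse(original, generated):
    """Mean squared error over the samples both signals share (hypothetical helper)."""
    original = np.asarray(original, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    # Trim along the last (time) axis, mirroring audio_out[:, :audio_length].
    length = min(original.shape[-1], generated.shape[-1])
    diff = original[..., :length] - generated[..., :length]
    return float(np.mean(diff ** 2))


# Synthetic stand-in for the demo arrays: a 440 Hz tone, shape (1, num_samples).
sample_rate = 16000  # assumed rate, for illustration only
t = np.arange(sample_rate) / sample_rate
reference = np.sin(2 * np.pi * 440.0 * t)[None, :]

# Fake "decoded" signal: a constant 0.1 offset plus 32 padded samples at the end.
decoded = np.pad(reference + 0.1, ((0, 0), (0, 32)))

print(f"mse: {reconstruction_mse(reference, decoded):.4f}")  # prints "mse: 0.0100"
```

Because the offset is 0.1 on every shared sample, the result is 0.1 squared, which shows that the extra padded samples do not affect the score once the output is trimmed.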