Add comprehensive model card for L3AC

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +78 -0
README.md ADDED
@@ -0,0 +1,78 @@
+ ---
+ pipeline_tag: audio-to-audio
+ library_name: l3ac
+ ---
+
+ # L3AC: Towards a Lightweight and Lossless Audio Codec
+
+ This repository contains the implementation of L3AC, a lightweight neural audio codec introduced in the paper "[L3AC: Towards a Lightweight and Lossless Audio Codec](https://huggingface.co/papers/2504.04949)".
+
+ Neural audio codecs have recently gained traction for their ability to compress high-fidelity audio and provide discrete tokens for generative modeling. However, leading approaches often rely on resource-intensive models and complex multi-quantizer architectures, limiting their practicality in real-world applications. In this work, we introduce L3AC, a lightweight neural audio codec that addresses these challenges by leveraging a single quantizer and a highly efficient architecture. To enhance reconstruction fidelity while minimizing model complexity, L3AC explores streamlined convolutional networks and local Transformer modules, alongside TConv, a novel structure designed to capture acoustic variations across multiple temporal scales. Despite its compact design, extensive experiments across diverse datasets demonstrate that L3AC matches or exceeds the reconstruction quality of leading codecs while reducing computational overhead by an order of magnitude. The single-quantizer design further enhances its adaptability for downstream tasks.
+
+ <figure class="image">
+ <img src="https://github.com/zhai-lw/L3AC/raw/main/bubble_chart.svg" alt="Comparison of various audio codecs">
+ <figcaption>Comparison of various audio codecs</figcaption>
+ </figure>
+
+ **Paper:** [L3AC: Towards a Lightweight and Lossless Audio Codec](https://huggingface.co/papers/2504.04949)
+ **Official GitHub Repository:** [https://github.com/zhai-lw/L3AC](https://github.com/zhai-lw/L3AC)
+
+ ## Installation
+
+ You can install the `l3ac` library using pip:
+
+ ```bash
+ pip install l3ac
+ ```
+
+ ### Demo
+
+ First, install the `librosa` package, which the demo uses to load an example audio file:
+
+ ```bash
+ pip install librosa
+ ```
+
+ Then use the following code to load a sample audio file, encode it with the L3AC model, and decode it back to audio. The code also computes the mean squared error (MSE) between the original and reconstructed audio.
+
+ ```python
+ import librosa
+ import torch
+ import l3ac
+
+ # List the available model configurations.
+ all_models = l3ac.list_models()
+ print(f"Available models: {all_models}")
+
+ MODEL_USED = '1kbps'
+ codec = l3ac.get_model(MODEL_USED)
+ print(f"loaded codec({MODEL_USED}) and codec sample rate: {codec.config.sample_rate}")
+
+ # Load a mono example clip and add a batch dimension: (1, num_samples).
+ sample_audio, sample_rate = librosa.load(librosa.example("libri1"))
+ sample_audio = sample_audio[None, :]
+ print(f"loaded sample audio and audio sample_rate: {sample_rate}")
+
+ # Resample to the sample rate the codec expects.
+ sample_audio = librosa.resample(sample_audio, orig_sr=sample_rate, target_sr=codec.config.sample_rate)
+
+ # Use the GPU when one is available, otherwise fall back to the CPU
+ # (assumes codec.network is a regular torch.nn.Module).
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ codec.network.to(device)
+ codec.network.eval()
+ with torch.inference_mode():
+     audio_in = torch.tensor(sample_audio, dtype=torch.float32, device=device)
+     _, audio_length = audio_in.shape
+     print(f"{audio_in.shape=}")
+     q_feature, indices = codec.encode_audio(audio_in)
+     audio_out = codec.decode_audio(q_feature)  # or
+     # audio_out = codec.decode_audio(indices=indices['indices'])
+     # Trim the decoded audio to the input length before comparing.
+     generated_audio = audio_out[:, :audio_length].detach().cpu().numpy()
+
+ # Mean squared error between the resampled input and the reconstruction.
+ mse = ((sample_audio - generated_audio) ** 2).mean().item()
+ print(f"codec({MODEL_USED}) mse: {mse}")
+ ```
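The MSE printed by the demo depends on the absolute signal level, so it can be hard to compare across clips. As a hedged sketch that does not touch the codec at all, the helper below computes the MSE together with a signal-to-noise ratio in dB for NumPy arrays shaped like the demo's `(1, num_samples)` audio. The function name `reconstruction_error` and the synthetic test tone are illustrative assumptions, not part of the `l3ac` API.

```python
import numpy as np


def reconstruction_error(reference: np.ndarray, estimate: np.ndarray) -> tuple[float, float]:
    """Return (mse, snr_db) for two arrays of shape (batch, num_samples).

    Hypothetical helper: both arrays are trimmed to their common length first,
    in case the decoded audio is padded or slightly shorter than the input.
    """
    n = min(reference.shape[-1], estimate.shape[-1])
    ref = reference[..., :n].astype(np.float64)
    est = estimate[..., :n].astype(np.float64)
    noise_power = float(np.mean((ref - est) ** 2))  # this is the MSE
    signal_power = float(np.mean(ref ** 2))
    if noise_power == 0.0:
        return 0.0, float("inf")
    snr_db = 10.0 * np.log10(signal_power / noise_power)
    return noise_power, float(snr_db)


# Synthetic stand-in for real audio: a 440 Hz tone at 16 kHz plus small noise.
rng = np.random.default_rng(0)
t = np.arange(16_000) / 16_000
clean = np.sin(2 * np.pi * 440 * t)[None, :]
noisy = clean + 0.01 * rng.standard_normal(clean.shape)

# The second array is deliberately shorter to exercise the length trimming.
mse, snr_db = reconstruction_error(clean, noisy[:, :15_900])
print(f"mse={mse:.2e}, snr={snr_db:.1f} dB")
```

In the demo, the same helper could be called with `sample_audio` and `generated_audio` in place of the synthetic arrays.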
+
+ ### Available Models
+
+ | config_name | Sample rate (Hz) | Tokens/s | Codebook size | Bitrate (bps) |
+ |-------------|------------------|----------|---------------|---------------|
+ | 0k75bps     | 16,000           | 44.44    | 117,649       | 748.6         |
+ | 1kbps       | 16,000           | 59.26    | 117,649       | 998.2         |
+ | 1k5bps      | 16,000           | 88.89    | 117,649       | 1497.3        |
+ | 3kbps       | 16,000           | 166.67   | 250,047       | 2988.6        |
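
The bitrate column appears consistent with the usual single-quantizer relation, bitrate = tokens/s × log2(codebook size), since each token selects one codebook entry. The sketch below recomputes the column from the other two under that assumption. The function name `bitrate_bps` and the dictionary layout are illustrative, and the samples-per-token figure is inferred from the table rather than stated in the README.

```python
import math

SAMPLE_RATE_HZ = 16_000

# (tokens per second, codebook size, bitrate listed in the table)
MODELS = {
    "0k75bps": (44.44, 117_649, 748.6),
    "1kbps": (59.26, 117_649, 998.2),
    "1k5bps": (88.89, 117_649, 1497.3),
    "3kbps": (166.67, 250_047, 2988.6),
}


def bitrate_bps(tokens_per_second: float, codebook_size: int) -> float:
    """Bits per second when each token carries log2(codebook_size) bits."""
    return tokens_per_second * math.log2(codebook_size)


for name, (tps, codebook, listed) in MODELS.items():
    computed = bitrate_bps(tps, codebook)
    samples_per_token = round(SAMPLE_RATE_HZ / tps)  # inferred, not from the README
    print(f"{name}: {computed:.1f} bps (listed {listed}), ~{samples_per_token} samples/token")
```

Small differences from the listed values come from the tokens/s column being rounded to two decimals.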