nvidia/low-frame-rate-speech-codec-22khz

Feature Extraction


CasanovaE committed on Dec 6, 2024

Commit 2f151d0 · verified · 1 Parent(s): bcf28fc

Update README.md

Files changed (1)

README.md +3 -1


@@ -20,7 +20,9 @@ padding: 0;
 The [Low Frame-rate Speech Codec](https://arxiv.org/abs/2409.12117) is a neural audio codec that leverages finite scalar quantization and adversarial training with large speech language models to achieve high-quality audio compression with a 1.89 kbps bitrate and 21.5 frames per second.
+| Sample Rate | Frame Rate | Bit Rate   | # Codebooks | Codebook Size | Embed Dim   | FSQ Levels   |
+|:-----------:|:----------:|:----------:|:-----------:|:-------------:|:-----------:|:------------:|
+| 22050       | 21.5       | 1.89 kbps  | 8           | 2016          | 32          | [8, 7, 6, 6] |
 ## Model Architecture
 The Low Frame-rate Speech Codec model is composed of a fully convolutional generator neural network and three discriminators.
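
The figures in the added table can be cross-checked against each other. The sketch below is not part of the commit; it assumes the standard finite scalar quantization relationship, in which a codebook's size is the product of its per-dimension levels, and it assumes each of the 8 codebooks quantizes one dimension per FSQ level. The variable names are illustrative only.

```python
import math

# Values copied from the table added in this commit.
fsq_levels = [8, 7, 6, 6]
num_codebooks = 8
frame_rate_hz = 21.5

# Assumption: codebook size is the product of the FSQ levels.
codebook_size = math.prod(fsq_levels)

# Each codebook contributes log2(codebook_size) bits per frame.
bits_per_frame = num_codebooks * math.log2(codebook_size)
bitrate_kbps = bits_per_frame * frame_rate_hz / 1000

# Assumption: one embedding dimension per FSQ level, per codebook.
embed_dim = num_codebooks * len(fsq_levels)

print(f"codebook size: {codebook_size}")
print(f"bit rate: {bitrate_kbps:.2f} kbps")
print(f"embed dim: {embed_dim}")
```

Under these assumptions the computed codebook size (2016), bit rate (about 1.89 kbps), and embedding dimension (32) agree with the table's columns.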