Commit 0a7fb6e · 1 Parent(s): ec73fe3
Update README.md

README.md CHANGED
@@ -51,23 +51,40 @@ music generation, or text to speech tasks.

 ## How to Get Started with the Model

-Use the following code to get started with the EnCodec model:

 ```python
-import
-from

-#
-

-#
-

-#
-

-#
-
 ```

 ## Training Details

@@ -142,6 +159,7 @@ quality, particularly in applications where low latency is not critical (e.g., m

 **BibTeX:**

 @misc{défossez2022high,
 title={High Fidelity Neural Audio Compression},
 author={Alexandre Défossez and Jade Copet and Gabriel Synnaeve and Yossi Adi},

@@ -150,4 +168,4 @@ quality, particularly in applications where low latency is not critical (e.g., m
 archivePrefix={arXiv},
 primaryClass={eess.AS}
 }
-


 ## How to Get Started with the Model

+Use the following code to get started with the EnCodec model using a dummy example from the LibriSpeech dataset (~9MB). First, install the required Python packages:
+
+```
+pip install --upgrade pip
+pip install --upgrade transformers datasets[audio]
+```
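
Before moving on, it can help to confirm that the packages are visible in the active environment. The sketch below is not part of the model card; it uses only the standard library, and the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing.

    `installed_versions` is a hypothetical helper for illustration only.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # The two distributions installed by the pip commands above.
    for name, ver in installed_versions(["transformers", "datasets"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

A `None` entry suggests the install went into a different environment than the one running the script.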
+
+Then load an audio sample and run a forward pass of the model:

 ```python
+from datasets import load_dataset, Audio
+from transformers import EncodecModel, AutoProcessor
+

+# load a demonstration dataset
+librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

+# load the model + processor (for pre-processing the audio)
+model = EncodecModel.from_pretrained("facebook/encodec_24khz")
+processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

+# cast the audio data to the correct sampling rate for the model
+librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=processor.sampling_rate))
+audio_sample = librispeech_dummy[0]["audio"]["array"]

+# pre-process the inputs
+inputs = processor(raw_audio=audio_sample, sampling_rate=processor.sampling_rate, return_tensors="pt")
+
+# explicitly encode then decode the audio inputs
+encoder_outputs = model.encode(inputs["input_values"], inputs["padding_mask"])
+audio_values = model.decode(encoder_outputs.audio_codes, encoder_outputs.audio_scales, inputs["padding_mask"])[0]
+
+# or the equivalent with a forward pass
+audio_values = model(inputs["input_values"], inputs["padding_mask"]).audio_values
 ```
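
As a rough sanity check on the encode/decode round trip, the reconstruction can be compared with the input waveform. The sketch below is an illustration rather than part of the model card: `snr_db` is a hypothetical helper, the sine wave and its scaled copy are synthetic stand-ins for `inputs["input_values"]` and `audio_values`, and it assumes both signals have the same length and alignment. Waveform SNR is a crude measure for a neural codec and says little about perceptual quality.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between two equal-length sample sequences.

    Hypothetical helper: `reference` is the original waveform and
    `estimate` is the reconstruction, both as flat sequences of floats.
    """
    if len(reference) != len(estimate):
        raise ValueError("sequences must have the same length")
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    if noise_power == 0:
        return math.inf  # identical signals
    if signal_power == 0:
        return -math.inf  # silent reference, non-silent estimate
    return 10.0 * math.log10(signal_power / noise_power)


# Synthetic stand-ins: 0.1 s of a 440 Hz tone at 24 kHz and a scaled copy.
reference = [math.sin(2 * math.pi * 440 * n / 24_000) for n in range(2_400)]
estimate = [0.9 * x for x in reference]
print(f"SNR: {snr_db(reference, estimate):.1f} dB")
```

With the real tensors, one could pass `inputs["input_values"].flatten().tolist()` and `audio_values.detach().flatten().tolist()`, assuming their shapes match.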

 ## Training Details


 **BibTeX:**

+```
 @misc{défossez2022high,
 title={High Fidelity Neural Audio Compression},
 author={Alexandre Défossez and Jade Copet and Gabriel Synnaeve and Yossi Adi},

 archivePrefix={arXiv},
 primaryClass={eess.AS}
 }
+```