ArthurZ/encodec_24khz

sanchit-gandhi committed on Jun 15, 2023

Commit 0a7fb6e · 1 Parent(s): ec73fe3

Update README.md

Files changed (1)

README.md +30 -12

@@ -51,23 +51,40 @@ music generation, or text to speech tasks.
 ## How to Get Started with the Model
-Use the following code to get started with the EnCodec model:
 ```python
-import torch
-from encodec import EnCodecModel
-# Load the pre-trained EnCodec model
-model = EnCodecModel()
-# Load the audio data
-audio_data = torch.load('audio.pt')
-# Compress the audio
-audio_codes = model.encode(audio_data)[0]
-# Decompress the audio
-reconstructed_audio = model.decode(audio_codes)
 ```
 ## Training Details
@@ -142,6 +159,7 @@ quality, particularly in applications where low latency is not critical (e.g., m
 **BibTeX:**
 @misc{défossez2022high,
       title={High Fidelity Neural Audio Compression},
       author={Alexandre Défossez and Jade Copet and Gabriel Synnaeve and Yossi Adi},
@@ -150,4 +168,4 @@ quality, particularly in applications where low latency is not critical (e.g., m
       archivePrefix={arXiv},
       primaryClass={eess.AS}
 }

 ## How to Get Started with the Model
+Use the following code to get started with the EnCodec model using a dummy example from the LibriSpeech dataset (~9MB). First, install the required Python packages:
+```
+pip install --upgrade pip
+pip install --upgrade transformers datasets[audio]
+```
+Then load an audio sample, and run a forward pass of the model:
 ```python
+from datasets import load_dataset, Audio
+from transformers import EncodecModel, AutoProcessor
+# load a demonstration dataset
+librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
+# load the model + processor (for pre-processing the audio)
+model = EncodecModel.from_pretrained("facebook/encodec_24khz")
+processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")
+# cast the audio data to the correct sampling rate for the model
+librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=processor.sampling_rate))
+audio_sample = librispeech_dummy[0]["audio"]["array"]
+# pre-process the inputs
+inputs = processor(raw_audio=audio_sample, sampling_rate=processor.sampling_rate, return_tensors="pt")
+# explicitly encode then decode the audio inputs
+encoder_outputs = model.encode(inputs["input_values"], inputs["padding_mask"])
+audio_values = model.decode(encoder_outputs.audio_codes, encoder_outputs.audio_scales, inputs["padding_mask"])[0]
+# or the equivalent with a forward pass
+audio_values = model(inputs["input_values"], inputs["padding_mask"]).audio_values
 ```
 ## Training Details
 **BibTeX:**
+```
 @misc{défossez2022high,
       title={High Fidelity Neural Audio Compression},
       author={Alexandre Défossez and Jade Copet and Gabriel Synnaeve and Yossi Adi},
       archivePrefix={arXiv},
       primaryClass={eess.AS}
 }
+```
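
The updated README example ends with the decoded waveform in `audio_values`, but does not show how to listen to it. The following is a minimal sketch, not part of the commit, of writing a mono float waveform to a 16-bit PCM WAV file using only the Python standard library. The 24 kHz sampling rate is assumed from the model name, the `write_wav` helper and the output filename are hypothetical, and a synthetic sine wave stands in for the model output so that the snippet runs without downloading anything.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: taken from the model name "encodec_24khz"
OUTPUT_PATH = "reconstruction.wav"  # hypothetical output filename


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        # Pack as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for the decoded audio: one second of a 440 Hz sine wave.
samples = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]
write_wav(OUTPUT_PATH, samples)
```

With the real model output, the list of samples would presumably come from something like `audio_values[0, 0].detach().tolist()`, assuming `audio_values` has shape `(batch, channels, samples)` and a single channel.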