The model is available in the [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) toolkit and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

### Inference

For inference, you can follow our [Audio Codec Inference Tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Audio_Codec_Inference.ipynb), which automatically downloads the model checkpoint. Note that you will need to set the ```model_name``` parameter to "audio_codec_low_frame_rate_22khz".
Alternatively, you can manually download the [checkpoint]() and use the code below to run inference with the model:

```
import librosa
import soundfile as sf
import torch

from nemo.collections.tts.models import AudioCodecModel

codec_path = ???  # set here the model .nemo checkpoint path
path_to_input_audio = ???  # path of the input audio
path_to_output_audio = ???  # path of the reconstructed output audio

# load the codec model on the available device and set it to evaluation mode
device = 'cuda' if torch.cuda.is_available() else 'cpu'
nemo_codec_model = AudioCodecModel.restore_from(restore_path=codec_path, map_location="cpu").to(device).eval()

# load the input audio at the codec's sample rate
audio, _ = librosa.load(path_to_input_audio, sr=nemo_codec_model.sample_rate)
audio_tensor = torch.from_numpy(audio).unsqueeze(dim=0).to(device)
audio_len = torch.tensor([audio_tensor.shape[1]]).to(device)

# get discrete tokens from audio
encoded_tokens, encoded_len = nemo_codec_model.encode(audio=audio_tensor, audio_len=audio_len)

# reconstruct audio from tokens
reconstructed_audio, _ = nemo_codec_model.decode(tokens=encoded_tokens, tokens_len=encoded_len)

# save reconstructed audio
output_audio = reconstructed_audio.cpu().numpy().squeeze()
sf.write(path_to_output_audio, output_audio, nemo_codec_model.sample_rate)
```
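The `audio_len` argument lets `encode` handle batches of variable-length clips: shorter clips are zero-padded to the batch maximum and their true lengths are passed alongside. A minimal padding sketch using plain `torch` (the random tensors are stand-ins for real waveforms; `encode` itself is not called here):

```python
import torch
import torch.nn.functional as F

# two mono clips of different lengths (stand-ins for real waveforms)
clips = [torch.randn(22050), torch.randn(11025)]  # 1.0 s and 0.5 s at 22050 Hz

# zero-pad each clip to the batch maximum and stack into a [batch, time] tensor
max_len = max(clip.shape[0] for clip in clips)
batch = torch.stack([F.pad(clip, (0, max_len - clip.shape[0])) for clip in clips])

# true lengths, as expected by encode(audio=batch, audio_len=audio_len)
audio_len = torch.tensor([clip.shape[0] for clip in clips])

print(batch.shape)         # torch.Size([2, 22050])
print(audio_len.tolist())  # [22050, 11025]
```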

### Training

For fine-tuning on another dataset, please follow the steps available at our [Audio Codec Training Tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Audio_Codec_Training.ipynb). Note that you will need to set the ```CONFIG_FILENAME``` parameter to the "audio_codec_low_frame_rate_22050.yaml" config. You will also need to set ```pretrained_model_name``` to "audio_codec_low_frame_rate_22khz".
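Inside the training tutorial these two settings are plain notebook variables; for this model they would be set as in the sketch below (names and values taken from the note above):

```python
# settings referenced by the Audio Codec Training Tutorial for this model
CONFIG_FILENAME = "audio_codec_low_frame_rate_22050.yaml"
pretrained_model_name = "audio_codec_low_frame_rate_22khz"

print(CONFIG_FILENAME)
print(pretrained_model_name)
```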