Commit e9d1415 (verified) by CasanovaE · Parent(s): d0c88b1

Update README.md

Files changed (1): README.md (+34 -1)
README.md CHANGED
@@ -50,7 +50,40 @@ For more details please check [our paper](https://arxiv.org/abs/2409.12117).
 The model is available for use in [NVIDIA NeMo](https://github.com/NVIDIA/NeMo), and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
 
 ### Inference
-For inference please follow our [Audio Codec Inference Tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Audio_Codec_Inference.ipynb). Note that you will need to set the ```model_name``` parameter to "audio_codec_low_frame_rate_22khz".
+For inference, you can follow our [Audio Codec Inference Tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Audio_Codec_Inference.ipynb), which automatically downloads the model checkpoint. Note that you will need to set the ```model_name``` parameter to "audio_codec_low_frame_rate_22khz".
+
+Alternatively, you can manually download the [checkpoint]() and use the code below to run inference with the model:
+
+```python
+import librosa
+import torch
+import soundfile as sf
+from nemo.collections.tts.models import AudioCodecModel
+
+codec_path = ???  # path to the model .nemo checkpoint
+path_to_input_audio = ???  # path to the input audio
+path_to_output_audio = ???  # path for the reconstructed output audio
+
+device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+# Load the codec model and move it to the same device as the audio tensors
+nemo_codec_model = AudioCodecModel.restore_from(restore_path=codec_path, map_location="cpu").to(device).eval()
+
+# Load the input audio at the model's sample rate
+audio, _ = librosa.load(path_to_input_audio, sr=nemo_codec_model.sample_rate)
+
+# Add a batch dimension and build the matching length tensor
+audio_tensor = torch.from_numpy(audio).unsqueeze(dim=0).to(device)
+audio_len = torch.tensor([audio_tensor[0].shape[0]]).to(device)
+
+with torch.no_grad():
+    # Get discrete tokens from audio
+    encoded_tokens, encoded_len = nemo_codec_model.encode(audio=audio_tensor, audio_len=audio_len)
+
+    # Reconstruct audio from tokens
+    reconstructed_audio, _ = nemo_codec_model.decode(tokens=encoded_tokens, tokens_len=encoded_len)
+
+# Save the reconstructed audio
+output_audio = reconstructed_audio.cpu().numpy().squeeze()
+sf.write(path_to_output_audio, output_audio, nemo_codec_model.sample_rate)
+```
 
 ### Training
 For fine-tuning on another dataset please follow the steps available at our [Audio Codec Training Tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Audio_Codec_Training.ipynb). Note that you will need to set the ```CONFIG_FILENAME``` parameter to the "audio_codec_low_frame_rate_22050.yaml" config. You will also need to set ```pretrained_model_name``` to "audio_codec_low_frame_rate_22khz".
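
As a quick reference, the two tutorial parameters named in the Training section could be set in a notebook cell like this (a minimal sketch: the variable names are those quoted from the tutorial above, and the values are exactly the ones this README specifies):

```python
# Values quoted in this README; these variables are set in the
# Audio Codec Training Tutorial notebook before launching fine-tuning.
CONFIG_FILENAME = "audio_codec_low_frame_rate_22050.yaml"
pretrained_model_name = "audio_codec_low_frame_rate_22khz"
```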