Use the code below to get started with the model.
git clone https://github.com/ryota-komatsu/speaker_disentangled_hubert.git
cd speaker_disentangled_hubert
sudo apt install git-lfs # for UTMOS
conda create -y -n py310 -c pytorch -c nvidia -c conda-forge python=3.10.19 pip=24.0 faiss-gpu=1.12.0
conda activate py310
pip install -r requirements/requirements.txt
sh scripts/setup.sh
import torchaudio

from src.bigvgan.bigvgan import BigVGan
from src.bigvgan.data import mel_spectrogram

wav_path = "/path/to/wav"

# load the pretrained vocoder onto the GPU
model = BigVGan.from_pretrained("ryota-komatsu/bigvgan", device_map="cuda")

# load a waveform and resample it to 16 kHz
waveform, sr = torchaudio.load(wav_path)
waveform = torchaudio.functional.resample(waveform, sr, 16000)
waveform = waveform.cuda()

# compute the mel spectrogram and swap its last two dimensions for the model
spectrogram = mel_spectrogram(waveform)
spectrogram = spectrogram.transpose(1, 2)

# synthesize audio from the spectrogram
audio_values = model(spectrogram)
Training data: LibriTTS-R train set, downsampled to 16 kHz