HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Paper • 2010.05646 • Published
Training: Trained from scratch (random initialization), not fine-tuned from a pretrained checkpoint
HiFi-GAN vocoder trained from scratch on voice recordings for text-to-speech synthesis.
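import json
import torch
# Generator is the HiFi-GAN generator class from the training code;
# import it from wherever it is defined in your setup.
# Load config (config.json is JSON, so parse it with the json module)
with open('config.json') as f:
    config = json.load(f)
# Load generator and its weights onto the GPU
generator = Generator(config).cuda()
state_dict = torch.load('generator_best.pt', map_location='cuda')
generator.load_state_dict(state_dict)
generator.eval()
# Generate audio; mel_spectrogram must be a float tensor on the same device,
# typically shaped (batch, n_mels, frames)
with torch.no_grad():
    audio = generator(mel_spectrogram)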
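To listen to the result, the generated waveform has to be written to disk. The following is a minimal sketch rather than part of the original model card: `save_wav` is a hypothetical helper. It assumes the generator output is a float waveform in the range [-1, 1] (as in the reference HiFi-GAN implementation) and uses the 22050 Hz sample rate stated for the training data.

```python
import wave

import numpy as np


def save_wav(path, audio, sample_rate=22050):
    """Write a float waveform in [-1, 1] to a 16-bit mono WAV file.

    Hypothetical helper: assumes `audio` is array-like and holds a single
    utterance; any extra singleton dimensions are flattened away.
    """
    samples = np.asarray(audio, dtype=np.float32).reshape(-1)
    samples = np.clip(samples, -1.0, 1.0)          # guard against overshoot
    pcm = (samples * 32767.0).astype("<i2")        # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)                         # mono
        wf.setsampwidth(2)                         # 2 bytes = 16 bits
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


# Usage with the generator output from the snippet above (assumed shape
# (1, 1, samples) for a single utterance):
# save_wav("output.wav", audio.squeeze().cpu().numpy())
```

If a batch of several utterances is generated at once, each item would need to be saved separately, since flattening the whole batch would concatenate them.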
Voice recordings were converted to 22050 Hz mono audio and split into 15-second chunks.
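The chunking step can be sketched as follows. This is an illustration under stated assumptions, not the preprocessing script actually used: the function names are hypothetical, the input is assumed to be already at 22050 Hz, stereo is downmixed by averaging channels, and a trailing remainder shorter than 15 seconds is dropped.

```python
import numpy as np

SAMPLE_RATE = 22050      # Hz, as stated above
CHUNK_SECONDS = 15       # chunk length, as stated above


def to_mono(waveform):
    """Downmix (samples, channels) audio to (samples,) by averaging channels."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim == 2:
        waveform = waveform.mean(axis=1)
    return waveform


def split_into_chunks(waveform, sample_rate=SAMPLE_RATE,
                      chunk_seconds=CHUNK_SECONDS):
    """Split a mono waveform into fixed-length chunks.

    Assumption: the final partial chunk is discarded rather than padded.
    """
    chunk_len = sample_rate * chunk_seconds
    n_chunks = len(waveform) // chunk_len
    return [waveform[i * chunk_len:(i + 1) * chunk_len]
            for i in range(n_chunks)]


# Example: 40 seconds of silent stereo audio yields two full 15-second
# chunks; the remaining 10 seconds are dropped.
stereo = np.zeros((SAMPLE_RATE * 40, 2), dtype=np.float32)
chunks = split_into_chunks(to_mono(stereo))
print(len(chunks), len(chunks[0]))  # 2 330750
```

Recordings at a different sample rate would need to be resampled first with an audio library of your choice; that step is left out here.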
@article{kong2020hifigan,
title={HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis},
author={Kong, Jungil and Kim, Jaehyeon and Bae, Jaekyoung},
journal={arXiv preprint arXiv:2010.05646},
year={2020}
}