This is a HiFi-GAN vocoder trained for Teochew speech synthesis. It reconstructs high-quality audio waveforms from mel-spectrograms and was trained from scratch using the training code in NVIDIA's DeepLearningExamples repository.
Hardware:
Hyperparameters:
Epochs: 400
Learning Rate: 0.0003
Learning Rate Decay: 0.9998
Training Duration: ~8 days
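To get a feel for how gently the learning rate falls over the run, the sketch below evaluates an exponential schedule with the values listed above. It assumes the decay factor is applied once per epoch, which this card does not state; `lr_at_epoch` is a hypothetical helper and not part of the training code.

```python
# Assumption: the decay factor is applied once per epoch (not stated in this card).
INITIAL_LR = 0.0003
LR_DECAY = 0.9998
EPOCHS = 400


def lr_at_epoch(epoch: int) -> float:
    """Learning rate after `epoch` decay steps under exponential decay."""
    return INITIAL_LR * LR_DECAY ** epoch


# Print the rate at the checkpoints evaluated in this card.
for epoch in (0, 100, 210, 300, EPOCHS):
    ratio = lr_at_epoch(epoch) / INITIAL_LR
    print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.3e} ({ratio:.1%} of initial)")
```

Under that per-epoch assumption, the rate at epoch 400 is still roughly 92% of its initial value, so the schedule behaves almost like a constant learning rate.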
Generator Config:
{
"upsample_rates": [8, 8, 2, 2],
"upsample_kernel_sizes": [16, 16, 4, 4],
"upsample_initial_channel": 512,
"resblock": "1",
"resblock_kernel_sizes": [3, 7, 11],
"resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
}
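A quick sanity check on this config is that the upsampling stages together turn one mel frame into exactly `hop_length` audio samples (256 in the mel-spectrogram config). The sketch below verifies that, assuming the generator follows the reference HiFi-GAN layout: transposed convolutions with padding `(kernel - rate) // 2` and a channel width that halves at every stage. `length_after_upsampling` is a hypothetical helper written for this illustration.

```python
from math import prod

# Values copied from the generator config above.
upsample_rates = [8, 8, 2, 2]
upsample_kernel_sizes = [16, 16, 4, 4]
upsample_initial_channel = 512

# Total number of audio samples produced per input mel frame.
total_upsampling = prod(upsample_rates)

# Assumption: reference HiFi-GAN layout, where stage i maps
# C / 2**i channels to C / 2**(i + 1) channels.
stage_out_channels = [
    upsample_initial_channel // 2 ** (i + 1) for i in range(len(upsample_rates))
]


def length_after_upsampling(num_frames: int) -> int:
    """Output length of the stacked transposed convolutions.

    Uses the standard ConvTranspose1d length formula with dilation 1 and
    no output padding: L_out = (L_in - 1) * stride - 2 * padding + kernel.
    """
    length = num_frames
    for rate, kernel in zip(upsample_rates, upsample_kernel_sizes):
        padding = (kernel - rate) // 2  # assumed, as in reference HiFi-GAN
        length = (length - 1) * rate - 2 * padding + kernel
    return length


print("samples per mel frame:", total_upsampling)
print("channels after each stage:", stage_out_channels)
print("samples for 10 frames:", length_after_upsampling(10))
```

If `upsample_rates` is changed, its product has to keep matching `hop_length`, otherwise the generated waveform no longer lines up with the input frames.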
Mel-Spectrogram Config:
{
"sampling_rate": 22050,
"filter_length": 1024,
"num_mels": 80,
"hop_length": 256,
"win_length": 1024,
"mel_fmin": 0.0,
"mel_fmax": 11025.0,
"max_wav_value": 32768.0
}
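The quantities implied by these settings can be derived with a few lines of arithmetic. The sketch below is illustrative only: `approx_frames` and `normalize_sample` are hypothetical helpers, the exact frame count depends on the STFT padding convention used by the training code, and dividing by `max_wav_value` is assumed to be how 16-bit samples are scaled, as is common in HiFi-GAN code.

```python
# Values copied from the mel-spectrogram config above.
SAMPLING_RATE = 22050
FILTER_LENGTH = 1024
NUM_MELS = 80
HOP_LENGTH = 256
WIN_LENGTH = 1024
MEL_FMAX = 11025.0
MAX_WAV_VALUE = 32768.0

frame_rate = SAMPLING_RATE / HOP_LENGTH        # mel frames per second
hop_ms = 1000 * HOP_LENGTH / SAMPLING_RATE     # time step between frames
window_ms = 1000 * WIN_LENGTH / SAMPLING_RATE  # analysis window duration
linear_bins = FILTER_LENGTH // 2 + 1           # STFT bins before the mel filterbank
nyquist = SAMPLING_RATE / 2                    # highest representable frequency


def approx_frames(seconds: float) -> int:
    """Approximate mel frame count; the exact value depends on STFT padding."""
    return round(seconds * SAMPLING_RATE / HOP_LENGTH)


def normalize_sample(sample_int16: int) -> float:
    """Scale a 16-bit PCM sample to [-1, 1) (assumed use of max_wav_value)."""
    return sample_int16 / MAX_WAV_VALUE


print(f"{frame_rate:.2f} frames/s, hop {hop_ms:.2f} ms, window {window_ms:.2f} ms")
print(f"{linear_bins} linear bins reduced to {NUM_MELS} mel bands")
print("mel_fmax equals Nyquist:", MEL_FMAX == nyquist)
print("frames in 3 s of audio (approx.):", approx_frames(3.0))
```

Note that `mel_fmax` sits exactly at the Nyquist frequency, so the mel filterbank covers the full bandwidth available at 22050 Hz.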
cd hifigan_standalone
python hifigan_api.py --checkpoint path/to/ckpt.pt --input audio.wav --output out.wav
# Add the directory containing hifigan_standalone to PYTHONPATH, or place it in your project
from hifigan_standalone import HiFiGANVocoder
vocoder = HiFiGANVocoder("path/to/checkpoint.pt")
vocoder.reconstruct_wav("input.wav", "output.wav")
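For more than a handful of files, the single-file call above can be wrapped in a loop. The following is a minimal sketch that relies only on the `reconstruct_wav(input_path, output_path)` call shown above; `reconstruct_folder` and the directory names are hypothetical and not part of `hifigan_standalone`.

```python
from pathlib import Path


def reconstruct_folder(vocoder, input_dir, output_dir):
    """Run vocoder.reconstruct_wav on every .wav file in input_dir.

    `vocoder` is any object exposing reconstruct_wav(input_path, output_path),
    such as the HiFiGANVocoder instance created above.
    Returns the list of output paths, in sorted order.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for wav_path in sorted(input_dir.glob("*.wav")):
        out_path = output_dir / wav_path.name  # keep the original file name
        vocoder.reconstruct_wav(str(wav_path), str(out_path))
        written.append(out_path)
    return written


# Example (hypothetical directory names):
#   vocoder = HiFiGANVocoder("path/to/checkpoint.pt")
#   reconstruct_folder(vocoder, "wavs_in", "wavs_out")
```

Passing the vocoder in as an argument means the checkpoint is loaded once and reused for every file.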
import gradio as gr
from hifigan_standalone import HiFiGANVocoder
vocoder = HiFiGANVocoder("ckpt.pt")
demo = gr.Interface(
fn=vocoder.gradio_reconstruct,
inputs=gr.Audio(),
outputs=gr.Audio()
)
demo.launch()
DNSMOS scores for different checkpoints (SIG: speech signal quality, BAK: background noise quality, OVRL: overall quality):
| Checkpoint | SIG | BAK | OVRL | Notes |
|---|---|---|---|---|
| Ground Truth | 3.4699 | 3.8607 | 3.1040 | - |
| Epoch 100 | 3.4416 | 3.7775 | 3.0382 | - |
| Epoch 210 | 3.4737 | 3.7889 | 3.0724 | Best scores among the checkpoints |
| Epoch 300 | 3.4232 | 3.7777 | 3.0250 | Sounds better than Epoch 210 |
| Epoch 400 | 3.3923 | 3.7841 | 3.0050 | Overfitting |
Recommended: Checkpoint 210 (Epoch 210) or Checkpoint 300 (Epoch 300).
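The comparison in the table can be reproduced with a short script. This sketch hard-codes the scores listed above, ranks the checkpoints by OVRL, and reports each one's gap to the ground-truth recordings; the variable names are illustrative.

```python
# (SIG, BAK, OVRL) scores copied from the table above.
scores = {
    "Ground Truth": (3.4699, 3.8607, 3.1040),
    "Epoch 100": (3.4416, 3.7775, 3.0382),
    "Epoch 210": (3.4737, 3.7889, 3.0724),
    "Epoch 300": (3.4232, 3.7777, 3.0250),
    "Epoch 400": (3.3923, 3.7841, 3.0050),
}

reference_ovrl = scores["Ground Truth"][2]
checkpoints = {name: s for name, s in scores.items() if name != "Ground Truth"}

# Gap in overall quality between the original recordings and each checkpoint.
ovrl_gap = {name: reference_ovrl - s[2] for name, s in checkpoints.items()}

# Checkpoint with the highest OVRL score.
best_by_ovrl = max(checkpoints, key=lambda name: checkpoints[name][2])

for name in sorted(checkpoints, key=lambda n: checkpoints[n][2], reverse=True):
    print(f"{name}: OVRL {checkpoints[name][2]:.4f} (gap to ground truth {ovrl_gap[name]:.4f})")
print("best by OVRL:", best_by_ovrl)
```

The differences between checkpoints are only a few hundredths of a point, and the table notes that Epoch 300 sounds better than Epoch 210 despite its lower scores, so a ranking like this is best treated as a first filter before listening.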