license: apache-2.0
datasets:
- amphion/Emilia-Dataset
language:
- de
- en
base_model:
- neuphonic/neucodec
tags:
- audio
- speech
NeuCodec decoder fine-tuned for German speech
This repository contains only the decoder of neuphonic/neucodec, fine-tuned on equal amounts of German and English speech data from Emilia-Yodas to improve decoding quality for German speech. Because only the decoder was fine-tuned, the codebook is identical to that of the base model, so this decoder can be used with the regular NeuCodec encoder.
We supply a compact class, NeuCodecDecoder (in NeuCodecDecoder.py), for running inference with this decoder, since the NeuCodec codebase doesn't easily allow loading model files from other Hugging Face repositories.
Inference Example
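import torch
import torchaudio
from NeuCodecDecoder import NeuCodecDecoder

# Load the fine-tuned decoder, switch to eval mode and move it to the GPU
decoder_model = NeuCodecDecoder.from_pretrained("DigitalLearningGmbH/neucodec-decoder-ft-de")
decoder_model = decoder_model.eval().cuda()

# `tokens` must be defined beforehand: NeuCodec code indices (assumed here to be a flat sequence)
with torch.no_grad():
    codes = torch.tensor(tokens).unsqueeze(0).unsqueeze(0).to("cuda")  # add two leading dimensions
    decoded = decoder_model.decode_code(codes).cpu()

# Write the first item of the batch as a 24 kHz WAV file
torchaudio.save("decoded.wav", decoded[0, :, :], 24_000)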
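The example above expects `tokens` to already hold valid code indices. As a rough sanity check before decoding, the sketch below validates a flat token sequence and estimates how many output samples to expect. The helper names `check_tokens` and `expected_num_samples` are hypothetical, and the token rate (50 per second) and codebook size (65,536) are assumptions based on the base NeuCodec description, not values stated in this card. Only the 24 kHz output rate comes from the example above.

```python
# Assumed values from the base NeuCodec description, not from this model card:
TOKEN_RATE_HZ = 50        # assumption: code indices per second of audio
CODEBOOK_SIZE = 65_536    # assumption: number of distinct code indices
SAMPLE_RATE = 24_000      # output sample rate used in the inference example


def check_tokens(tokens):
    """Validate a flat sequence of code indices and return it as a list of ints."""
    checked = []
    for i, t in enumerate(tokens):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(t, bool) or not isinstance(t, int):
            raise TypeError(f"token {i} is not an int: {t!r}")
        if not 0 <= t < CODEBOOK_SIZE:
            raise ValueError(f"token {i} is out of range: {t}")
        checked.append(t)
    if not checked:
        raise ValueError("token sequence is empty")
    return checked


def expected_num_samples(num_tokens):
    """Estimate the decoded waveform length under the assumed token rate."""
    return num_tokens * SAMPLE_RATE // TOKEN_RATE_HZ
```

If the assumptions hold, `check_tokens(tokens)` can be passed to `torch.tensor(...)` as in the inference example, and `decoded.shape[-1]` should be close to `expected_num_samples(len(tokens))`.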
For more information, please refer to the original model card of neuphonic/neucodec.