Reconstruction test does not work as expected
#11
by infilify - opened
Hi, thank you for the great work.
Unfortunately, the reconstruction test I tried produced incorrect output.
The test code I used was taken from https://huggingface.co/HKUSTAudio/xcodec2. My steps are below; could you please shed some light on this?
- convert test.flac to a 16 kHz wav file
ffmpeg -i test.flac -ar 16000 -c:a pcm_s16le test.wav
ffprobe test.wav
Input #0, wav, from 'test.wav':
Metadata:
encoder : Lavf59.27.100
Duration: 00:00:04.91, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
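The same format check can be done from Python before the file reaches the model. This is only a minimal sketch using the standard-library `wave` module; the helper names `wav_format` and `is_16k_mono` are made up for illustration and are not part of xcodec2.

```python
import wave


def wav_format(path):
    """Return (sample_rate, channels, sample_width_bytes) read from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getnchannels(), wf.getsampwidth()


def is_16k_mono(path):
    """True if the file is 16000 Hz, single channel (what the test code expects)."""
    rate, channels, _ = wav_format(path)
    return rate == 16000 and channels == 1
```

For the test.wav described by the ffprobe output above (16000 Hz, 1 channel, s16), `is_16k_mono("test.wav")` should return True, so the input format is probably not the cause of the noise.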
- run with the following test code
import torch
import soundfile as sf
from transformers import AutoConfig
import sys
import os
# Add the model path to system path to make the xcodec2 module importable
model_path = "models/HKUSTAudio/xcodec2"
sys.path.append(os.path.abspath(model_path))
# Now import from the module
# from modeling_xcodec2 import XCodec2Model
from models.HKUSTAudio.xcodec2.modeling_xcodec2 import XCodec2Model
model = XCodec2Model.from_pretrained(model_path)
model.eval().cuda()
# wav, sr = sf.read("sample.wav")
# wav, sr = sf.read("sample-short3.wav")
wav, sr = sf.read("test.wav")
wav_tensor = torch.from_numpy(wav).float().unsqueeze(0) # Shape: (1, T)
with torch.no_grad():
    # Only 16khz speech
    # Only supports single input. For batch inference, please refer to the link below.
    vq_code = model.encode_code(input_waveform=wav_tensor)
    print("Code:", vq_code)
    recon_wav = model.decode_code(vq_code).cpu()  # Shape: (1, 1, T')
sf.write("output/reconstructed.wav", recon_wav[0, 0, :].numpy(), sr)
print("Done! Check reconstructed.wav")
Attached is the output/reconstructed.wav file. It's noisy and incorrect.
Many thanks
Solved by reinstalling xcodec in the conda env.
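For anyone hitting the same thing: since a reinstall fixed it, it may help to check which copy of the package the active environment actually picks up. A rough sketch using only the standard library; `describe_install` is a hypothetical helper, and the distribution/module name `xcodec2` in the usage comment is an assumption, so substitute whatever name you installed.

```python
import importlib.metadata
import importlib.util


def describe_install(dist_name, module_name):
    """Return (installed_version, module_file) for a package, or None where not found."""
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    origin = spec.origin if spec is not None else None
    return version, origin


# Example (names are assumptions): print(describe_install("xcodec2", "xcodec2"))
```

If the reported path points outside the conda env, or the version is None while the import still works, the script is probably loading a stale or local copy rather than the installed one.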
infilify changed discussion status to closed