
Reconstruction test does not work as expected

#11
by infilify - opened

Hi, thank you for the great work.

Unfortunately, the reconstruction test I tried produced incorrect output.

I took the test code from https://huggingface.co/HKUSTAudio/xcodec2. My steps are below; could you please shed some light on this?

  1. Convert test.flac to a 16 kHz WAV file and inspect the result:
ffmpeg -i test.flac -ar 16000 -c:a pcm_s16le test.wav
ffprobe test.wav
Input #0, wav, from 'test.wav':
  Metadata:
    encoder         : Lavf59.27.100
  Duration: 00:00:04.91, bitrate: 256 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
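The ffprobe output above already shows 16000 Hz, 1 channel, s16. For anyone who wants to repeat that check from Python before running the model, here is a minimal sketch using only the standard library `wave` module. The names `check_wav` and `write_tone` are hypothetical helpers made up for this example, and the two demo files are synthetic tones written to a temporary directory, not the `test.wav` from this thread.

```python
import math
import os
import struct
import tempfile
import wave

EXPECTED_RATE = 16000  # the test code's own comment says "Only 16khz speech"


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Return a list of problems found in the WAV header (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wf.getframerate()} Hz, expected {expected_rate} Hz"
            )
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected mono")
        if wf.getsampwidth() != 2:
            problems.append(f"{8 * wf.getsampwidth()}-bit samples, expected 16-bit")
    return problems


def write_tone(path, rate, channels=1, seconds=0.1):
    """Write a short 440 Hz tone as 16-bit PCM (demo helper only)."""
    frames = bytearray()
    for i in range(int(rate * seconds)):
        sample = int(12000 * math.sin(2 * math.pi * 440 * i / rate))
        # Repeat the same sample for every channel (interleaved layout).
        frames += struct.pack("<h", sample) * channels
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(bytes(frames))


# Demo on synthetic files: one matching the expected format, one not.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.wav")
bad_path = os.path.join(tmp_dir, "bad.wav")
write_tone(good_path, 16000)
write_tone(bad_path, 44100, channels=2)

print("good.wav:", check_wav(good_path))
print("bad.wav:", check_wav(bad_path))
```

To check a real file, call `check_wav("test.wav")`; an empty list means the header matches 16 kHz mono 16-bit PCM. A clean header only rules out a format mismatch; it says nothing about the model or the environment.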
  2. Run the following test code:
import torch
import soundfile as sf
from transformers import AutoConfig
import sys
import os

# Add the model path to system path to make the xcodec2 module importable
model_path = "models/HKUSTAudio/xcodec2"
sys.path.append(os.path.abspath(model_path))

# Now import from the module
# from modeling_xcodec2 import XCodec2Model
from models.HKUSTAudio.xcodec2.modeling_xcodec2 import XCodec2Model

model = XCodec2Model.from_pretrained(model_path)
model.eval().cuda()

# wav, sr = sf.read("sample.wav")
# wav, sr = sf.read("sample-short3.wav")
wav, sr = sf.read("test.wav")
wav_tensor = torch.from_numpy(wav).float().unsqueeze(0)  # Shape: (1, T)

with torch.no_grad():
    # Only 16 kHz speech
    # Only supports single input. For batch inference, please refer to the link below.
    vq_code = model.encode_code(input_waveform=wav_tensor)
    print("Code:", vq_code )

    recon_wav = model.decode_code(vq_code).cpu()       # Shape: (1, 1, T')

sf.write("output/reconstructed.wav", recon_wav[0, 0, :].numpy(), sr)
print("Done! Check reconstructed.wav")

Attached is the output/reconstructed.wav file. It's noisy and incorrect.

Many thanks

Solved by reinstalling xcodec in the conda env.
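Since a reinstall fixed it, the environment was the likely culprit. A rough sketch of how one might see which interpreter and which package copies are actually in use, so that a stale or broken install is easier to spot. `describe_install` is a hypothetical helper, and it assumes the distribution name and the import name are both `xcodec2`, which this thread does not confirm.

```python
import importlib.metadata
import importlib.util
import sys


def describe_install(name):
    """Report the installed version and import location of a package.

    Both values are None when the package cannot be found. This assumes the
    distribution name and the import name are the same string.
    """
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    # find_spec locates a top-level module without importing it.
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return {"name": name, "version": version, "location": location}


# The interpreter path shows which conda env is really active.
print("interpreter:", sys.executable)
for pkg in ("xcodec2", "torch", "transformers"):
    print(describe_install(pkg))
```

If the reported location lies outside the active conda env, or the version is None while the import still resolves, the script is probably picking up a different copy than the one that was installed.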

infilify changed discussion status to closed
