jcudit HF Staff committed on
Commit
7b704bc
·
1 Parent(s): 1a6b8d0

fix: ensure audio tensor is on same device as VAD model in voice denoising

The Silero VAD model was moved to GPU but the input audio tensor remained
on CPU, causing a device mismatch error:
'Expected all tensors to be on the same device, but got weight is on cuda:0,
different from other tensors on cpu'

Solution:
---------
Move the audio tensor to the same device as the VAD model before calling
get_speech_timestamps() by adding .to(device) after creating the tensor.

This ensures both the model and its inputs are on the same device (GPU or CPU),
preventing the RuntimeError raised during convolution operations in the VAD model.
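
A minimal sketch of the same idea, for illustration only. The commit itself uses a `device` variable already in scope; the helper `to_model_device` below is hypothetical and instead reads the device from the model's parameters, which assumes the model has at least one parameter and that all of its parameters sit on one device. A small `Conv1d` stands in for the Silero VAD model.

```python
import numpy as np
import torch


def to_model_device(audio: np.ndarray, model: torch.nn.Module) -> torch.Tensor:
    """Convert a NumPy waveform to a float32 tensor on the model's device."""
    # Assumption: the model has at least one parameter and all of its
    # parameters live on a single device.
    device = next(model.parameters()).device
    return torch.from_numpy(audio).float().to(device)


# Stand-in for the VAD model (hypothetical; the real code uses Silero VAD).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Conv1d(1, 4, kernel_size=3).to(device)

# One second of silence at 16 kHz, stored as float64 like typical NumPy audio.
audio = np.zeros(16000, dtype=np.float64)
audio_tensor = to_model_device(audio, model)

# Conv1d expects (batch, channels, samples). If the model were on CUDA and
# audio_tensor had been left on the CPU, this call would raise a RuntimeError.
with torch.no_grad():
    output = model(audio_tensor.view(1, 1, -1))
```

Reading the device from the model keeps the input and the weights in sync even if the model is later moved, at the cost of assuming a single-device model.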

Files changed (1)
  1. src/services/voice_denoising.py +2 -2
src/services/voice_denoising.py CHANGED
@@ -106,8 +106,8 @@ def _denoise_audio_on_gpu(
     if len(audio) == 0:
         voice_segments = []
     else:
-        # Convert to torch tensor
-        audio_tensor = torch.from_numpy(audio).float()
+        # Convert to torch tensor and move to same device as model
+        audio_tensor = torch.from_numpy(audio).float().to(device)
 
         # Get speech timestamps
         speech_timestamps = get_speech_timestamps(