Works on Intel Mac but not M4 Mac

#22
by 9SL9 - opened

I have two computers: an M4 Mac and an Intel iMac. I was able to get this working on the Intel machine but not on the M4 (Apple Silicon).

I used identical commands on both machines to set up the environment:
python3.10 -m venv IndicF5
pip install git+https://github.com/ai4bharat/IndicF5.git
python3.10 -m pip install --upgrade pip
pip install transformers==4.49.0 pydub soundfile safetensors huggingface_hub
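
Because only transformers is pinned in the commands above, pip may have resolved different versions of the other packages on each machine. A minimal sketch for comparing the two environments, using only the standard library, could look like this. The package list is my own guess based on the names that show up in the traceback, and `collect_env` is a hypothetical helper, not part of IndicF5:

```python
import platform
from importlib import metadata

# Assumption: these are the packages most relevant to the failure,
# picked from the install commands and the traceback paths.
PACKAGES = ["torch", "torchaudio", "vocos", "transformers"]


def collect_env(packages=PACKAGES):
    """Return the CPU architecture, Python version, and installed package versions."""
    info = {
        "machine": platform.machine(),          # e.g. "arm64" or "x86_64"
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running this inside the activated venv on both Macs and diffing the output should show whether the torch, torchaudio, or vocos versions differ between them.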

This works on my Intel Mac, but I get the following error on my M4:

/IndicF5/lib/python3.10/site-packages/torchaudio/_backend/utils.py:213: UserWarning: In 2.9, this function's implementation will be changed to use torchaudio.load_with_torchcodec` under the hood. Some parameters like normalize, format, buffer_size, and backend will be ignored. We recommend that you port your code to rely directly on TorchCodec's decoder instead: https://docs.pytorch.org/torchcodec/stable/generated/torchcodec.decoders.AudioDecoder.html#torchcodec.decoders.AudioDecoder.
warnings.warn(
Traceback (most recent call last):
File "/IndicF5/IndicF5/test_tts.py", line 10, in <module>
audio = model(
File "/IndicF5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/IndicF5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "/huggingface/modules/transformers_modules/ai4bharat/IndicF5/b82d286220e3070e171f4ef4b4bd047b9a447c9a/model.py", line 93, in forward
audio, final_sample_rate, _ = infer_process(
File "/IndicF5/IndicF5/f5_tts/infer/utils_infer.py", line 383, in infer_process
return infer_batch_process(
File "/IndicF5/IndicF5/f5_tts/infer/utils_infer.py", line 471, in infer_batch_process
generated_wave = vocoder.decode(generated_mel_spec)
File "/IndicF5/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/IndicF5/lib/python3.10/site-packages/vocos/pretrained.py", line 113, in decode
audio_output = self.head(x)
File "/IndicF5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/IndicF5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "/IndicF5/lib/python3.10/site-packages/vocos/heads.py", line 68, in forward
audio = self.istft(S)
File "/IndicF5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/IndicF5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "/IndicF5/lib/python3.10/site-packages/vocos/spectral_ops.py", line 46, in forward
return torch.istft(spec, self.n_fft, self.hop_length, self.win_length, self.window, center=True)
RuntimeError: istft(CPUComplexFloatType[1, 513, 579], n_fft=1024, hop_length=256, win_length=1024, window=torch.FloatTensor{[1024]}, center=1, normalized=0, onesided=None, length=None, return_complex=0) window overlap add min: 1
[ CPUBoolType{} ]
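
For context on the message itself: as far as I understand it, torch.istft overlap-adds the squared window across all frames and raises this error when the smallest value of that envelope is effectively zero (below roughly 1e-11). The trailing `1` followed by `CPUBoolType` looks like the boolean result of that check rather than the minimum itself. The pure-Python sketch below recomputes the envelope for the parameters in the error (n_fft=1024, hop_length=256, 579 frames, center=True), assuming vocos uses a periodic Hann window. The helper names are hypothetical and this is an approximation of the check, not PyTorch's actual code:

```python
import math


def hann_window(n):
    # Periodic Hann window (assumed to match what torch.hann_window(n) returns by default).
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add_envelope(window, hop_length, n_frames, center=True):
    """Overlap-add the squared window over n_frames frames spaced hop_length apart."""
    n_fft = len(window)
    total = n_fft + hop_length * (n_frames - 1)
    envelope = [0.0] * total
    squared = [w * w for w in window]
    for frame in range(n_frames):
        start = frame * hop_length
        for i, value in enumerate(squared):
            envelope[start + i] += value
    if center:
        # With center=True, n_fft // 2 samples of padding are trimmed from each end.
        envelope = envelope[n_fft // 2 : total - n_fft // 2]
    return envelope


if __name__ == "__main__":
    env = overlap_add_envelope(hann_window(1024), hop_length=256, n_frames=579)
    print("envelope length:", len(env))
    print("envelope minimum:", min(env))
    print("passes the > 1e-11 check:", min(env) > 1e-11)
```

If the minimum is comfortably above zero here, the window and hop settings are not the problem in principle, which would point toward the torch build on the M4 (or the window tensor it actually receives) rather than the IndicF5 or vocos configuration.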
