CUDA out of memory on 30-minute audio
#20
by
dominik-machacek
- opened
Hi,
I'm getting a CUDA OOM error when processing a 30-minute audio file.
The GPU is an NVIDIA L40 with 46068 MiB; it was similar on a 94 GB one.
My usage script is this: https://github.com/sarapapi/hearing2translate/blob/4bbc37e0785ea60a740de3a613c11d0c602698b5/inference/sfm/canaryv2.py#L13
but with this line edited -- lowering batch_size didn't help, and neither did passing [speech] instead of speech:
transcriptions = model.transcribe([speech], source_lang=src, target_lang=tgt, timestamps=True, batch_size=1)
The error is as follows:
Traceback (most recent call last):
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/infer.py", line 242, in <module>
infer(args)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/infer.py", line 204, in infer
output = generate(model, model_input).strip()
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/inference/sfm/canaryv2.py", line 13, in generate
transcriptions = model.transcribe([speech], source_lang=src, target_lang=tgt, timestsamps=True, batch_size=1)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/collections/asr/models/aed_multitask_models.py", line 581, in transcribe
results = super().transcribe(audio=audio, override_config=trcfg)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/collections/asr/parts/mixins/transcription.py", line 270, in transcribe
for processed_outputs in generator:
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/collections/asr/parts/mixins/transcription.py", line 369, in transcribe_generator
model_outputs = self._transcribe_forward(test_batch, transcribe_cfg)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/collections/asr/models/aed_multitask_models.py", line 958, in _transcribe_forward
log_probs, encoded_len, enc_states, enc_mask = self.forward(input_signal=audio, input_signal_length=audio_lens)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/core/classes/common.py", line 1204, in wrapped_call
outputs = wrapped(*args, **kwargs)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/collections/asr/models/aed_multitask_models.py", line 743, in forward
encoded, encoded_len = self.encoder(audio_signal=processed_signal, length=processed_signal_length)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/core/classes/common.py", line 1204, in wrapped_call
outputs = wrapped(*args, **kwargs)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/collections/asr/modules/conformer_encoder.py", line 584, in forward
return self.forward_internal(
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/collections/asr/modules/conformer_encoder.py", line 683, in forward_internal
audio_signal = layer(
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/conformer_modules.py", line 181, in forward
x = self.self_attn(query=x, key=x, value=x, mask=att_mask, pos_emb=pos_emb, cache=cache_last_channel)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py", line 314, in forward
matrix_bd = self.rel_shift(matrix_bd)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py", line 266, in rel_shift
x = torch.nn.functional.pad(x, pad=(1, 0)) # (b, h, t1, t2+1)
File "/lnet/work/people/machacek/uedin/systems/hearing2translate/p3/lib/python3.10/site-packages/torch/nn/functional.py", line 5294, in pad
return torch._C._nn.pad(input, pad, mode, value)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.14 GiB. GPU 0 has a total capacity of 44.32 GiB of which 2.64 GiB is free. Process 224045 has 41.67 GiB memory in use. Of the allocated memory 41.11 GiB is allocated by PyTorch, and 59.95 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[W121 17:37:12.219515031 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
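Back of the envelope, the ~32 GiB allocation looks like full self-attention over the whole 30 minutes at once. A rough estimate (the hop size, subsampling factor, head count, and dtype are my assumptions, not verified against the Canary config):

```python
# Rough estimate of the relative-position attention matrix that OOMs.
# Assumptions: 10 ms feature hop, 8x encoder subsampling, 8 attention
# heads, float32 -- none of these are confirmed from the model config.
duration_s = 30 * 60                    # 30-minute file
feat_frames = duration_s * 100          # 10 ms hop -> 100 frames/s
enc_frames = feat_frames // 8           # encoder frames after 8x subsampling
heads = 8
rel_width = 2 * enc_frames - 1          # relative-position axis (t2)
bytes_total = enc_frames * rel_width * heads * 4  # float32
print(f"{bytes_total / 2**30:.1f} GiB")  # -> 30.2 GiB
```

That lands in the same ballpark as the 32.14 GiB in the error, and it grows quadratically with audio length, which would explain why short clips are fine and 30 minutes is not.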
The weird thing is that the same audio works through the demo: https://huggingface.co/spaces/nvidia/canary-1b-v2
I just found out that the demo runs on an H200 GPU, which has 141 GB of memory. Is that much really needed? Is there another approach to processing long audio on a smaller GPU?
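One workaround I'm considering is splitting the waveform into shorter overlapping chunks and transcribing each chunk separately, so the attention matrix stays small. A minimal sketch (the chunk/overlap lengths are arbitrary choices, and merging the per-chunk transcripts at the overlaps is left out):

```python
import numpy as np

def chunk_audio(speech, sr=16000, chunk_s=30.0, overlap_s=2.0):
    """Split a 1-D waveform into overlapping chunks of at most chunk_s seconds."""
    chunk = int(chunk_s * sr)
    step = chunk - int(overlap_s * sr)
    chunks = []
    for start in range(0, len(speech), step):
        piece = speech[start:start + chunk]
        if len(piece) == 0:
            break
        chunks.append(piece)
        if start + chunk >= len(speech):
            break  # last chunk already reaches the end of the audio
    return chunks

# 30 minutes of 16 kHz audio -> 65 chunks of at most 30 s each
audio = np.zeros(30 * 60 * 16000, dtype=np.float32)
chunks = chunk_audio(audio)
print(len(chunks), len(chunks[0]) / 16000)

# Each chunk would then go through transcribe() on its own, e.g.:
# texts = [model.transcribe([c], source_lang=src, target_lang=tgt) for c in chunks]
```

I don't know whether this hurts quality at chunk boundaries for Canary, or whether NeMo already ships a supported long-form/chunked inference path that I should use instead.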
Thanks!