OOM on 4 GPUs #3
opened by SpiridonSunRotator
Hi, I am trying to generate an image on 4 GPUs with `device_map="auto"`. However, the code crashes with a CUDA OOM even for small inputs, say 64x64:
```python
cot_text, samples = model.generate_image(
    prompt=prompt,
    image=imgs_input,
    seed=42,
    image_size=(64, 64),
    use_system_prompt="en_unified",
    bot_task="think_recaption",      # Use "think_recaption" for reasoning and enhancement
    infer_align_image_size=True,     # Align output image size to input image size
    diff_infer_steps=8,
    verbose=2,
)
```
According to the error message, a very large tensor is allocated somewhere inside a `Conv3d` forward pass:
```
File /usr/local/lib/python3.12/dist-packages/torch/nn/modules/conv.py:712, in Conv3d._conv_forward(self, input, weight, bias)
    700 if self.padding_mode != "zeros":
    701     return F.conv3d(
    702         F.pad(
    703             input, self._reversed_padding_repeated_twice, mode=self.padding_mode
   (...) 710         self.groups,
    711     )
--> 712 return F.conv3d(
    713     input, weight, bias, self.stride, self.padding, self.dilation, self.groups
    714 )

OutOfMemoryError: CUDA out of memory. Tried to allocate 27.00 GiB. GPU 0 has a total capacity of 79.25 GiB of which 11.26 GiB is free. Process 334550 has 67.98 GiB memory in use. Of the allocated memory 45.48 GiB is allocated by PyTorch, and 22.01 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
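For scale, a 27 GiB fp32 allocation corresponds to roughly 7.25e9 elements, which seems far too large for a 64x64 request. A quick helper (pure arithmetic, no CUDA needed; the example shapes are hypothetical, just to show the order of magnitude of conv3d intermediates):

```python
from math import prod

def tensor_gib(shape, bytes_per_element=4):
    """Memory footprint of a dense tensor of the given shape, in GiB.

    bytes_per_element defaults to 4 (fp32); use 2 for fp16/bf16.
    """
    return prod(shape) * bytes_per_element / 2**30

# A hypothetical (batch, channels, frames, height, width) activation:
print(tensor_gib((2, 512, 16, 512, 512)))  # 16.0 GiB

# How many fp32 elements a 27 GiB allocation implies:
print(27 * 2**30 // 4)  # 7247757312
```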
How can one fix this?
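For what it's worth, a mitigation sketch of what I would try first. Both the `60GiB` cap and the loading call are guesses (I don't know exactly how this repo constructs the model); the general pattern of capping `max_memory` per device is from Accelerate's big-model-inference support in `from_pretrained`:

```python
import os

# The error reports 22 GiB "reserved by PyTorch but unallocated", which
# suggests fragmentation; expandable segments can help. This must be set
# before the first CUDA allocation, i.e. before loading the model.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# With device_map="auto", GPU 0 tends to fill up first. Capping per-GPU
# memory spreads weights more evenly and leaves headroom on GPU 0 for
# large intermediate activations (e.g. the Conv3d output above).
# 60GiB is a guess for 80 GiB cards -- tune for your setup.
max_memory = {i: "60GiB" for i in range(4)}
max_memory["cpu"] = "120GiB"  # optional spill-over to CPU RAM

# Hypothetical loading call -- adapt to however this repo loads the model:
# model = AutoModel.from_pretrained(
#     MODEL_PATH,
#     device_map="auto",
#     max_memory=max_memory,
#     torch_dtype=torch.bfloat16,
# )
```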