OOM on 4 GPUs

#3
by SpiridonSunRotator - opened

Hi, I am trying to produce an image on 4 GPUs with an auto device_map. However, the code crashes with a CUDA OOM even for small outputs, say 64x64:

cot_text, samples = model.generate_image(
    prompt=prompt,
    image=imgs_input,
    seed=42,
    image_size=(64, 64),
    use_system_prompt="en_unified",
    bot_task="think_recaption",  # Use "think_recaption" for reasoning and enhancement
    infer_align_image_size=True,  # Align output image size to input image size
    diff_infer_steps=8,
    verbose=2,
)

According to the error message, a very large tensor (27 GiB) is being allocated inside a Conv3d forward pass:

File /usr/local/lib/python3.12/dist-packages/torch/nn/modules/conv.py:712, in Conv3d._conv_forward(self, input, weight, bias)
    700 if self.padding_mode != "zeros":
    701     return F.conv3d(
    702         F.pad(
    703             input, self._reversed_padding_repeated_twice, mode=self.padding_mode
   (...)    710         self.groups,
    711     )
--> 712 return F.conv3d(
    713     input, weight, bias, self.stride, self.padding, self.dilation, self.groups
    714 )

OutOfMemoryError: CUDA out of memory. Tried to allocate 27.00 GiB. GPU 0 has a total capacity of 79.25 GiB of which 11.26 GiB is free. Process 334550 has 67.98 GiB memory in use. Of the allocated memory 45.48 GiB is allocated by PyTorch, and 22.01 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
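For reference, the fragmentation workaround that the error message itself suggests can be tried without touching the model code. A minimal sketch (the variable must be set before the process makes its first CUDA allocation, i.e. before the model is loaded; it only helps when a lot of memory is "reserved but unallocated", as in the traceback above):

```python
import os

# Must run before any CUDA tensor is allocated in this process,
# otherwise the allocator is already configured and the setting is ignored.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# ... only now import/load the model and call model.generate_image(...)
```

Equivalently, it can be set in the shell when launching the script: `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python run.py`. Note this only mitigates fragmentation; it cannot help if the 27 GiB activation genuinely does not fit alongside the weights placed on GPU 0.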

How can one fix this?