[solved] CUDA OOM with RTX 5060 Ti 16G

#5 opened by foxedge

Hi, the model card mentions that this model can run on 16GB of VRAM.

However, I tested the demo code on an RTX 5060 Ti 16GB and got a CUDA OOM.

I am using:

- diffusers 0.35.2
- pytorch 2.7.1
- bitsandbytes 0.48.2
- transformers 4.57.1

The output shows:

```
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
The config attributes {'pooled_projection_dim': 768} were passed to QwenImageTransformer2DModel, but are not expected and will be ignored. Please verify your config.json configuration file.

OutOfMemoryError                          Traceback (most recent call last)
Cell In[1], line 15
     12     torch_dtype = torch.float32
     13     device = "cpu"
---> 15 pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
     16 pipe = pipe.to(device)
     18 positive_magic = {
     19     "en": ", Ultra HD, 4K, cinematic composition.", # for english prompt
     20     "zh": ", 超清,4K,电影级构图." # for chinese prompt
     21 }
[...]
```

Any suggestions on how to fix this? I am not sure whether this is a setup issue or whether the model simply needs more VRAM. 😅

Thanks!

You can look at this example: https://huggingface.co/ovedrive/qwen-image-edit-4bit/discussions/4#68ae6605af245e5fd682489c

It worked with 16GB. It's just a matter of picking what to offload; 16GB is cutting it close unless you also get into block swapping, which is aimed at even lower VRAM.
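For anyone landing here later, a minimal sketch of the offloading idea with plain diffusers. The model id and prompt are placeholders (the linked comment picks components more selectively); `enable_model_cpu_offload()` is the stock diffusers helper that swaps whole components between CPU RAM and VRAM:

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder model id for illustration; substitute the checkpoint you are loading.
model_name = "Qwen/Qwen-Image"

pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Instead of pipe.to("cuda"): each component (text encoder, transformer, VAE)
# is moved onto the GPU only while it runs and parked in CPU RAM otherwise,
# so peak VRAM is roughly bounded by the largest single component.
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A cat sitting on a windowsill, Ultra HD, 4K, cinematic composition.",
    num_inference_steps=50,
).images[0]
image.save("out.png")
```

If that still OOMs, `enable_sequential_cpu_offload()` offloads at the per-module level for a much lower VRAM floor, at a significant speed cost.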

Thanks @ovedrive for the pointer and @aahila for the detailed comment, it worked perfectly! 😁 Peak VRAM usage was ~12.8GB, and each inference step took about 6s.
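For anyone who wants to measure the peak on their own card, the stock PyTorch allocator counters are enough (a minimal sketch; nvidia-smi can report a bit more because of the CUDA context):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the pipeline call here ...

# Allocator-level peak; actual VRAM in nvidia-smi is slightly higher.
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GB")
```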

foxedge changed discussion title from CUDA OOM with RTX 5060 Ti 16G to [solved] CUDA OOM with RTX 5060 Ti 16G

Fantastic. I am glad it helped.
