Instructions to use black-forest-labs/FLUX.1-dev with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use black-forest-labs/FLUX.1-dev with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    dtype=torch.bfloat16,
    device_map="cuda",
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- Draw Things
- DiffusionBee
[Bug] FluxClipModel persistent CUDA memory (4777 MB reserved on RTX 5060 Ti, ComfyUI)
Hi team,
I'm reporting a reproducible persistent GPU memory issue with the FLUX.1-dev model when running inside ComfyUI on Windows.
Environment
GPU: RTX 5060 Ti (16 GB VRAM, Blackwell, sm_120)
Driver: 555.xx series
PyTorch: 2.7.0+cu128
xFormers: 0.0.30
ComfyUI: 0.3.66
Model: black-forest-labs/FLUX.1-dev
Problem
First generation works normally.
On the second generation, ComfyUI hangs in FluxClipModel or KSampler.
VRAM stays fixed at 4777 MB reserved even after all tensors are deleted and torch.cuda.empty_cache() and gc.collect() are called, and even after a driver reset.
This looks like a persistent CUDA context or an asynchronous memory pool that is never released between runs.
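For reference, the cleanup sequence I run between generations looks roughly like this (the function name is mine; it is a best-effort sketch, not a guaranteed fix, and on the affected cards the reserved figure it returns stays at ~4777 MB):

```python
import gc

import torch


def release_cached_vram() -> int:
    """Best-effort release of cached CUDA memory; returns bytes still reserved."""
    gc.collect()  # drop Python-side references first so blocks become freeable
    if not torch.cuda.is_available():
        return 0
    torch.cuda.synchronize()  # ensure no in-flight async work still pins blocks
    torch.cuda.empty_cache()  # hand cached allocator blocks back to the driver
    torch.cuda.ipc_collect()  # reclaim memory held by dead IPC handles
    return torch.cuda.memory_reserved()


print(f"still reserved: {release_cached_vram() / 2**20:.0f} MB")
```

Even with this sequence, only restarting the ComfyUI process actually frees the stuck block.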
Observations
Forcing torch.cuda._lazy_init() or cudaDeviceReset() inside Python doesn’t help.
Restarting the ComfyUI process immediately clears the stuck VRAM block.
Happens consistently on RTX 4060/5060-series cards with CUDA 12.8+, but not on 4090.
Setting the environment variables
CUDA_LAUNCH_BLOCKING=1
TORCH_CUDNN_V8_API_ENABLED=0
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:False
reduces the probability of the hang but doesn't fully resolve it.
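For completeness, this is how I set those variables before launching ComfyUI (the `main.py` entry point path is the standard ComfyUI one; adjust for your install):

```shell
# Set in the same shell session that launches ComfyUI
export CUDA_LAUNCH_BLOCKING=1
export TORCH_CUDNN_V8_API_ENABLED=0
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:False
# then launch, e.g.: python main.py
```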
Request
Could you please confirm:
Whether this persistent kernel/memory-pool behavior is a known issue for FLUX models?
Whether a fix is planned for FLUX 1.1 or later?
Thanks for your amazing work — FLUX generates beautiful results, but this memory lock makes multi-prompt workflows impossible on mid-range GPUs.
Best regards,
Yonqion (ComfyUI user)