Instructions to use black-forest-labs/FLUX.1-dev with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use black-forest-labs/FLUX.1-dev with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    dtype=torch.bfloat16,
    device_map="cuda",
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- Draw Things
- DiffusionBee
[Bug] FluxClipModel persistent CUDA memory (4777 MB reserved on RTX 5060 Ti, ComfyUI)
Hi team,
I'm reporting a reproducible persistent GPU memory issue with the FLUX.1-dev model when running inside ComfyUI on Windows.
Environment
GPU: RTX 5060 Ti (16 GB VRAM, Blackwell, sm_120)
Driver: 555.xx series
PyTorch: 2.7.0+cu128
xFormers: 0.0.30
ComfyUI: 0.3.66
Model: black-forest-labs/FLUX.1-dev
Problem
First generation works normally.
On the second generation, ComfyUI hangs in FluxClipModel or KSampler.
VRAM stays fixed at 4777 MB reserved even after all tensors are deleted and torch.cuda.empty_cache() and gc.collect() are called, and even after a driver reset.
This looks like a persistent CUDA context or an asynchronous memory pool that is never released between runs.
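For reference, the cleanup sequence I run between generations looks roughly like this (the function name is mine; it is a best-effort sketch, not a guaranteed fix, and on the affected cards the reserved figure it returns stays at ~4777 MB):

```python
import gc

import torch


def release_cached_vram() -> int:
    """Best-effort release of cached CUDA memory; returns bytes still reserved."""
    gc.collect()  # drop Python-side references first so blocks become freeable
    if not torch.cuda.is_available():
        return 0
    torch.cuda.synchronize()  # ensure no in-flight async work still pins blocks
    torch.cuda.empty_cache()  # hand cached allocator blocks back to the driver
    torch.cuda.ipc_collect()  # reclaim memory held by dead IPC handles
    return torch.cuda.memory_reserved()


print(f"still reserved: {release_cached_vram() / 2**20:.0f} MB")
```

Even with this sequence, only restarting the ComfyUI process actually frees the stuck block.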
Observations
Forcing torch.cuda._lazy_init() or cudaDeviceReset() inside Python doesn’t help.
Restarting the ComfyUI process immediately clears the stuck VRAM block.
Happens consistently on RTX 4060/5060-series cards with CUDA 12.8+, but not on 4090.
Setting the environment variables
CUDA_LAUNCH_BLOCKING=1
TORCH_CUDNN_V8_API_ENABLED=0
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:False
reduces the probability of the hang but doesn't fully resolve it.
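For completeness, this is how I set those variables before launching ComfyUI (the `main.py` entry point path is the standard ComfyUI one; adjust for your install):

```shell
# Set in the same shell session that launches ComfyUI
export CUDA_LAUNCH_BLOCKING=1
export TORCH_CUDNN_V8_API_ENABLED=0
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:False
# then launch, e.g.: python main.py
```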
Request
Could you please confirm:
Whether this persistent kernel/memory-pool behavior is a known issue for FLUX models?
Whether a fix is planned for FLUX 1.1 or later?
Thanks for your amazing work — FLUX generates beautiful results, but this memory lock makes multi-prompt workflows impossible on mid-range GPUs.
Best regards,
Yonqion (ComfyUI user)