---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
datasets:
- opendiffusionai/laion2b-squareish-1536px
thumbnail: https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/resolve/main/examples/side_by_side_b.png
base_model: jimmycarter/LibreFLUX
---
# LibreFLUX-ControlNet

# Update - 4/10/2026
- Retrained this model on [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- Tripled the number of control layers for better guidance
# Fun Facts
- Trained exclusively on images generated by [Segment Anything (SAM)](https://aidemos.meta.com/segment-anything/)
- Uses SAM-style segmentation images as input and outputs photorealistic images (see the sketch after this list)
- Trained at 1024x1024 resolution; inference works best at 1.5k and up
- Trained on 320K segmented images from [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- Base model is [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX) (a de-distilled FLUX)
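The control input is the kind of multi-colored segmentation render SAM produces. Below is a minimal sketch of building one from SAM masks, assuming `masks` is a list of HxW boolean arrays such as those returned by SAM's automatic mask generator; the random coloring is illustrative, not necessarily the exact scheme used in training:
```py
import numpy as np
from PIL import Image

def masks_to_control_image(masks, height=1024, width=1024, seed=0):
    """Render boolean segmentation masks as a flat-colored, SAM-style image."""
    rng = np.random.default_rng(seed)
    canvas = np.zeros((height, width, 3), dtype=np.float32)
    for mask in masks:
        # One random bright color per mask; later masks overwrite earlier ones.
        canvas[mask] = rng.uniform(0.3, 1.0, size=3)
    return Image.fromarray((canvas * 255).astype(np.uint8))
```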
# Showcases
<table style="width:100%; table-layout:fixed;">
<tr>
<td><img src="./examples/resized_kitten_seg.png" ></td>
<td><img src="./examples/resized_kitten.png" ></td>
</tr>
<tr>
<td><img src="./examples/resized_dread_girl_seg.png" ></td>
<td><img src="./examples/resized_dread_girl.png" ></td>
</tr>
<tr>
<td><img src="./examples/resized_house_seg.png" ></td>
<td><img src="./examples/resized_house.png" ></td>
</tr>
</table>
# Extra Details
- I built this repo to train the model: [https://github.com/NeuralVFX/LibreFLUX-ControlNet](https://github.com/NeuralVFX/LibreFLUX-ControlNet)
- Trained in the same non-distilled fashion as [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX)
- Uses Attention Masking
- Uses CFG during inference (allows negative prompting; see the sketch after this list)
- Inference code roughly adapted from: [https://github.com/bghira/SimpleTuner](https://github.com/bghira/SimpleTuner)
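For context, CFG with a negative prompt steers each denoising step away from the negative embedding and toward the positive one. A schematic of the combination (simplified; not the pipeline's exact internals):
```py
import torch

def cfg_combine(noise_uncond: torch.Tensor, noise_cond: torch.Tensor, guidance_scale: float) -> torch.Tensor:
    # noise_uncond: prediction conditioned on the negative prompt
    # noise_cond:   prediction conditioned on the positive prompt
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```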
# ComfyUI
- I've made some custom nodes for this: [https://github.com/NeuralVFX/LibreFLUX-ComfyUI](https://github.com/NeuralVFX/LibreFLUX-ComfyUI)
# Compatibility
```bash
pip install -U diffusers==0.32.0
pip install -U "transformers @ git+https://github.com/huggingface/transformers@e15687fffe5c9d20598a19aeab721ae0a7580f8a"
```
For low-VRAM inference (see the Low VRAM section below), also install:
```bash
pip install optimum-quanto
```
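A quick, optional sanity check that the pins took effect:
```py
import diffusers, transformers

print(diffusers.__version__)     # expect 0.32.0
print(transformers.__version__)  # built from the pinned commit above
```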
# Load Pipeline
```py
import torch
from diffusers import DiffusionPipeline

model_id = "neuralvfx/LibreFlux-ControlNet"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# The repo ships a custom pipeline class, so point `custom_pipeline` at the
# same repo and allow remote code.
pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
```
# Inference
```py
import torch
from PIL import Image
from torchvision.transforms import ToTensor

# Load the SAM-style control image and resize it to the target resolution.
cond = Image.open("examples/libre_flux_control_image.png")
cond = cond.resize((1024, 1024))

# Convert the PIL image to a (1, 3, H, W) tensor on the pipeline's device/dtype.
cond_tensor = ToTensor()(cond)[:3, :, :].to(pipe.device, dtype=pipe.dtype).unsqueeze(0)

out = pipe(
    prompt="many pieces of drift wood spelling libre flux sitting casting shadow on the lumpy sandy beach with foot prints all over it",
    negative_prompt="blurry",
    control_image=cond_tensor,  # use the tensor here
    num_inference_steps=75,
    guidance_scale=4.0,
    height=1024,
    width=1024,
    controlnet_conditioning_scale=1.0,
    num_images_per_prompt=1,
    control_mode=None,
    generator=torch.Generator().manual_seed(32),
    return_dict=True,
)
out.images[0]
```
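The last line returns a PIL image (displayed inline in a notebook). To write it to disk instead, use the standard PIL call (the filename here is just an example):
```py
out.images[0].save("libre_flux_result.png")
```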
# Load Pipeline (Low VRAM)
```py
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, quantize, qint8

model_id = "neuralvfx/LibreFlux-ControlNet"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
)

# Quantize the transformer and controlnet weights to int8, keeping the
# norm/embedding/projection layers in full precision.
quantize(
    pipe.transformer,
    weights=qint8,
    exclude=[
        "*.norm", "*.norm1", "*.norm2", "*.norm2_context",
        "proj_out", "x_embedder", "norm_out", "context_embedder",
    ],
)
quantize(
    pipe.controlnet,
    weights=qint8,
    exclude=[
        "*.norm", "*.norm1", "*.norm2", "*.norm2_context",
        "proj_out", "x_embedder", "norm_out", "context_embedder",
    ],
)
freeze(pipe.transformer)
freeze(pipe.controlnet)

# Keep idle submodules on the CPU; each is moved to the GPU only while it runs.
pipe.enable_model_cpu_offload()
```
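Inference then works exactly as in the Inference section above. `enable_model_cpu_offload` trades some speed for a much smaller peak VRAM footprint, since only the currently running submodule occupies GPU memory.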