---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
datasets:
- opendiffusionai/laion2b-squareish-1536px
thumbnail: https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/resolve/main/examples/side_by_side_b.png
base_model: jimmycarter/LibreFLUX
---
# LibreFLUX-ControlNet
![Example: Control image vs result](examples/side_by_side_b.png)

# Update - 4/10/2026
- Retrained this model on [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- I tripled the control layers to get better guidance

# Fun Facts
- Trained exclusively on images generated by [Segment Anything (SAM)](https://aidemos.meta.com/segment-anything/)
- Takes SAM-style segmentation images as input and outputs photorealistic images (see the sketch after this list)
- Trained at 1024x1024 resolution; inference works best at 1.5K and up
- Trained on 320K segmented images from [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- Base model is [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX) (de-distilled FLUX)
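
If you want to make your own control images, here is a minimal sketch using Meta's `segment-anything` package. The checkpoint file, input path, and flat random coloring are assumptions; any SAM-style visualization where each mask gets its own solid color should work.

```py
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a SAM checkpoint (download sam_vit_h_4b8939.pth from the SAM repo)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
mask_generator = SamAutomaticMaskGenerator(sam)

# Run automatic mask generation on an RGB photo (HWC uint8 array)
image = np.array(Image.open("photo.png").convert("RGB"))
masks = mask_generator.generate(image)

# Paint each mask a random flat color, largest masks first so that
# small segments stay visible on top
canvas = np.zeros_like(image)
for m in sorted(masks, key=lambda m: m["area"], reverse=True):
    canvas[m["segmentation"]] = np.random.randint(0, 256, size=3)

Image.fromarray(canvas).save("control_image.png")
```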


# Showcases
Left: SAM control image. Right: generated result.
<table style="width:100%; table-layout:fixed;">
  <tr>
    <td><img src="./examples/resized_kitten_seg.png" ></td>
    <td><img src="./examples/resized_kitten.png" ></td>
  </tr>
  <tr>
    <td><img src="./examples/resized_dread_girl_seg.png" ></td>
    <td><img src="./examples/resized_dread_girl.png" ></td>
  </tr>
  <tr>
    <td><img src="./examples/resized_house_seg.png" ></td>
    <td><img src="./examples/resized_house.png" ></td>
  </tr>
</table>


# Extra Details
- I built this repo to train the model: [https://github.com/NeuralVFX/LibreFLUX-ControlNet](https://github.com/NeuralVFX/LibreFLUX-ControlNet)
- Trained in the same non-distilled fashion as [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX)
- Uses attention masking
- Uses CFG during inference, which allows negative prompting (see the sketch after this list)
- Inference code roughly adapted from [https://github.com/bghira/SimpleTuner](https://github.com/bghira/SimpleTuner)
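
Because the base model is de-distilled, the pipeline runs real classifier-free guidance: two predictions per step, extrapolating away from the negative/unconditional branch. A minimal sketch of the combination rule (`cfg_combine` is a hypothetical helper, not the pipeline's actual code):

```py
import torch

def cfg_combine(noise_uncond: torch.Tensor,
                noise_cond: torch.Tensor,
                guidance_scale: float) -> torch.Tensor:
    # Classifier-free guidance: push the prediction away from the
    # unconditional / negative-prompt branch toward the conditional one
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# Sanity check: guidance_scale=1.0 reduces to the conditional prediction
u, c = torch.randn(2, 4), torch.randn(2, 4)
assert torch.allclose(cfg_combine(u, c, 1.0), c)
```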

# ComfyUI
- I've made some custom nodes for this: [https://github.com/NeuralVFX/LibreFLUX-ComfyUI](https://github.com/NeuralVFX/LibreFLUX-ComfyUI)

# Compatibility
The custom pipeline code targets these exact versions; newer releases may not be compatible.
```bash
pip install -U diffusers==0.32.0
pip install -U "transformers @ git+https://github.com/huggingface/transformers@e15687fffe5c9d20598a19aeab721ae0a7580f8a"
```
For the low-VRAM path below, also install:
```bash
pip install optimum-quanto
```
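
A quick sanity check that the pins took effect:

```py
import diffusers, transformers
print(diffusers.__version__, transformers.__version__)
```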
# Load Pipeline
```py
import torch
from diffusers import DiffusionPipeline

model_id = "neuralvfx/LibreFlux-ControlNet"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# custom_pipeline points at the same repo, so the pipeline class shipped
# with the weights is downloaded and run (hence trust_remote_code=True)
pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
```
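
You can confirm that the remote pipeline class was picked up:

```py
print(type(pipe).__name__)  # expected: LibreFluxControlNetPipeline
```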

# Inference
```py
import torch
from PIL import Image
from torchvision.transforms import ToTensor

# Load the control image (a SAM-style segmentation map)
cond = Image.open("examples/libre_flux_control_image.png")
cond = cond.resize((1024, 1024))

# Convert the PIL image to a 1x3xHxW tensor on the pipeline's device/dtype
cond_tensor = ToTensor()(cond)[:3, :, :].to(pipe.device, dtype=pipe.dtype).unsqueeze(0)

out = pipe(
    prompt="many pieces of drift wood spelling libre flux sitting casting shadow on the lumpy sandy beach with foot prints all over it",
    negative_prompt="blurry",
    control_image=cond_tensor,
    num_inference_steps=75,
    guidance_scale=4.0,
    height=1024,
    width=1024,
    controlnet_conditioning_scale=1.0,
    num_images_per_prompt=1,
    control_mode=None,
    generator=torch.Generator().manual_seed(32),
    return_dict=True,
)
out.images[0]
```
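
`out` is a standard diffusers pipeline output, so the result can be saved like any PIL image (the filename here is arbitrary):

```py
out.images[0].save("libre_flux_result.png")
```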
# Load Pipeline (Low VRAM)
```py
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, quantize, qint8

model_id = "neuralvfx/LibreFlux-ControlNet"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
)

# Quantize the transformer and controlnet weights to int8, keeping
# norms and embedding/projection layers in full precision
exclude = [
    "*.norm", "*.norm1", "*.norm2", "*.norm2_context",
    "proj_out", "x_embedder", "norm_out", "context_embedder",
]
quantize(pipe.transformer, weights=qint8, exclude=exclude)
quantize(pipe.controlnet, weights=qint8, exclude=exclude)
freeze(pipe.transformer)
freeze(pipe.controlnet)

# Offload idle submodules to CPU between forward passes
pipe.enable_model_cpu_offload()
```
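
With `enable_model_cpu_offload()` the pipeline manages device placement itself, so the explicit `.to(device)` call is skipped here; the inference call is otherwise identical to the full-VRAM example above.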