---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
datasets:
- opendiffusionai/laion2b-squareish-1536px
thumbnail: https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/resolve/main/examples/side_by_side_b.png
base_model: jimmycarter/LibreFLUX
---
# LibreFLUX-ControlNet
![Example: Control image vs result](examples/side_by_side_b.png)

# Update - 4/10/2026
- Retrained this model on [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- I tripled the control layers to get better guidance

# Fun Facts
- Trained exclusively on images generated by [Segment Anything (SAM)](https://aidemos.meta.com/segment-anything/)
- Takes SAM-style segmentation images as input and outputs photorealistic images (see the sketch after this list)
- Trained at 1024x1024 resolution; inference works best at 1.5K and up
- Trained on 320K segmented images from [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- Base model is [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX) (de-distilled FLUX)
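
If you want to make your own control images, here is a minimal sketch using Meta's `segment-anything` package. The checkpoint file, input path, and flat random coloring are assumptions; any SAM-style visualization where each mask gets its own solid color should work.

```py
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a SAM checkpoint (download sam_vit_h_4b8939.pth from the SAM repo)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
mask_generator = SamAutomaticMaskGenerator(sam)

# Run automatic mask generation on an RGB photo (HWC uint8 array)
image = np.array(Image.open("photo.png").convert("RGB"))
masks = mask_generator.generate(image)

# Paint each mask a random flat color, largest masks first so that
# small segments stay visible on top
canvas = np.zeros_like(image)
for m in sorted(masks, key=lambda m: m["area"], reverse=True):
    canvas[m["segmentation"]] = np.random.randint(0, 256, size=3)

Image.fromarray(canvas).save("control_image.png")
```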


# Showcases
Left: SAM control image. Right: generated result.
<table style="width:100%; table-layout:fixed;">
  <tr>
    <td><img src="./examples/resized_kitten_seg.png" ></td>
    <td><img src="./examples/resized_kitten.png" ></td>
  </tr>
  <tr>
    <td><img src="./examples/resized_dread_girl_seg.png" ></td>
    <td><img src="./examples/resized_dread_girl.png" ></td>
  </tr>
  <tr>
    <td><img src="./examples/resized_house_seg.png" ></td>
    <td><img src="./examples/resized_house.png" ></td>
  </tr>
</table>


# Extra Details
- I built this repo to train the model: [https://github.com/NeuralVFX/LibreFLUX-ControlNet](https://github.com/NeuralVFX/LibreFLUX-ControlNet)
- Trained in the same non-distilled fashion as [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX)
- Uses attention masking
- Uses CFG during inference, which allows negative prompting (see the sketch after this list)
- Inference code roughly adapted from [https://github.com/bghira/SimpleTuner](https://github.com/bghira/SimpleTuner)
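
Because the base model is de-distilled, the pipeline runs real classifier-free guidance: two predictions per step, extrapolating away from the negative/unconditional branch. A minimal sketch of the combination rule (`cfg_combine` is a hypothetical helper, not the pipeline's actual code):

```py
import torch

def cfg_combine(noise_uncond: torch.Tensor,
                noise_cond: torch.Tensor,
                guidance_scale: float) -> torch.Tensor:
    # Classifier-free guidance: push the prediction away from the
    # unconditional / negative-prompt branch toward the conditional one
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# Sanity check: guidance_scale=1.0 reduces to the conditional prediction
u, c = torch.randn(2, 4), torch.randn(2, 4)
assert torch.allclose(cfg_combine(u, c, 1.0), c)
```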

# ComfyUI
- I've made some custom nodes for this: [https://github.com/NeuralVFX/LibreFLUX-ComfyUI](https://github.com/NeuralVFX/LibreFLUX-ComfyUI)

# Compatibility
The custom pipeline code targets these exact versions; newer releases may not be compatible.
```bash
pip install -U diffusers==0.32.0
pip install -U "transformers @ git+https://github.com/huggingface/transformers@e15687fffe5c9d20598a19aeab721ae0a7580f8a"
```
For the low-VRAM path below, also install:
```bash
pip install optimum-quanto
```
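
A quick sanity check that the pins took effect:

```py
import diffusers, transformers
print(diffusers.__version__, transformers.__version__)
```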
# Load Pipeline
```py
import torch
from diffusers import DiffusionPipeline

model_id = "neuralvfx/LibreFlux-ControlNet"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# custom_pipeline points at the same repo, so the pipeline class shipped
# with the weights is downloaded and run (hence trust_remote_code=True)
pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
```
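
You can confirm that the remote pipeline class was picked up:

```py
print(type(pipe).__name__)  # expected: LibreFluxControlNetPipeline
```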

# Inference
```py
import torch
from PIL import Image
from torchvision.transforms import ToTensor

# Load the control image (a SAM-style segmentation map)
cond = Image.open("examples/libre_flux_control_image.png")
cond = cond.resize((1024, 1024))

# Convert the PIL image to a 1x3xHxW tensor on the pipeline's device/dtype
cond_tensor = ToTensor()(cond)[:3, :, :].to(pipe.device, dtype=pipe.dtype).unsqueeze(0)

out = pipe(
    prompt="many pieces of drift wood spelling libre flux sitting casting shadow on the lumpy sandy beach with foot prints all over it",
    negative_prompt="blurry",
    control_image=cond_tensor,
    num_inference_steps=75,
    guidance_scale=4.0,
    height=1024,
    width=1024,
    controlnet_conditioning_scale=1.0,
    num_images_per_prompt=1,
    control_mode=None,
    generator=torch.Generator().manual_seed(32),
    return_dict=True,
)
out.images[0]
```
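
`out` is a standard diffusers pipeline output, so the result can be saved like any PIL image (the filename here is arbitrary):

```py
out.images[0].save("libre_flux_result.png")
```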
# Load Pipeline (Low VRAM)
```py
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, quantize, qint8

model_id = "neuralvfx/LibreFlux-ControlNet"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
)

# Quantize the transformer and controlnet weights to int8, keeping
# norms and embedding/projection layers in full precision
exclude = [
    "*.norm", "*.norm1", "*.norm2", "*.norm2_context",
    "proj_out", "x_embedder", "norm_out", "context_embedder",
]
quantize(pipe.transformer, weights=qint8, exclude=exclude)
quantize(pipe.controlnet, weights=qint8, exclude=exclude)
freeze(pipe.transformer)
freeze(pipe.controlnet)

# Offload idle submodules to CPU between forward passes
pipe.enable_model_cpu_offload()
```
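
With `enable_model_cpu_offload()` the pipeline manages device placement itself, so the explicit `.to(device)` call is skipped here; the inference call is otherwise identical to the full-VRAM example above.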