---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
datasets:
- opendiffusionai/laion2b-squareish-1536px
thumbnail: https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/resolve/main/examples/side_by_side_b.png
base_model: jimmycarter/LibreFLUX
---
# LibreFLUX-ControlNet


# Update - 4/10/2026
- Retrained this model on [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- I tripled the number of control layers to get stronger guidance

# Fun Facts
- Trained exclusively on control images generated by [Segment Anything (SAM)](https://aidemos.meta.com/segment-anything/)
- Takes SAM-style segmentation images as input and outputs photorealistic images (a sketch for generating your own follows this list)
- Trained at 1024x1024 resolution; inference works best at 1.5K and up
- Trained on 320K segmented images from [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- Base model is [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX) (de-distilled FLUX)

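If you don't already have a SAM-style input, the sketch below shows one way to build one. It assumes the official [segment-anything](https://github.com/facebookresearch/segment-anything) package and its public ViT-H checkpoint (neither ships with this repo), and paints each mask a random flat color; the exact palette used for the training set isn't documented here, so treat the result as an approximation.
```py
import numpy as np
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Assumes the official ViT-H checkpoint has been downloaded locally
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.array(Image.open("photo.png").convert("RGB"))
masks = mask_generator.generate(image)  # list of dicts with boolean "segmentation" arrays

# Paint each mask a random flat color, largest regions first so that
# small segments stay visible on top
canvas = np.zeros_like(image)
for m in sorted(masks, key=lambda m: m["area"], reverse=True):
    canvas[m["segmentation"]] = np.random.randint(0, 256, 3)

Image.fromarray(canvas).save("control_image.png")
```
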
# Showcases
<table style="width:100%; table-layout:fixed;">
<tr>
<td><img src="./examples/resized_kitten_seg.png"></td>
<td><img src="./examples/resized_kitten.png"></td>
</tr>
<tr>
<td><img src="./examples/resized_dread_girl_seg.png"></td>
<td><img src="./examples/resized_dread_girl.png"></td>
</tr>
<tr>
<td><img src="./examples/resized_house_seg.png"></td>
<td><img src="./examples/resized_house.png"></td>
</tr>
</table>

# Extra Details
- I built this repo to train the model: [https://github.com/NeuralVFX/LibreFLUX-ControlNet](https://github.com/NeuralVFX/LibreFLUX-ControlNet)
- Trained in the same non-distilled fashion as [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX)
- Uses attention masking
- Uses CFG during inference, which allows negative prompting (see the sketch after this list)
- Inference code roughly adapted from: [https://github.com/bghira/SimpleTuner](https://github.com/bghira/SimpleTuner)

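Because the base model is de-distilled, inference uses standard classifier-free guidance rather than FLUX's baked-in distilled guidance. A minimal sketch of the combination step, illustrative only (the actual logic lives in this repo's pipeline code):
```py
import torch

def apply_cfg(noise_uncond: torch.Tensor, noise_cond: torch.Tensor, guidance_scale: float) -> torch.Tensor:
    # Standard classifier-free guidance: extrapolate from the unconditional
    # (negative-prompt) prediction toward the conditional one
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```
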
# ComfyUI
- I've made some custom nodes for this: [https://github.com/NeuralVFX/LibreFLUX-ComfyUI](https://github.com/NeuralVFX/LibreFLUX-ComfyUI)

# Compatibility
```bash
pip install -U diffusers==0.32.0
pip install -U "transformers @ git+https://github.com/huggingface/transformers@e15687fffe5c9d20598a19aeab721ae0a7580f8a"
```
Low VRAM:
```bash
pip install optimum-quanto
```
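If loading fails, it's worth confirming the pins above actually took; a quick sanity check:
```py
import diffusers, transformers

print(diffusers.__version__)     # expected: 0.32.0
print(transformers.__version__)  # expected: the dev version from the pinned commit
```
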
# Load Pipeline
```py
import torch
from diffusers import DiffusionPipeline

model_id = "neuralvfx/LibreFlux-ControlNet"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# The repo ships its own pipeline class, so it is passed as custom_pipeline
# and trust_remote_code is required
pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
```

# Inference
```py
import torch
from PIL import Image
from torchvision.transforms import ToTensor

# Load the SAM-style control image
cond = Image.open("examples/libre_flux_control_image.png")
cond = cond.resize((1024, 1024))

# Convert the PIL image to a (1, 3, H, W) tensor on the pipeline's device/dtype
cond_tensor = ToTensor()(cond)[:3, :, :].to(pipe.device, dtype=pipe.dtype).unsqueeze(0)

out = pipe(
    prompt="many pieces of drift wood spelling libre flux sitting casting shadow on the lumpy sandy beach with foot prints all over it",
    negative_prompt="blurry",
    control_image=cond_tensor,  # the tensor, not the PIL image
    num_inference_steps=75,
    guidance_scale=4.0,
    height=1024,
    width=1024,
    controlnet_conditioning_scale=1.0,
    num_images_per_prompt=1,
    control_mode=None,
    generator=torch.Generator().manual_seed(32),
    return_dict=True,
)
out.images[0].save("output.png")
```
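As noted above, the model was trained at 1024x1024 but tends to look best at roughly 1.5K and up. The same call works at higher resolution; just resize the control image to match (hypothetical settings, continuing from the example above):
```py
# Same pipeline, higher resolution; the control image must match
cond_hr = Image.open("examples/libre_flux_control_image.png").resize((1536, 1536))
cond_hr = ToTensor()(cond_hr)[:3, :, :].to(pipe.device, dtype=pipe.dtype).unsqueeze(0)

out_hr = pipe(
    prompt="many pieces of drift wood spelling libre flux sitting casting shadow on the lumpy sandy beach with foot prints all over it",
    negative_prompt="blurry",
    control_image=cond_hr,
    num_inference_steps=75,
    guidance_scale=4.0,
    height=1536,
    width=1536,
)
out_hr.images[0].save("output_1536.png")
```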
# Load Pipeline (Low VRAM)
```py
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, quantize, qint8

model_id = "neuralvfx/LibreFlux-ControlNet"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
)

# Quantize the transformer and controlnet weights to int8, keeping the
# norm/embedding/projection layers in full precision for stability
quantize(
    pipe.transformer,
    weights=qint8,
    exclude=[
        "*.norm", "*.norm1", "*.norm2", "*.norm2_context",
        "proj_out", "x_embedder", "norm_out", "context_embedder",
    ],
)
quantize(
    pipe.controlnet,
    weights=qint8,
    exclude=[
        "*.norm", "*.norm1", "*.norm2", "*.norm2_context",
        "proj_out", "x_embedder", "norm_out", "context_embedder",
    ],
)

# Freeze the quantized weights, then let accelerate move submodules
# between CPU and GPU on demand
freeze(pipe.transformer)
freeze(pipe.controlnet)
pipe.enable_model_cpu_offload()
```
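Inference then works exactly as in the example above. To confirm the savings on a CUDA machine, peak memory can be checked after a run (a rough sanity check, not part of the repo):
```py
import torch

# Run the inference example above first, then:
if torch.cuda.is_available():
    print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```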