---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
datasets:
- opendiffusionai/laion2b-squareish-1536px
thumbnail: https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/resolve/main/examples/side_by_side_b.png
base_model: jimmycarter/LibreFLUX
---
# LibreFLUX-ControlNet


# Update - 4/10/2026
- Retrained this model on [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- I tripled the number of control layers to get stronger guidance

# Fun Facts
- Trained exclusively on control images generated by [Segment Anything (SAM)](https://aidemos.meta.com/segment-anything/)
- Takes SAM-style segmentation images as input and outputs photorealistic images (a sketch for generating your own follows this list)
- Trained at 1024x1024 resolution; inference works best at 1.5K and up
- Trained on 320K segmented images from [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- Base model is [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX) (de-distilled FLUX)

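If you don't already have a SAM-style input, the sketch below shows one way to build one. It assumes the official [segment-anything](https://github.com/facebookresearch/segment-anything) package and its public ViT-H checkpoint (neither ships with this repo), and paints each mask a random flat color; the exact palette used for the training set isn't documented here, so treat the result as an approximation.
```py
import numpy as np
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Assumes the official ViT-H checkpoint has been downloaded locally
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.array(Image.open("photo.png").convert("RGB"))
masks = mask_generator.generate(image)  # list of dicts with boolean "segmentation" arrays

# Paint each mask a random flat color, largest regions first so that
# small segments stay visible on top
canvas = np.zeros_like(image)
for m in sorted(masks, key=lambda m: m["area"], reverse=True):
    canvas[m["segmentation"]] = np.random.randint(0, 256, 3)

Image.fromarray(canvas).save("control_image.png")
```
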
# Showcases
<table style="width:100%; table-layout:fixed;">
<tr>
<td><img src="./examples/resized_kitten_seg.png"></td>
<td><img src="./examples/resized_kitten.png"></td>
</tr>
<tr>
<td><img src="./examples/resized_dread_girl_seg.png"></td>
<td><img src="./examples/resized_dread_girl.png"></td>
</tr>
<tr>
<td><img src="./examples/resized_house_seg.png"></td>
<td><img src="./examples/resized_house.png"></td>
</tr>
</table>

# Extra Details
- I built this repo to train the model: [https://github.com/NeuralVFX/LibreFLUX-ControlNet](https://github.com/NeuralVFX/LibreFLUX-ControlNet)
- Trained in the same non-distilled fashion as [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX)
- Uses attention masking
- Uses CFG during inference, which allows negative prompting (see the sketch after this list)
- Inference code roughly adapted from: [https://github.com/bghira/SimpleTuner](https://github.com/bghira/SimpleTuner)

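Because the base model is de-distilled, inference uses standard classifier-free guidance rather than FLUX's baked-in distilled guidance. A minimal sketch of the combination step, illustrative only (the actual logic lives in this repo's pipeline code):
```py
import torch

def apply_cfg(noise_uncond: torch.Tensor, noise_cond: torch.Tensor, guidance_scale: float) -> torch.Tensor:
    # Standard classifier-free guidance: extrapolate from the unconditional
    # (negative-prompt) prediction toward the conditional one
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```
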
# ComfyUI
- I've made some custom nodes for this: [https://github.com/NeuralVFX/LibreFLUX-ComfyUI](https://github.com/NeuralVFX/LibreFLUX-ComfyUI)

# Compatibility
```bash
pip install -U diffusers==0.32.0
pip install -U "transformers @ git+https://github.com/huggingface/transformers@e15687fffe5c9d20598a19aeab721ae0a7580f8a"
```
Low VRAM:
```bash
pip install optimum-quanto
```
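If loading fails, it's worth confirming the pins above actually took; a quick sanity check:
```py
import diffusers, transformers

print(diffusers.__version__)     # expected: 0.32.0
print(transformers.__version__)  # expected: the dev version from the pinned commit
```
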
# Load Pipeline
```py
import torch
from diffusers import DiffusionPipeline

model_id = "neuralvfx/LibreFlux-ControlNet"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# The repo ships its own pipeline class, so it is passed as custom_pipeline
# and trust_remote_code is required
pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
```

# Inference
```py
import torch
from PIL import Image
from torchvision.transforms import ToTensor

# Load the SAM-style control image
cond = Image.open("examples/libre_flux_control_image.png")
cond = cond.resize((1024, 1024))

# Convert the PIL image to a (1, 3, H, W) tensor on the pipeline's device/dtype
cond_tensor = ToTensor()(cond)[:3, :, :].to(pipe.device, dtype=pipe.dtype).unsqueeze(0)

out = pipe(
    prompt="many pieces of drift wood spelling libre flux sitting casting shadow on the lumpy sandy beach with foot prints all over it",
    negative_prompt="blurry",
    control_image=cond_tensor,  # the tensor, not the PIL image
    num_inference_steps=75,
    guidance_scale=4.0,
    height=1024,
    width=1024,
    controlnet_conditioning_scale=1.0,
    num_images_per_prompt=1,
    control_mode=None,
    generator=torch.Generator().manual_seed(32),
    return_dict=True,
)
out.images[0].save("output.png")
```
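As noted above, the model was trained at 1024x1024 but tends to look best at roughly 1.5K and up. The same call works at higher resolution; just resize the control image to match (hypothetical settings, continuing from the example above):
```py
# Same pipeline, higher resolution; the control image must match
cond_hr = Image.open("examples/libre_flux_control_image.png").resize((1536, 1536))
cond_hr = ToTensor()(cond_hr)[:3, :, :].to(pipe.device, dtype=pipe.dtype).unsqueeze(0)

out_hr = pipe(
    prompt="many pieces of drift wood spelling libre flux sitting casting shadow on the lumpy sandy beach with foot prints all over it",
    negative_prompt="blurry",
    control_image=cond_hr,
    num_inference_steps=75,
    guidance_scale=4.0,
    height=1536,
    width=1536,
)
out_hr.images[0].save("output_1536.png")
```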
# Load Pipeline (Low VRAM)
```py
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, quantize, qint8

model_id = "neuralvfx/LibreFlux-ControlNet"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
)

# Quantize the transformer and controlnet weights to int8, keeping the
# norm/embedding/projection layers in full precision for stability
quantize(
    pipe.transformer,
    weights=qint8,
    exclude=[
        "*.norm", "*.norm1", "*.norm2", "*.norm2_context",
        "proj_out", "x_embedder", "norm_out", "context_embedder",
    ],
)
quantize(
    pipe.controlnet,
    weights=qint8,
    exclude=[
        "*.norm", "*.norm1", "*.norm2", "*.norm2_context",
        "proj_out", "x_embedder", "norm_out", "context_embedder",
    ],
)

# Freeze the quantized weights, then let accelerate move submodules
# between CPU and GPU on demand
freeze(pipe.transformer)
freeze(pipe.controlnet)
pipe.enable_model_cpu_offload()
```
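Inference then works exactly as in the example above. To confirm the savings on a CUDA machine, peak memory can be checked after a run (a rough sanity check, not part of the repo):
```py
import torch

# Run the inference example above first, then:
if torch.cuda.is_available():
    print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```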