	---
	license: other
	license_name: stabilityai-ai-community
	license_link: >-
	https://huggingface.co/stabilityai/stable-diffusion-3.5-large/resolve/main/LICENSE.md
	language:
	- en
	base_model:
	- stabilityai/stable-diffusion-3.5-medium
	pipeline_tag: text-to-image
	tags:
	- stable-diffusion-3.5
	- sd3.5
	- text-to-image
	- multi-subject
	- FOCUS
	- flow-matching
	- optimal-control
	- fine-tuned
	---

	![SD3.5 + FOCUS](./teasers.jpg)

	# SD3.5 fine-tuned for multi-subject prompts

	TL;DR: A fine-tuned derivative of `stabilityai/stable-diffusion-3.5-medium` that targets multi-subject fidelity: it keeps multiple entities and their attributes disentangled while preserving the base model's style. It works across animals, people, and objects.
	Read the paper: [Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity](https://arxiv.org/abs/2510.02315).

	> ⚠️ Licensing: This model inherits the Stability AI Community License from the base model and is distributed under compatible terms. Use is subject to the base model’s license.

	---

	## What’s improved

	- Entity disentanglement: better separation across 2–4 subjects, fewer merges/omissions.
	- Attribute binding: colors, clothing, and small accessories stick to the correct subject.
	- Single subject: single-subject generation also improves, while staying stylistically close to the base model.

	---

	## Quick start (Diffusers)

	Install the [🧨 diffusers library](https://github.com/huggingface/diffusers):
	```bash
	pip install -U transformers==4.53.0 diffusers==0.33.1
	```

	Then:
	```python
	import torch
	from diffusers import StableDiffusion3Pipeline

	pipe = StableDiffusion3Pipeline.from_pretrained(
	    "ericbill21/focus_sd35",
	    torch_dtype=torch.float16,
	).to("cuda")
	# For smaller GPUs use: pipe.enable_sequential_cpu_offload()

	image = pipe(
	    prompt="A horse and a bear in a forest",
	    num_inference_steps=28,
	    guidance_scale=4.5,
	    max_sequence_length=77,
	    height=512,
	    width=512,
	    generator=torch.Generator("cpu").manual_seed(1),
	).images[0]

	image.save("sample.png")
	```

	Since this uses the standard Diffusers pipeline, you can apply features like xFormers attention, VAE tiling/slicing, and quantization as usual.
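
As a minimal sketch of the memory-saving options mentioned above, the helper below switches on CPU offload and VAE slicing/tiling for an already-loaded pipeline. The function name `apply_memory_savers` and its `offload` argument are hypothetical and not part of this repository; the methods it calls (`enable_model_cpu_offload`, `enable_sequential_cpu_offload`, `vae.enable_slicing`, `vae.enable_tiling`) are standard Diffusers APIs, and the offload modes assume `accelerate` is installed.

```python
def apply_memory_savers(pipe, offload="model"):
    """Enable common Diffusers memory optimizations on a loaded pipeline.

    `offload` is "model" (whole sub-models moved to the GPU on demand),
    "sequential" (layer-by-layer, slowest but smallest footprint), or None.
    Hypothetical helper for illustration; not shipped with this model.
    """
    if offload == "model":
        pipe.enable_model_cpu_offload()
    elif offload == "sequential":
        pipe.enable_sequential_cpu_offload()
    elif offload is not None:
        raise ValueError(f"unknown offload mode: {offload!r}")

    # Decode latents in slices and tiles to lower peak VAE memory.
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    return pipe
```

A typical call would pass the pipeline returned by `StableDiffusion3Pipeline.from_pretrained(...)` without a prior `.to("cuda")`, since the offload hooks handle device placement themselves.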

	## How was this achieved?
	We cast multi-subject fidelity as a stochastic optimal control problem over flow-matching samplers and fine-tune via FOCUS (an adjoint-matching heuristic). A lightweight controller is trained to respect subject identity, attributes, and spatial relations while staying close to the base distribution, yielding improved multi-subject fidelity without sacrificing style. Full details and ablations are in the paper and code.
	- Paper: [https://arxiv.org/abs/2510.02315](https://arxiv.org/abs/2510.02315)
	- Code: [https://github.com/ericbill21/FOCUS](https://github.com/ericbill21/FOCUS)
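
To give a rough picture of the control framing, here is a toy 1-D sketch: a flow-matching sampler is an Euler integration of a velocity field, and a controller adds a correction term whose squared magnitude is penalized so the result stays close to the base sampler. Everything in it (the function names, the straight-line velocity field, the quadratic cost, the deterministic dynamics) is an assumption made for illustration. It is not the FOCUS objective or training procedure, for which the paper and code are the reference.

```python
def base_velocity(x, t, target):
    """Velocity of the straight-line path x_t = (1 - t) * x0 + t * target."""
    return (target - x) / (1.0 - t)


def rollout(x0, target, control=None, num_steps=28, penalty=1.0):
    """Euler-integrate dx/dt = v(x, t) + u(x, t) from t = 0 to t = 1.

    Returns the final sample and the accumulated control cost
    penalty * sum(u**2 * dt), a simple measure of how far the
    controlled sampler deviates from the base sampler.
    Toy illustration only; names and cost are hypothetical.
    """
    dt = 1.0 / num_steps
    x, cost = x0, 0.0
    for k in range(num_steps):
        t = k * dt
        u = control(x, t) if control is not None else 0.0
        x = x + dt * (base_velocity(x, t, target) + u)
        cost += penalty * u * u * dt
    return x, cost


# Base sampler: no control, zero cost.
x_base, cost_base = rollout(x0=-1.0, target=2.0)

# Controlled sampler: a constant nudge, paid for through the cost term.
x_ctrl, cost_ctrl = rollout(x0=-1.0, target=2.0, control=lambda x, t: 0.5)
```

With `control=None` the rollout reproduces the base sampler and incurs no cost. Any nonzero control shifts the trajectory and is charged for it, which is the trade-off between fidelity and closeness to the base model described above.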

	## Model details
	- Base: `stabilityai/stable-diffusion-3.5-medium`
	- Type: full pipeline (no LoRA required at inference)
	- Intended use: research/creative work where multi-subject consistency matters
	- Limitations: under extreme clutter or highly similar subjects, attributes may still leak; biases of the base model may persist.


	## Citation
	If you find this useful, please cite:
	```bibtex
	@article{Bill2025FOCUS,
	  title   = {Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity},
	  author  = {Eric Tillmann Bill and Enis Simsar and Thomas Hofmann},
	  journal = {arXiv preprint arXiv:2510.02315},
	  year    = {2025},
	  url     = {https://arxiv.org/abs/2510.02315}
	}
	```

	## Contact
	Feedback and issues are welcome via the Hugging Face model page or GitHub.