---
license: other
license_name: stabilityai-ai-community
license_link: >-
  https://huggingface.co/stabilityai/stable-diffusion-3.5-large/resolve/main/LICENSE.md
language:
- en
base_model:
- stabilityai/stable-diffusion-3.5-medium
pipeline_tag: text-to-image
tags:
- stable-diffusion-3.5
- sd3.5
- text-to-image
- multi-subject
- FOCUS
- flow-matching
- optimal-control
- fine-tuned
---

# SD3.5 fine-tuned for multi-subject prompts

**TL;DR**: A **fine-tuned derivative of `stabilityai/stable-diffusion-3.5-medium`** focused on **multi-subject fidelity**: it keeps multiple entities and their attributes from becoming entangled while **preserving the base model’s style**. Works across animals, people, and objects.

Read the paper: **[Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity](https://arxiv.org/abs/2510.02315)**.

> ⚠️ Licensing: This model inherits the **Stability AI Community License** from the base model and is distributed under compatible terms. Use is subject to the base model’s license.

---

## What’s improved

- **Entity disentanglement**: better separation across 2–4 subjects, with fewer merged or omitted subjects.
- **Attribute binding**: colors, clothing, and small accessories stick to the correct subject.
- **Single subject**: single-subject generation also improves, while staying stylistically close to the base model.
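
To probe the points above, it can help to generate prompts that pair each subject with its own attribute. The following is a minimal sketch, not part of the model or the FOCUS code: `build_prompt` is a hypothetical helper, and its simple article rule ("an" before a vowel letter) is an assumption that will not be right for every English word.

```python
def build_prompt(subjects, scene=""):
    """Compose a multi-subject prompt from (attribute, noun) pairs.

    Hypothetical helper for illustration only. `attribute` may be an
    empty string when a subject has no attribute.
    """
    phrases = []
    for attribute, noun in subjects:
        words = f"{attribute} {noun}".strip()
        # Naive article choice: "an" before a vowel letter, else "a".
        article = "an" if words[0].lower() in "aeiou" else "a"
        phrases.append(f"{article} {words}")

    # Join as "a, b and c"; a single subject is used as-is.
    if len(phrases) > 1:
        joined = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    else:
        joined = phrases[0]

    # Capitalize the first letter and append the optional scene.
    joined = joined[0].upper() + joined[1:]
    return f"{joined} {scene}".strip()


# Reproduces the prompt style used in the quick start.
print(build_prompt([("", "horse"), ("", "bear")], "in a forest"))
# Attribute-binding probe with three subjects.
print(build_prompt([("red", "cat"), ("blue", "dog"), ("orange", "bird")]))
```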

---

## Quick start (Diffusers)

Install the [🧨 diffusers library](https://github.com/huggingface/diffusers) together with a pinned `transformers` version:

```
pip install -U transformers==4.53.0 diffusers==0.33.1
```

Then:

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load the full fine-tuned pipeline (no LoRA needed at inference).
pipe = StableDiffusion3Pipeline.from_pretrained(
    "ericbill21/focus_sd35",
    torch_dtype=torch.float16,
).to("cuda")
# For smaller GPUs, drop `.to("cuda")` above and call
# pipe.enable_sequential_cpu_offload() instead.

image = pipe(
    prompt="A horse and a bear in a forest",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=77,
    height=512,
    width=512,
    # Fixed seed for reproducible output.
    generator=torch.Generator("cpu").manual_seed(1),
).images[0]

image.save("sample.png")
```

Since this uses the standard Diffusers pipeline, you can apply features like xFormers attention, VAE tiling/slicing, and quantization as usual.
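
As one illustration of the memory-related options mentioned above, the sketch below wraps a few standard Diffusers calls in a small function. `apply_memory_savers` is a hypothetical helper, not part of Diffusers or this repository. It assumes `pipe` is a loaded pipeline whose `vae` exposes `enable_slicing()` and `enable_tiling()`, and that `accelerate` is installed if CPU offload is requested.

```python
def apply_memory_savers(pipe, cpu_offload=False, vae_slicing=True, vae_tiling=True):
    """Enable optional memory-saving features on a loaded pipeline.

    Hypothetical convenience wrapper for illustration only.
    """
    if cpu_offload:
        # Assumption: the pipeline was loaded WITHOUT `.to("cuda")`;
        # offloading moves each sub-model to the GPU only when needed.
        pipe.enable_model_cpu_offload()
    if vae_slicing:
        # Decode a batch of latents one image at a time.
        pipe.vae.enable_slicing()
    if vae_tiling:
        # Decode large images in tiles to reduce peak memory.
        pipe.vae.enable_tiling()
    return pipe


# Usage (assumes `pipe` from the quick start):
# pipe = apply_memory_savers(pipe)
```

These options trade some speed for lower peak memory, so they are probably only worth enabling when the default setup runs out of GPU memory.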

## How was this achieved?

We cast multi-subject fidelity as a stochastic optimal control problem over flow-matching samplers and fine-tune via FOCUS (an adjoint-matching heuristic). A lightweight controller is trained to respect subject identity, attributes, and spatial relations while staying close to the base distribution, yielding improved multi-subject fidelity without sacrificing style. Full details and ablations are in the paper and code.
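
For orientation only, a generic stochastic optimal control objective of the kind described above can be sketched as follows. The symbols ($b$, $u$, $\sigma$, $g$, $\lambda$) are illustrative assumptions and may not match the paper's notation or its exact formulation; the paper is the authoritative source.

$$
\min_{u}\; \mathbb{E}\left[ \int_0^1 \tfrac{1}{2}\,\lVert u(X_t, t)\rVert^2 \,\mathrm{d}t \;+\; \lambda\, g(X_1) \right]
\quad \text{s.t.} \quad
\mathrm{d}X_t = \big( b(X_t, t) + \sigma(t)\, u(X_t, t) \big)\,\mathrm{d}t + \sigma(t)\,\mathrm{d}B_t
$$

In this sketch, $b$ stands for the drift of the base sampler, $u$ for the learned control, the quadratic term for the cost of deviating from the base distribution, and $g$ for a terminal cost that scores how well the final sample $X_1$ keeps its subjects separated.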

- Paper: [https://arxiv.org/abs/2510.02315](https://arxiv.org/abs/2510.02315)
- Code: [https://github.com/ericbill21/FOCUS](https://github.com/ericbill21/FOCUS)

## Model details

- Base: `stabilityai/stable-diffusion-3.5-medium`
- Type: full pipeline (no LoRA required at inference)
- Intended use: research/creative work where multi-subject consistency matters
- Limitations: under extreme clutter or highly similar subjects, attributes may still leak; biases of the base model may persist.

## Citation

If you find this useful, please cite:

```
@article{Bill2025FOCUS,
  title   = {Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity},
  author  = {Eric Tillmann Bill and Enis Simsar and Thomas Hofmann},
  journal = {arXiv preprint arXiv:2510.02315},
  year    = {2025},
  url     = {https://arxiv.org/abs/2510.02315}
}
```

## Contact

Feedback and issues are welcome via the Hugging Face model page or GitHub.