---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/LICENSE.md
language:
- en
base_model:
- black-forest-labs/FLUX.1-dev
pipeline_tag: text-to-image
tags:
- flux.1-dev
- flux
- text-to-image
- multi-subject
- FOCUS
- flow-matching
- optimal-control
- fine-tuned
---
![FLUX.1 [dev] + FOCUS](./teasers.jpg)
# FLUX.1 [dev] fine-tuned for multi-subject prompts
**TL;DR**: A **fine-tuned derivative of `black-forest-labs/FLUX.1-dev`** focused on **multi-subject fidelity**—keeping multiple entities and their attributes unentangled while **preserving base style**. Works across animals, people, and objects.
Read the paper: **[Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity](https://arxiv.org/abs/2510.02315)**.
> ⚠️ Licensing: This model inherits the **FLUX.1 [dev] Non-Commercial License** from the base model and is distributed under compatible terms. Any use is subject to the base model's license.
---
## What’s improved
- **Entity disentanglement**: better separation across 2–4 subjects, fewer merges/omissions.
- **Attribute binding**: colors, clothing, and small accessories stick to the correct subject.
- **Single subject**: single-subject generation also improves, while staying stylistically close to the base model.
---
## Quick start (Diffusers)
Install the [🧨 diffusers library](https://github.com/huggingface/diffusers):
```bash
pip install -U transformers==4.53.0 diffusers==0.33.1
```
Then:
```python
import torch
from diffusers import FluxPipeline

# Load the full fine-tuned pipeline in bfloat16 and move it to the GPU.
pipe = FluxPipeline.from_pretrained(
    "ericbill21/focus_flux",
    torch_dtype=torch.bfloat16,
).to("cuda")
# For smaller GPUs use: pipe.enable_sequential_cpu_offload() instead of .to("cuda")

image = pipe(
    prompt="A lion and a tiger resting side by side in a jungle clearing",
    num_inference_steps=28,
    guidance_scale=3.5,
    max_sequence_length=256,
    height=512,
    width=512,
    generator=torch.Generator("cpu").manual_seed(5),  # fixed seed for reproducibility
).images[0]
image.save("sample.png")
```
Since this uses the standard Diffusers pipeline, you can apply features like xFormers attention, VAE tiling/slicing, and quantization as usual.
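As an illustration of the memory-saving options mentioned above, here is a minimal sketch of a helper that switches on CPU offloading and VAE slicing/tiling. The function name `apply_memory_savers` and its parameters are hypothetical and not part of Diffusers. It only calls the `enable_*` methods that Diffusers pipelines such as `FluxPipeline` are assumed to expose in the pinned version; check the Diffusers documentation for your installed release.

```python
def apply_memory_savers(pipe, offload="model", vae_slicing=True, vae_tiling=False):
    """Enable optional memory-saving features on a Diffusers pipeline.

    Hypothetical helper for illustration only. `offload` may be "model"
    (offload whole sub-models), "sequential" (slower, lowest VRAM use),
    or None (no offloading).
    """
    if offload == "model":
        pipe.enable_model_cpu_offload()
    elif offload == "sequential":
        pipe.enable_sequential_cpu_offload()
    elif offload is not None:
        raise ValueError(f"unknown offload mode: {offload!r}")

    if vae_slicing:
        pipe.enable_vae_slicing()  # decode batched latents one image at a time
    if vae_tiling:
        pipe.enable_vae_tiling()   # decode large images tile by tile
    return pipe


# Usage (assumes the quick-start packages are installed):
#   import torch
#   from diffusers import FluxPipeline
#   pipe = FluxPipeline.from_pretrained("ericbill21/focus_flux", torch_dtype=torch.bfloat16)
#   pipe = apply_memory_savers(pipe, offload="model", vae_tiling=True)
#   # Do not call .to("cuda") when CPU offloading is enabled.
```

Model-level offloading is usually the better first choice; sequential offloading trades considerably more speed for the lowest memory footprint.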
## How was this achieved?
We cast multi-subject fidelity as a stochastic optimal control problem over flow-matching samplers and fine-tune via FOCUS (an adjoint-matching heuristic). A lightweight controller is trained to respect subject identity, attributes, and spatial relations while staying close to the base distribution, yielding improved multi-subject fidelity without sacrificing style. Full details and ablations are in the paper and code.
- Paper: [https://arxiv.org/abs/2510.02315](https://arxiv.org/abs/2510.02315)
- Code: [https://github.com/ericbill21/FOCUS](https://github.com/ericbill21/FOCUS)
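For orientation, the stochastic optimal control framing described in this section is commonly written in the generic form below. This is a sketch in standard notation from the optimal control literature, not necessarily the paper's exact objective. The symbols are assumptions introduced here for illustration: $b$ is the base drift of the flow-matching sampler, $u$ the learned control, $\sigma$ a noise schedule, and $g$ a terminal cost that penalizes entangled subjects.

$$
\min_{u}\; \mathbb{E}\left[\int_0^1 \tfrac{1}{2}\,\lVert u(X_t, t)\rVert^2 \,\mathrm{d}t + g(X_1)\right]
\quad \text{s.t.} \quad
\mathrm{d}X_t = \bigl(b(X_t, t) + \sigma(t)\,u(X_t, t)\bigr)\,\mathrm{d}t + \sigma(t)\,\mathrm{d}B_t
$$

In this reading, the quadratic control cost keeps the fine-tuned sampler close to the base distribution, while the terminal cost steers samples toward well-separated subjects. See the paper for the exact definitions.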
## Model details
- Base: `black-forest-labs/FLUX.1-dev`
- Type: full pipeline (no LoRA required at inference)
- Intended use: research/creative work where multi-subject consistency matters
- Limitations: under extreme clutter or highly similar subjects, attributes may still leak; biases of the base model may persist.
## Citation
If you find this useful, please cite:
```bibtex
@article{Bill2025FOCUS,
title = {Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity},
author = {Eric Tillmann Bill and Enis Simsar and Thomas Hofmann},
journal = {arXiv preprint arXiv:2510.02315},
year = {2025},
url = {https://arxiv.org/abs/2510.02315}
}
```
## Contact
Feedback and issues are welcome via the Hugging Face model page or GitHub.