paper selected as ICCV 2025 (Highlight) 🎉
README.md
CHANGED
@@ -36,7 +36,7 @@ This repository provides the official models for the following paper:
 [Hao Kang](https://scholar.google.com/citations?user=VeTCSyEAAAAJ),
 [Xin Lu](https://scholar.google.com/citations?user=mFC0wp8AAAAJ)<br />
 ByteDance Intelligent Creation<br />
-**ICCV 2025**
+**ICCV 2025 (<span style="color:#F44336">Highlight</span>)**
 
 > **Abstract:** Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce **InfiniteYou (InfU)**, one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.
 