---
license: apache-2.0
---

# AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

[Yanan Sun](https://scholar.google.com/citations?user=6TA1oPkAAAAJ&hl=en), Yanchen Liu, Yinhao Tang, [Wenjie Pei](https://wenjiepei.github.io/) and [Kai Chen*](https://chenkai.site/)

**Shanghai AI Laboratory**

![overview](assets/assets_overview.png)

## Overview
The field of text-to-image (T2I) generation has made significant progress in recent years, largely driven by advancements in diffusion models. Linguistic control enables effective content creation but struggles with fine-grained control over image generation. This challenge has been explored, to a great extent, by incorporating additional user-supplied spatial conditions, such as depth maps and edge maps, into pre-trained T2I models through extra encoding. However, multi-control image synthesis still faces several challenges: current approaches are limited in handling free combinations of diverse input control signals, overlook the complex relationships among multiple spatial conditions, and often fail to maintain semantic alignment with the provided textual prompts, which can lead to suboptimal user experiences.

To address these challenges, we propose AnyControl, a multi-control image synthesis framework that supports arbitrary combinations of diverse control signals. AnyControl develops a novel Multi-Control Encoder that extracts a unified multi-modal embedding to guide the generation process. This approach enables a holistic understanding of user inputs and produces high-quality, faithful results under versatile control signals, as demonstrated by extensive quantitative and qualitative evaluations.

## Model Card

AnyControl for SD 1.5:

- `ckpts/anycontrol_15.ckpt`: weights for AnyControl.
- `ckpts/init_local.ckpt`: initial weights used to train AnyControl, generated following [Uni-ControlNet](https://github.com/ShihaoZhaoZSH/Uni-ControlNet).
- `ckpts/blip2_pretrained.pth`: third-party pre-trained model.
- `annotator/ckpts`: third-party models used by the annotators.
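As a quick sanity check after downloading, the layout above can be verified with a short script. This is a minimal sketch: the relative paths are taken from the list above, while the `missing_checkpoints` helper is illustrative and not part of the repository.

```python
from pathlib import Path

# Checkpoint files listed in the model card above.
# Paths are assumed to be relative to the repository root.
EXPECTED_CKPTS = [
    "ckpts/anycontrol_15.ckpt",
    "ckpts/init_local.ckpt",
    "ckpts/blip2_pretrained.pth",
]

def missing_checkpoints(root="."):
    """Return the expected checkpoint paths that are not present under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_CKPTS if not (root / p).is_file()]
```

Running `missing_checkpoints()` from the repository root should return an empty list once all checkpoints are in place.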

## License and Citation

All models and assets are under the [Apache 2.0 license](./LICENSE) unless specified otherwise.

If this work is helpful for your research, please consider citing the following BibTeX entry.

``` bibtex
@inproceedings{sun2024anycontrol,
  title={AnyControl: Create your artwork with versatile control on text-to-image generation},
  author={Sun, Yanan and Liu, Yanchen and Tang, Yinhao and Pei, Wenjie and Chen, Kai},
  booktitle={ECCV},
  year={2024}
}
```