---
license: agpl-3.0
---

# SD1 Style Components (experimental)

Style control for Stable Diffusion 1.x anime models

## What is this?

It is IP-Adapter, but for (anime) styles. Instead of CLIP image embeddings, image generation is conditioned on 30-dimensional style embeddings, which can either be extracted from one or more reference images or created manually.
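
As a rough sketch of how such a condition can be injected, following the IP-Adapter recipe: the low-dimensional style vector is projected into a few extra cross-attention tokens. The module below, its token count, and the 768-d cross-attention width are illustrative guesses, not this repository's actual code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: IP-Adapter turns an image embedding into a few extra
# cross-attention tokens; here a 30-d style vector plays that role instead
# of a CLIP image embedding. All dimensions are illustrative assumptions.
class StyleProjection(nn.Module):
    def __init__(self, style_dim=30, num_tokens=4, cross_attention_dim=768):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Linear(style_dim, num_tokens * cross_attention_dim)
        self.norm = nn.LayerNorm(cross_attention_dim)

    def forward(self, style):  # style: (batch, 30)
        tokens = self.proj(style).reshape(style.shape[0], self.num_tokens, -1)
        return self.norm(tokens)  # appended to the text tokens downstream

# A manually created style embedding: start neutral, adjust two components
style = torch.zeros(1, 30)
style[0, 3], style[0, 17] = 2.0, -1.5
extra_tokens = StyleProjection()(style)  # (1, 4, 768)
```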

## Why?

Currently, the main means of style control is through artist tags. This method reasonably raises concerns of style plagiarism.
By breaking down styles into interpretable components that are shared across all artists, direct copying of styles can be avoided.
Furthermore, new styles can easily be created by manipulating the magnitudes of the style components, offering more controllability than stacking artist tags or LoRAs, as the sketch below illustrates.
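
A minimal sketch of that manipulation, assuming `style_a` and `style_b` are embeddings extracted from two sets of reference images (random vectors stand in for them here):

```python
import numpy as np

# Placeholders for embeddings extracted from two reference image sets
style_a = np.random.randn(30)
style_b = np.random.randn(30)

blended = 0.7 * style_a + 0.3 * style_b  # interpolate between two styles
blended[5] *= 2.0                        # exaggerate one interpretable component
```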

Additionally, this can potentially be useful for general-purpose training, as training with a style condition may weaken style leakage into concepts.
This also serves as a demonstration that image models can be conditioned on arbitrary tensors other than text or images.
Hopefully, more people will come to understand that it is not necessary to force conditions that are inherently numerical (aesthetic scores, dates, ...) into text-form tags.

## How do I use it?

Currently, a [Colab notebook](https://colab.research.google.com/drive/1AKXiHHBAnzbtKyToN6WdzOov-niJudcL?usp=sharing) with a Gradio interface is available.
As this is only an experimental preview, proper support for popular web UIs will not be added before the models reach a stable state.

## Technical details

First, a style embedding model is trained with supervised contrastive learning on an [artists dataset](https://huggingface.co/datasets/gustproof/artists/blob/main/artists.zip).
Then, the first 30 principal components of the learned embeddings are extracted with PCA. Finally, a modified IP-Adapter is trained on anime-final-pruned using the same dataset with WD1.4 tags and the projected 30-d embeddings. The training resolution is 576×576 with variable aspect ratios. A sketch of the first two steps follows.
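
A compact sketch of those two steps with placeholder data: the loss is the standard supervised contrastive (SupCon) formulation with same-artist images as positives, while the encoder output width (512) and dataset size are assumptions, not the values used here.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.decomposition import PCA

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss; images sharing an artist label are positives."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, float("-inf"))          # drop self-similarity
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # average log-likelihood of same-artist pairs per anchor
    # (anchors with no positive in the batch contribute zero)
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return per_anchor.mean()

# After training the encoder with the loss above, fit a PCA over its outputs
# and keep the first 30 components as the interpretable style space.
encoder_outputs = np.random.randn(5000, 512)  # placeholder (num_images, dim)
pca = PCA(n_components=30)
style_embeddings = pca.fit_transform(encoder_outputs)  # (num_images, 30)
```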

## Acknowledgements

This is largely inspired by [Inserting Anybody in Diffusion Models via Celeb Basis](http://arxiv.org/abs/2306.00926) and [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter). Training and inference code is modified from IP-Adapter ([license](https://github.com/tencent-ailab/IP-Adapter/blob/main/LICENSE)).