---
license: agpl-3.0
---

# SD1 Style Components (experimental)

Style control for Stable Diffusion 1.x anime models

## What is this?

It is IP-Adapter, but for (anime) styles. Instead of CLIP image embeddings, image generation is conditioned on 30-dimensional style embeddings, which can either be extracted from one or more reference images or created manually.
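
As a rough sketch of how such a condition can be injected, following the IP-Adapter recipe: the low-dimensional style vector is projected into a few extra cross-attention tokens. The module below, its token count, and the 768-d cross-attention width are illustrative guesses, not this repository's actual code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: IP-Adapter turns an image embedding into a few extra
# cross-attention tokens; here a 30-d style vector plays that role instead
# of a CLIP image embedding. All dimensions are illustrative assumptions.
class StyleProjection(nn.Module):
    def __init__(self, style_dim=30, num_tokens=4, cross_attention_dim=768):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Linear(style_dim, num_tokens * cross_attention_dim)
        self.norm = nn.LayerNorm(cross_attention_dim)

    def forward(self, style):  # style: (batch, 30)
        tokens = self.proj(style).reshape(style.shape[0], self.num_tokens, -1)
        return self.norm(tokens)  # appended to the text tokens downstream

# A manually created style embedding: start neutral, adjust two components
style = torch.zeros(1, 30)
style[0, 3], style[0, 17] = 2.0, -1.5
extra_tokens = StyleProjection()(style)  # (1, 4, 768)
```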

## Why?

Currently, the main means of style control is through artist tags. This method reasonably raises concerns of style plagiarism.
By breaking down styles into interpretable components that are shared across all artists, direct copying of styles can be avoided.
Furthermore, new styles can easily be created by manipulating the magnitudes of the style components, offering more controllability than stacking artist tags or LoRAs, as the sketch below illustrates.
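
A minimal sketch of that manipulation, assuming `style_a` and `style_b` are embeddings extracted from two sets of reference images (random vectors stand in for them here):

```python
import numpy as np

# Placeholders for embeddings extracted from two reference image sets
style_a = np.random.randn(30)
style_b = np.random.randn(30)

blended = 0.7 * style_a + 0.3 * style_b  # interpolate between two styles
blended[5] *= 2.0                        # exaggerate one interpretable component
```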

Additionally, this can potentially be useful for general-purpose training, as training with a style condition may weaken style leakage into concepts.
This also serves as a demonstration that image models can be conditioned on arbitrary tensors other than text or images.
Hopefully, more people will come to understand that it is not necessary to force conditions that are inherently numerical (aesthetic scores, dates, ...) into text-form tags.

## How do I use it?

Currently, a [Colab notebook](https://colab.research.google.com/drive/1AKXiHHBAnzbtKyToN6WdzOov-niJudcL?usp=sharing) with a Gradio interface is available.
As this is only an experimental preview, proper support for popular web UIs will not be added before the models reach a stable state.

## Technical details

First, a style embedding model is trained with supervised contrastive learning on an [artists dataset](https://huggingface.co/datasets/gustproof/artists/blob/main/artists.zip).
Then, the first 30 principal components of the learned embeddings are extracted with PCA. Finally, a modified IP-Adapter is trained on anime-final-pruned using the same dataset with WD1.4 tags and the projected 30-d embeddings. The training resolution is 576×576 with variable aspect ratios. A sketch of the first two steps follows.
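
A compact sketch of those two steps with placeholder data: the loss is the standard supervised contrastive (SupCon) formulation with same-artist images as positives, while the encoder output width (512) and dataset size are assumptions, not the values used here.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.decomposition import PCA

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss; images sharing an artist label are positives."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, float("-inf"))          # drop self-similarity
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # average log-likelihood of same-artist pairs per anchor
    # (anchors with no positive in the batch contribute zero)
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return per_anchor.mean()

# After training the encoder with the loss above, fit a PCA over its outputs
# and keep the first 30 components as the interpretable style space.
encoder_outputs = np.random.randn(5000, 512)  # placeholder (num_images, dim)
pca = PCA(n_components=30)
style_embeddings = pca.fit_transform(encoder_outputs)  # (num_images, 30)
```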

## Acknowledgements

This is largely inspired by [Inserting Anybody in Diffusion Models via Celeb Basis](http://arxiv.org/abs/2306.00926) and [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter). Training and inference code is modified from IP-Adapter ([license](https://github.com/tencent-ailab/IP-Adapter/blob/main/LICENSE)).