chl810903 wujunqiang committed
Commit 1ca1f1a · 0 Parent(s)

Duplicate from Kwai-Kolors/Kolors-IP-Adapter-Plus

Co-authored-by: junqiang <wujunqiang@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,37 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ compare_demo.png filter=lfs diff=lfs merge=lfs -text
+ demo.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - Kolors
+ - text-to-image
+ - stable-diffusion
+ library_name: diffusers
+ ---
+
+
+ # Kolors-IP-Adapter-Plus weights and inference code
+
+ <div align="center" style="display: flex; justify-content: center; flex-wrap: wrap;">
+ <a href="https://github.com/Kwai-Kolors/Kolors"><img src="https://img.shields.io/static/v1?label=Kolors Code&message=Github&color=blue&logo=github-pages"></a> &ensp;
+ <a href="https://kwai-kolors.github.io/"><img src="https://img.shields.io/static/v1?label=Team%20Page&message=Page&color=green"></a> &ensp;
+ <a href="https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv:Kolors&color=red&logo=arxiv"></a> &ensp;
+ <a href="https://kolors.kuaishou.com/"><img src="https://img.shields.io/static/v1?label=Official Website&message=Page&color=green"></a>
+ </div>
+
+ ## <a name="Introduction"></a>📖 Introduction
+
+ We provide IP-Adapter-Plus weights and inference code based on [Kolors-Basemodel](https://huggingface.co/Kwai-Kolors/Kolors). Examples of Kolors-IP-Adapter-Plus results are shown below:
+ <img src="demo.png">
+
+
+ **Our improvements**
+
+ - A stronger image feature extractor: we employ the OpenAI-CLIP-336 model as the image encoder, which lets us preserve more detail from the reference images (see the loading sketch after this list).
+ - More diverse and high-quality training data: we construct a large-scale, high-quality training dataset inspired by the data strategies of other works. We believe that paired training data can effectively improve performance.
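+
+ A minimal loading sketch for this image encoder using the `transformers` library (the weights live in this repository's `image_encoder/` subfolder; the dtype and variable names are illustrative choices):
+
+ ```python
+ import torch
+ from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection
+
+ # The OpenAI-CLIP-336 based image encoder shipped in image_encoder/,
+ # together with its preprocessing configuration.
+ image_encoder = CLIPVisionModelWithProjection.from_pretrained(
+     "Kwai-Kolors/Kolors-IP-Adapter-Plus",
+     subfolder="image_encoder",
+     torch_dtype=torch.float16,
+ )
+ processor = CLIPImageProcessor.from_pretrained(
+     "Kwai-Kolors/Kolors-IP-Adapter-Plus", subfolder="image_encoder"
+ )
+ ```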
+
+
+ ## <a name="Evaluation"></a>📊 Evaluation
+ For evaluation, we created a test set of over 200 reference images and text prompts, and invited several image experts to rate the results generated by different models fairly. The experts rated the generated images on four criteria: visual appeal, text faithfulness, image faithfulness, and overall satisfaction. Image faithfulness measures how well IP-Adapter preserves the semantics of the reference image; the other criteria follow the evaluation standards of the BaseModel. The results are summarized in the table below, where Kolors-IP-Adapter-Plus achieves the highest overall satisfaction score.
+
+ | Model | Average Overall Satisfaction | Average Image Faithfulness | Average Visual Appeal | Average Text Faithfulness |
+ | :--------------: | :--------: | :--------: | :--------: | :--------: |
+ | SDXL-IP-Adapter-Plus | 2.29 | 2.64 | 3.22 | 4.02 |
+ | Midjourney-v6-CW | 2.79 | 3.0 | 3.92 | 4.35 |
+ | **Kolors-IP-Adapter-Plus** | **3.04** | **3.25** | **4.45** | **4.30** |
+
+ <font color=gray style="font-size:12px">*The ip_scale parameter is set to 0.3 in SDXL-IP-Adapter-Plus, while Midjourney-v6-CW uses the default cw scale.*</font>
+
+
+ <img src="compare_demo.png">
+
+ <font color=gray style="font-size:12px">*Kolors-IP-Adapter-Plus uses Chinese prompts, while the other methods use English prompts.*</font>
+
+
+ ------
+
+ ## <a name="Usage"></a>🛠️ Usage
+
+ ### Requirements
+
+ The dependencies and installation steps are essentially the same as for [Kolors-BaseModel](https://huggingface.co/Kwai-Kolors/Kolors).
+
+ 1. Repository cloning and dependency installation:
+
+ ```bash
+ apt-get install git-lfs
+ git clone https://github.com/Kwai-Kolors/Kolors
+ cd Kolors
+ conda create --name kolors python=3.8
+ conda activate kolors
+ pip install -r requirements.txt
+ python3 setup.py install
+ ```
+
+ 2. Weights download ([link](https://huggingface.co/Kwai-Kolors/Kolors-IP-Adapter-Plus)):
+ ```bash
+ huggingface-cli download --resume-download Kwai-Kolors/Kolors-IP-Adapter-Plus --local-dir weights/Kolors-IP-Adapter-Plus
+ ```
+ or
+ ```bash
+ git lfs clone https://huggingface.co/Kwai-Kolors/Kolors-IP-Adapter-Plus weights/Kolors-IP-Adapter-Plus
+ ```
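+
+ Equivalently, a small Python sketch using the `huggingface_hub` API (installed alongside the dependencies above) downloads the same files:
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Fetch all repository files (configs, tokenizer, LFS weights) into the
+ # same local directory that the CLI command above uses.
+ snapshot_download(
+     repo_id="Kwai-Kolors/Kolors-IP-Adapter-Plus",
+     local_dir="weights/Kolors-IP-Adapter-Plus",
+ )
+ ```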
+
+ 3. Inference:
+ ```bash
+ python ipadapter/sample_ipadapter_plus.py ./ipadapter/asset/test_ip.jpg "穿着黑色T恤衫,上面中文绿色大字写着“可图”"
+
+ python ipadapter/sample_ipadapter_plus.py ./ipadapter/asset/test_ip2.png "一只可爱的小狗在奔跑"
+
+ # The images will be saved to "scripts/outputs/"
+ ```
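+
+ Alternatively, inference can be run directly through `diffusers`. The following is a hedged sketch, not the script above: it assumes a `diffusers` release that includes `KolorsPipeline`, the `Kwai-Kolors/Kolors-diffusers` base checkpoint, and that `load_ip_adapter` accepts this repository's `ip_adapter_plus_general.bin`; the adapter scale, step count, and output path are illustrative:
+
+ ```python
+ import torch
+ from transformers import CLIPVisionModelWithProjection
+ from diffusers import KolorsPipeline
+ from diffusers.utils import load_image
+
+ # Image encoder from this repository (see the sketch in the Introduction).
+ image_encoder = CLIPVisionModelWithProjection.from_pretrained(
+     "Kwai-Kolors/Kolors-IP-Adapter-Plus",
+     subfolder="image_encoder",
+     torch_dtype=torch.float16,
+ )
+
+ # Kolors base model in diffusers format, wired to the image encoder.
+ pipe = KolorsPipeline.from_pretrained(
+     "Kwai-Kolors/Kolors-diffusers",
+     image_encoder=image_encoder,
+     torch_dtype=torch.float16,
+     variant="fp16",
+ ).to("cuda")
+
+ # Attach the IP-Adapter-Plus weights; the encoder was passed in above,
+ # so the adapter's own image_encoder_folder lookup is disabled.
+ pipe.load_ip_adapter(
+     "Kwai-Kolors/Kolors-IP-Adapter-Plus",
+     subfolder="",
+     weight_name="ip_adapter_plus_general.bin",
+     image_encoder_folder=None,
+ )
+ pipe.set_ip_adapter_scale(0.5)  # strength of the reference image
+
+ ip_image = load_image("ipadapter/asset/test_ip2.png")
+ image = pipe(
+     prompt="一只可爱的小狗在奔跑",  # "A cute puppy running"
+     ip_adapter_image=ip_image,
+     num_inference_steps=50,
+     guidance_scale=5.0,
+ ).images[0]
+ image.save("ipadapter_sample.png")
+ ```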
+
+
+ **Note**
+
+ The IP-Adapter-FaceID model based on Kolors will also be released soon!
+
+
+ ### Acknowledgments
+ - Thanks to [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter) for providing the codebase.
+ <br>
compare_demo.png ADDED

Git LFS Details

  • SHA256: dcb6f3c1f8a79acdb45fd48fe11a1b7220caa0a61c8d9b5727a25d4c988e7e0d
  • Pointer size: 132 Bytes
  • Size of remote file: 5.34 MB
config.json ADDED
@@ -0,0 +1 @@
+ {}
demo.png ADDED

Git LFS Details

  • SHA256: a9218c895b36394f69051522a656da322bc70621dfa9e70806dfe90cd06861b1
  • Pointer size: 132 Bytes
  • Size of remote file: 3.07 MB
image_encoder/config.json ADDED
@@ -0,0 +1,179 @@
+ {
+   "_name_or_path": "openai/clip-vit-large-patch14-336",
+   "architectures": [
+     "CLIPModel"
+   ],
+   "initializer_factor": 1.0,
+   "logit_scale_init_value": 2.6592,
+   "model_type": "clip",
+   "projection_dim": 768,
+   "text_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": null,
+     "attention_dropout": 0.0,
+     "bad_words_ids": null,
+     "bos_token_id": 0,
+     "chunk_size_feed_forward": 0,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dropout": 0.0,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": 2,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "quick_gelu",
+     "hidden_size": 768,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "initializer_factor": 1.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 3072,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_eps": 1e-05,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "max_position_embeddings": 77,
+     "min_length": 0,
+     "model_type": "clip_text_model",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 12,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_hidden_layers": 12,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": 1,
+     "prefix": null,
+     "problem_type": null,
+     "projection_dim": 768,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": null,
+     "torchscript": false,
+     "transformers_version": "4.21.3",
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "vocab_size": 49408
+   },
+   "text_config_dict": {
+     "hidden_size": 768,
+     "intermediate_size": 3072,
+     "num_attention_heads": 12,
+     "num_hidden_layers": 12,
+     "projection_dim": 768
+   },
+   "torch_dtype": "float32",
+   "transformers_version": null,
+   "vision_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": null,
+     "attention_dropout": 0.0,
+     "bad_words_ids": null,
+     "bos_token_id": null,
+     "chunk_size_feed_forward": 0,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dropout": 0.0,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": null,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "quick_gelu",
+     "hidden_size": 1024,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "image_size": 336,
+     "initializer_factor": 1.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 4096,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_eps": 1e-05,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "min_length": 0,
+     "model_type": "clip_vision_model",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 16,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_channels": 3,
+     "num_hidden_layers": 24,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": null,
+     "patch_size": 14,
+     "prefix": null,
+     "problem_type": null,
+     "projection_dim": 768,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": null,
+     "torchscript": false,
+     "transformers_version": "4.21.3",
+     "typical_p": 1.0,
+     "use_bfloat16": false
+   },
+   "vision_config_dict": {
+     "hidden_size": 1024,
+     "image_size": 336,
+     "intermediate_size": 4096,
+     "num_attention_heads": 16,
+     "num_hidden_layers": 24,
+     "patch_size": 14,
+     "projection_dim": 768
+   }
+ }
image_encoder/preprocessor_config.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "crop_size": 336,
+   "do_center_crop": true,
+   "do_normalize": true,
+   "do_resize": true,
+   "feature_extractor_type": "CLIPFeatureExtractor",
+   "image_mean": [
+     0.48145466,
+     0.4578275,
+     0.40821073
+   ],
+   "image_std": [
+     0.26862954,
+     0.26130258,
+     0.27577711
+   ],
+   "resample": 3,
+   "size": 336
+ }
image_encoder/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c6032c2e0caae3dc2d4fba35535fa6307dbb49df59c7e182b1bc4b3329b81801
+ size 1711974081
image_encoder/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
image_encoder/tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": "<|endoftext|>", "add_prefix_space": false, "errors": "replace", "do_lower_case": true, "name_or_path": "openai/clip-vit-base-patch32", "model_max_length": 77, "special_tokens_map_file": "/home/suraj/.cache/huggingface/transformers/18a566598f286c9139f88160c99f84eec492a26bd22738fa9cb44d5b7e0a5c76.cce1206abbad28826f000510f22f354e53e66a97f7c23745a7dfe27609cc07f5", "tokenizer_class": "CLIPTokenizer"}
image_encoder/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
ip_adapter_plus_general.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1983c3f927acb2a2f4285e1e134f302fab8e6d744a3b058e46848aba9957a60f
+ size 1013163359
model_index.json ADDED
@@ -0,0 +1 @@
+ {}