PrometheusProject xinsir committed
Commit 4f353d7 · 0 parent(s)

Duplicate from xinsir/anime-painter

Co-authored-by: qi <xinsir@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ masonry_anime.webp filter=lfs diff=lfs merge=lfs -text
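To see which files in this commit the `.gitattributes` rules above would route through Git LFS, a rough sketch is to match each file's basename against the patterns with Python's `fnmatch.fnmatchcase`. This is an approximation, not Git's own matcher: it only handles basename patterns, `fnmatch`'s `*` would also match `/`, and the directory rule `saved_model/**/*` is deliberately left out. `is_lfs_tracked` is a hypothetical helper; for an authoritative answer, `git check-attr filter -- <path>` asks Git directly.

```python
from fnmatch import fnmatchcase

# A subset of the basename patterns from the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.ckpt", "*.pt", "*.pth", "*.safetensors", "*.zip", "*tfevents*", "masonry_anime.webp"]


def is_lfs_tracked(filename: str) -> bool:
    """Approximate check: does the file's basename match any LFS pattern?"""
    basename = filename.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in LFS_PATTERNS)


for name in ["diffusion_pytorch_model.safetensors", "masonry_anime.webp", "000000_scribble_concat.webp", "config.json"]:
    status = "matches an LFS pattern" if is_lfs_tracked(name) else "no LFS pattern"
    print(f"{name}: {status}")
```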
000000_scribble_concat.webp ADDED
000010_scribble_concat.webp ADDED
000013_scribble_concat.webp ADDED
000015_scribble_concat.webp ADDED
000035_scribble_concat.webp ADDED
000043_scribble_concat.webp ADDED
000067_scribble_concat.webp ADDED
000092_scribble_concat.webp ADDED
README.md ADDED
@@ -0,0 +1,214 @@
+ ---
+ license: apache-2.0
+ tags:
+ - diffusers
+ - controlnet
+ - text_to_image
+ - controlnet-scribble-sdxl-1.0
+ pipeline_tag: text-to-image
+ ---
+
+ # ***Make everyone an anime painter, even if you don't know anything about drawing.***
+
+ ![A masonry grid of anime showcase images](./masonry_anime.webp)
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ # Controlnet-scribble-sdxl-1.0-anime
+ This is a controlnet-scribble-sdxl-1.0 model that generates very high-quality anime images from a sketch, and it supports lines of any type and any width. As the examples show, the sketch
+ can be very simple and rough: we assume you may be a child or someone who knows nothing about drawing, so you can simply doodle, write some danbooru tags, and generate a beautiful anime illustration. In our evaluation,
+ the model achieves state-of-the-art performance and clearly outperforms the original SD1.5 scribble ControlNet trained by Lvmin Zhang ([ControlNet](https://github.com/lllyasviel/ControlNet)). The model was trained with complex tricks
+ and a high-quality dataset. Besides the aesthetic score, the prompt-following ability (as proposed by OpenAI in the [DALL-E 3 paper](https://cdn.openai.com/papers/dall-e-3.pdf)) and the image deformity rate (the probability that generated images contain abnormal human anatomy)
+ also improve considerably. The founder of Midjourney said that Midjourney helps people who cannot draw to draw, and so expands the boundaries of their imagination. We share a similar vision: we hope to let
+ people who know nothing about anime or cartoons create their own characters in a simple way, express themselves, and unleash their creativity. AIGC will reshape the animation industry. **On average, the model we released generates anime images with
+ aesthetic scores higher than almost all popular anime websites, so just enjoy it.** If you want especially visually appealing images, combine danbooru tags with natural language. Because there are
+ far fewer anime images than real images, a plain natural-language prompt like "a girl walks in the street" carries too little information. Instead, describe the scene in more detail, such as "a girl, blue shirt, white hair, black eyes, smile, pink flower, cherry blossoms ..."
+ In summary, first use tags to describe what is in the image (danbooru tags), then describe what happens in the image (natural language); the more detail, the better. If your description is vague, the generated content will be largely left to chance.
+ Either way, the result will follow the condition image you draw and its edges will coincide with your sketch; the model understands your drawing semantically to some degree and will still give you a reasonable result. To the best of our knowledge,
+ no other SDXL scribble model exists in the open-source community, so this is probably the first.
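The prompting advice above (danbooru tags first, then a natural-language description) can be captured in a small helper. This is a hypothetical sketch: `build_prompt` is not part of diffusers or this repository, and the comma-joined format simply mirrors the example prompts in this README.

```python
def build_prompt(tags: list[str], description: str = "") -> str:
    """Join danbooru-style tags first, then an optional natural-language sentence."""
    # Drop empty entries and duplicate tags while keeping the original order.
    seen = set()
    ordered_tags = []
    for tag in tags:
        tag = tag.strip()
        if tag and tag not in seen:
            seen.add(tag)
            ordered_tags.append(tag)
    sentence = description.strip()
    parts = ordered_tags + ([sentence] if sentence else [])
    return ", ".join(parts)


prompt = build_prompt(
    ["1girl", "blue shirt", "white hair", "black eyes", "smile", "pink flower", "cherry blossoms"],
    "a girl walks down a street lined with cherry trees",
)
print(prompt)
```

The resulting string can be passed as `prompt` to the pipeline in the usage code.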
+ ### Attention
+ To generate anime images with our model, you need to choose an
+ anime SDXL base model from [Hugging Face](https://huggingface.co/models?pipeline_tag=text-to-image&sort=trending&search=blue) or [Civitai](https://civitai.com/search/models?baseModel=SDXL%201.0&sortBy=models_v8&query=anime).
+ The showcases listed here are based on [CounterfeitXL](https://huggingface.co/gsdf/CounterfeitXL/tree/main). Different base models produce different image styles, and you can use bluepencil or other models as well. The model was trained on a large number of anime images, including
+ almost all the anime images we could find on the Internet. We filtered them strictly, keeping only images with high visual quality comparable to nijijourney or popular anime illustrations. We trained it as a ControlNet for SDXL 1.0
+ ([Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543)); the technical details will not be disclosed in this report.
+
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** xinsir
+ - **Model type:** ControlNet_SDXL
+ - **License:** apache-2.0
+ - **Finetuned from model [optional]:** stabilityai/stable-diffusion-xl-base-1.0
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Paper [optional]:** https://arxiv.org/abs/2302.05543
+
+ ## Examples Display
+ prompt: 1girl, breasts, solo, long hair, pointy ears, red eyes, horns, navel, sitting, cleavage, toeless legwear, hair ornament, smoking pipe, oni horns, thighhighs, detached sleeves, looking at viewer, smile, large breasts, holding smoking pipe, wide sleeves, bare shoulders, flower, barefoot, holding, nail polish, black thighhighs, jewelry, hair flower, oni, japanese clothes, fire, kiseru, very long hair, ponytail, black hair, long sleeves, bangs, red nails, closed mouth, toenails, navel cutout, cherry blossoms, water, red dress, fingernails
+ ![image0](./000013_scribble_concat.webp)
+
+ prompt: 1girl, solo, blonde hair, weapon, sword, hair ornament, hair flower, flower, dress, holding weapon, holding sword, holding, gloves, breasts, full body, black dress, thighhighs, looking at viewer, boots, bare shoulders, bangs, medium breasts, standing, black gloves, short hair with long locks, thigh boots, sleeveless dress, elbow gloves, sidelocks, black background, black footwear, yellow eyes, sleeveless
+ ![image1](./000015_scribble_concat.webp)
+
+ prompt: 1girl, solo, holding, white gloves, smile, purple eyes, gloves, closed mouth, balloon, holding microphone, microphone, blue flower, long hair, puffy sleeves, purple flower, blush, puffy short sleeves, short sleeves, bangs, dress, shoes, very long hair, standing, pleated dress, white background, flower, full body, blue footwear, one side up, arm up, hair bun, brown hair, food, mini crown, crown, looking at viewer, hair between eyes, heart balloon, heart, tilted headwear, single side bun, hand up
+ ![image2](./000010_scribble_concat.webp)
+
+ prompt: tiger, 1boy, male focus, blue eyes, braid, animal ears, tiger ears, 2022, solo, smile, chinese zodiac, year of the tiger, looking at viewer, hair over one eye, weapon, holding, white tiger, grin, grey hair, polearm, arm up, white hair, animal, holding weapon, arm behind head, multicolored hair, holding polearm
+ ![image3](./000000_scribble_concat.webp)
+
+ prompt: 1boy, male child, glasses, male focus, shorts, solo, closed eyes, bow, bowtie, smile, open mouth, red bow, jacket, red bowtie, white background, shirt, happy, black shorts, child, simple background, long sleeves, ^_^, short hair, white shirt, brown hair, black-framed eyewear, :d, facing viewer, black hair
+ ![image4](./000035_scribble_concat.webp)
+
+ prompt: solo, 1girl, swimsuit, blue eyes, plaid headwear, bikini, blue hair, virtual youtuber, side ponytail, looking at viewer, navel, grey bikini, ribbon, long hair, parted lips, blue nails, hat, breasts, plaid, hair ribbon, water, arm up, bracelet, star (symbol), cowboy shot, stomach, thigh strap, hair between eyes, beach, small breasts, jewelry, wet, bangs, plaid bikini, nail polish, grey headwear, blue ribbon, adapted costume, choker, ocean, bare shoulders, outdoors, beret
+ ![image5](./000043_scribble_concat.webp)
+
+ prompt: fruit, food, no humans, food focus, cherry, simple background, english text, strawberry, signature, border, artist name, cream
+ ![image6](./000067_scribble_concat.webp)
+
+ prompt: 1girl, solo, ball, swimsuit, bikini, mole, beachball, white bikini, breasts, hairclip, navel, looking at viewer, hair ornament, chromatic aberration, holding, holding ball, pool, cleavage, water, collarbone, mole on breast, blush, bangs, parted lips, bare shoulders, mole on thigh, bare arms, smile, large breasts, blonde hair, halterneck, hair between eyes, stomach
+ ![image7](./000092_scribble_concat.webp)
+
+
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ import random
+
+ import cv2
+ import numpy as np
+ import torch
+ from controlnet_aux import HEDdetector
+ from diffusers import AutoencoderKL, ControlNetModel, EulerAncestralDiscreteScheduler, StableDiffusionXLControlNetPipeline
+ from PIL import Image
+
+
+ def nms(x, t, s):
+     """Thin edge lines with directional non-maximum suppression, then binarize.
+
+     x: uint8 edge map, t: threshold applied after suppression, s: Gaussian blur sigma.
+     """
+     x = cv2.GaussianBlur(x.astype(np.float32), (0, 0), s)
+
+     # 3x3 line kernels: horizontal, vertical, and the two diagonals
+     f1 = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=np.uint8)
+     f2 = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8)
+     f3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.uint8)
+     f4 = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=np.uint8)
+
+     y = np.zeros_like(x)
+
+     # keep a pixel if it is the local maximum along at least one direction
+     for f in [f1, f2, f3, f4]:
+         np.putmask(y, cv2.dilate(x, kernel=f) == x, x)
+
+     z = np.zeros_like(y, dtype=np.uint8)
+     z[y > t] = 255
+     return z
+
+
+ controlnet_conditioning_scale = 1.0
+ prompt = "your prompt, the longer the better, describe it in as much detail as possible"
+ negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'
+
+ euler_a_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("gsdf/CounterfeitXL", subfolder="scheduler")
+
+ controlnet = ControlNetModel.from_pretrained(
+     "xinsir/anime-painter",
+     torch_dtype=torch.float16
+ )
+
+ # When testing with another base model, you also need to change the VAE.
+ vae = AutoencoderKL.from_pretrained("gsdf/CounterfeitXL", subfolder="vae", torch_dtype=torch.float16)
+
+ pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
+     "gsdf/CounterfeitXL",
+     controlnet=controlnet,
+     vae=vae,
+     torch_dtype=torch.float16,
+     scheduler=euler_a_scheduler,
+ )
+ pipe.to("cuda")  # float16 weights are meant to run on a CUDA GPU
+
+ # You can either use HED to generate a fake scribble from an image, or use a sketch drawn entirely by yourself.
+ use_hed = True  # set to False to use your own sketch (Method 2)
+ if use_hed:
+     # Method 1
+     # Provide an image (real or anime); its HED lines are extracted and used as the scribble.
+     # For details on HED detection, see https://github.com/lllyasviel/ControlNet/blob/main/gradio_fake_scribble2image.py
+     # Below is an example using the HED detector from controlnet_aux.
+
+     image = Image.open("your image path, the image can be real or anime, HED detector will extract its edge boundary")
+     processor = HEDdetector.from_pretrained('lllyasviel/Annotators')
+     controlnet_img = processor(image, scribble=False)
+     controlnet_img.save("hed_detect.png")  # optional: save the raw HED map (example path)
+
+     # Post-process to simulate a human sketch; different thresholds produce different line widths.
+     controlnet_img = np.array(controlnet_img)
+     controlnet_img = nms(controlnet_img, 127, 3)
+     controlnet_img = cv2.GaussianBlur(controlnet_img, (0, 0), 3)
+
+     # Higher threshold, thinner line.
+     random_val = int(round(random.uniform(0.01, 0.10), 2) * 255)
+     controlnet_img[controlnet_img > random_val] = 255
+     controlnet_img[controlnet_img < 255] = 0
+     controlnet_img = Image.fromarray(controlnet_img)
+
+ else:
+     # Method 2
+     # Use a sketch drawn entirely by yourself.
+     control_path = "the sketch image you draw with some tools, like drawing board, the path you save it"
+     # The image must be black and white (0 or 255 only), like the examples listed above.
+     controlnet_img = Image.open(control_path).convert("RGB")
+
+ # Resize to 1024*1024 or the same resolution bucket to get the best performance;
+ # width and height are floored to multiples of 8 because the SDXL VAE downsamples by 8.
+ width, height = controlnet_img.size
+ ratio = np.sqrt(1024. * 1024. / (width * height))
+ new_width, new_height = int(width * ratio) // 8 * 8, int(height * ratio) // 8 * 8
+ controlnet_img = controlnet_img.resize((new_width, new_height))
+
+ images = pipe(
+     prompt,
+     negative_prompt=negative_prompt,
+     image=controlnet_img,
+     controlnet_conditioning_scale=controlnet_conditioning_scale,
+     width=new_width,
+     height=new_height,
+     num_inference_steps=30,
+ ).images
+
+ # PNG usually preserves image quality better than JPG or WebP, but the files are much bigger.
+ images[0].save("output.png")  # example path
+ ```
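The final binarization step in Method 1 is what controls line width: after `nms` and a Gaussian blur, each line becomes a soft ridge, and pixels brighter than `random_val` become white. The sketch below uses a hand-made 1-D intensity profile (an assumption standing in for one blurred white line on a black background, not real HED output) to show why a lower threshold keeps more of the ridge and gives a thicker line, while a higher threshold gives a thinner one. `binarize` is a hypothetical helper that repeats the two masking lines from the code above.

```python
import numpy as np


def binarize(edge_map: np.ndarray, threshold: int) -> np.ndarray:
    """Same rule as the README code: values above threshold -> 255, everything else -> 0."""
    out = edge_map.copy()
    out[out > threshold] = 255
    out[out < 255] = 0
    return out


# Hypothetical cross-section of one blurred white line on a black background.
profile = np.array([0, 10, 60, 200, 255, 200, 60, 10, 0], dtype=np.uint8)

# The README draws the threshold between 0.01*255 and 0.10*255, i.e. roughly 2 to 25.
for threshold in (int(0.01 * 255), int(0.10 * 255)):
    width = int((binarize(profile, threshold) == 255).sum())
    print(f"threshold={threshold:3d} -> line width {width} px")
```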
+
+
+ ## Evaluation Data
+ The test data is randomly sampled from popular anime wallpaper images (pixiv, nijijourney, and so on), since the purpose of the project is to let everyone draw an anime illustration.
+ We select 100 images, generate a tag prompt for each with waifu-tagger ([wd-tagger](https://huggingface.co/spaces/SmilingWolf/wd-tagger)), and generate 4 images per prompt, 400 images in total.
+ The image resolution is 1024 * 1024 or the same bucket for SDXL, and 512 * 768 or the same bucket for SD1.5; we then resize the SDXL-generated images to 512 * 768 or the same bucket for a fair comparison.
+ We calculate the LAION Aesthetic Score to measure beauty and PerceptualSimilarity to measure control ability, and we find that the visual quality of the images is consistent with these metric values.
+ We compare our model with lllyasviel/control_v11p_sd15_scribble and list the results below. Our model has the higher aesthetic score and can generate visually appealing images if you prompt it properly.
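The "same bucket" resizing used in this evaluation (and in the usage code) keeps the aspect ratio while scaling the pixel count to a target area. A minimal sketch, assuming each side is floored to a multiple of 8 (as in the usage code above, since the SDXL VAE downsamples by 8); the 1024 * 1024 and 512 * 768 targets come from the text, while `bucket_size` is a hypothetical helper.

```python
import math


def bucket_size(width: int, height: int, target_w: int = 1024, target_h: int = 1024,
                multiple: int = 8) -> tuple[int, int]:
    """Scale (width, height) to roughly target_w * target_h pixels, keeping the aspect ratio."""
    ratio = math.sqrt(target_w * target_h / (width * height))
    # Floor each side to a multiple of `multiple` so the size stays latent-friendly.
    new_w = int(width * ratio) // multiple * multiple
    new_h = int(height * ratio) // multiple * multiple
    return new_w, new_h


# A 2:3 sketch moved into the SDXL 1024*1024 bucket.
print(bucket_size(512, 768))
# An SDXL output moved into the SD1.5 512*768 bucket for comparison.
print(bucket_size(1024, 1536, 512, 768))
```

Flooring (rather than rounding) means the result never exceeds the target area; the loss is at most a few pixels per side.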
+
+ ## Quantitative Result
+ | Metric | xinsir/anime-painter | lllyasviel/control_v11p_sd15_scribble |
+ |-------|-------|-------|
+ | laion_aesthetic | **5.95** | 5.86 |
+ | perceptual similarity | **0.5171** | 0.577 |
+
+ - laion_aesthetic: the higher, the better
+ - perceptual similarity: the lower, the better
+
+ Note: The values are calculated on images saved in WebP format. When the images are saved as PNG, the aesthetic scores increase by 0.1-0.3, but the relative ranking remains unchanged.
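To put the table in proportion, the relative changes can be computed directly from the reported numbers. This is a worked check on the published values, not an additional result: the aesthetic score is about 1.5% higher, and the perceptual-similarity distance is about 10.4% lower.

```python
# Values copied from the Quantitative Result table above.
ours = {"laion_aesthetic": 5.95, "perceptual_similarity": 0.5171}
baseline = {"laion_aesthetic": 5.86, "perceptual_similarity": 0.577}


def relative_change(new: float, old: float) -> float:
    """Signed relative change of `new` with respect to `old`."""
    return (new - old) / old


aesthetic_gain = relative_change(ours["laion_aesthetic"], baseline["laion_aesthetic"])
# Perceptual similarity is a distance here, so a negative change is an improvement.
similarity_change = relative_change(ours["perceptual_similarity"], baseline["perceptual_similarity"])

print(f"aesthetic: {aesthetic_gain:+.1%}, perceptual similarity: {similarity_change:+.1%}")
```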
+
+ ### Conclusion
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ In our evaluation, the model achieves a better aesthetic score on anime images than lllyasviel/control_v11p_sd15_scribble. We wanted to compare it with other SDXL-1.0 scribble models but found none. The model also shows better control ability as measured by perceptual similarity, which we attribute to the bigger base model and complex data augmentation.
+ Besides, the model has a lower rate of generating abnormal images, such as images with abnormal human anatomy.
config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "_class_name": "ControlNetModel",
+   "_diffusers_version": "0.20.0.dev0",
+   "act_fn": "silu",
+   "addition_embed_type": "text_time",
+   "addition_embed_type_num_heads": 64,
+   "addition_time_embed_dim": 256,
+   "attention_head_dim": [
+     5,
+     10,
+     20
+   ],
+   "block_out_channels": [
+     320,
+     640,
+     1280
+   ],
+   "class_embed_type": null,
+   "conditioning_channels": 3,
+   "conditioning_embedding_out_channels": [
+     16,
+     32,
+     96,
+     256
+   ],
+   "controlnet_conditioning_channel_order": "rgb",
+   "cross_attention_dim": 2048,
+   "down_block_types": [
+     "DownBlock2D",
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D"
+   ],
+   "downsample_padding": 1,
+   "encoder_hid_dim": null,
+   "encoder_hid_dim_type": null,
+   "flip_sin_to_cos": true,
+   "freq_shift": 0,
+   "global_pool_conditions": false,
+   "in_channels": 4,
+   "layers_per_block": 2,
+   "mid_block_scale_factor": 1,
+   "norm_eps": 1e-05,
+   "norm_num_groups": 32,
+   "num_attention_heads": null,
+   "num_class_embeds": null,
+   "only_cross_attention": false,
+   "projection_class_embeddings_input_dim": 2816,
+   "resnet_time_scale_shift": "default",
+   "transformer_layers_per_block": [
+     1,
+     2,
+     10
+   ],
+   "upcast_attention": null,
+   "use_linear_projection": true
+ }
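A few numbers in this config can be checked by hand. In diffusers, when `num_attention_heads` is null, the `attention_head_dim` list is used as the number of heads per block (a long-standing naming quirk), so each head has 320/5 = 640/10 = 1280/20 = 64 channels. `projection_class_embeddings_input_dim` = 2816 matches SDXL's added conditioning: a 1280-dim pooled text embedding plus six size/crop time ids, each embedded with `addition_time_embed_dim` = 256 (6 × 256 = 1536). `cross_attention_dim` = 2048 is the concatenation of the two SDXL text encoders' hidden sizes (768 + 1280). The sketch below reproduces these checks; the 1280, 768, and 6 are SDXL facts assumed here, not fields of this file.

```python
import json

# Excerpt of the config.json above (only the fields used here).
config = json.loads("""
{
  "attention_head_dim": [5, 10, 20],
  "num_attention_heads": null,
  "block_out_channels": [320, 640, 1280],
  "addition_time_embed_dim": 256,
  "projection_class_embeddings_input_dim": 2816,
  "cross_attention_dim": 2048
}
""")

# diffusers falls back to attention_head_dim as the *number* of heads when num_attention_heads is null.
heads = config["num_attention_heads"] or config["attention_head_dim"]
head_channels = [c // h for c, h in zip(config["block_out_channels"], heads)]
print("channels per head:", head_channels)

# SDXL facts assumed here (not stored in this file):
POOLED_TEXT_DIM = 1280  # pooled (projected) embedding of the OpenCLIP ViT-bigG text encoder
BIGG_HIDDEN_DIM = 1280  # hidden size of the OpenCLIP ViT-bigG text encoder
CLIP_L_DIM = 768        # hidden size of the CLIP ViT-L text encoder
NUM_TIME_IDS = 6        # original size (2) + crop top-left (2) + target size (2)

added_cond_dim = POOLED_TEXT_DIM + NUM_TIME_IDS * config["addition_time_embed_dim"]
print("added conditioning dim:", added_cond_dim, "config:", config["projection_class_embeddings_input_dim"])
print("cross-attention dim:", CLIP_L_DIM + BIGG_HIDDEN_DIM, "config:", config["cross_attention_dim"])
```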
diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:053e9427d3936ac2f1ed766d43cfc119a49a44938aa2e3475c7f01e4a0c476f3
+ size 2502139104
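What the repository stores for `diffusion_pytorch_model.safetensors` is a Git LFS pointer: short `key value` lines giving the spec version, the object id (hash algorithm and digest), and the size in bytes. A minimal parsing sketch, assuming the pointer text exactly as shown above; `parse_lfs_pointer` is a hypothetical helper, not part of git-lfs. The size works out to about 2.5 GB (2.33 GiB).

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:053e9427d3936ac2f1ed766d43cfc119a49a44938aa2e3475c7f01e4a0c476f3
size 2502139104
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(info["algorithm"], len(info["digest"]), f"{info['size'] / 2**30:.2f} GiB")
```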
masonry_anime.webp ADDED

Git LFS Details

  • SHA256: 96d47069f27edb504f84af000f950ecce12d25241d25e183852758177198e7a6
  • Pointer size: 132 Bytes
  • Size of remote file: 4.05 MB