时情 committed on
Commit · 34cb4f8
Parent(s): c3ca3d6
upload readme
- README.md +183 -3
- diffusion_pytorch_model.safetensors +0 -3
README.md
CHANGED
|
@@ -1,3 +1,183 @@
|
|
| 1 |
-
--
|
| 2 |
-
|
| 3 |
-
| 1 |
+
<div style="display: flex; justify-content: center; align-items: center;">
|
| 2 |
+
<img src="./images/images_alibaba.png" alt="alibaba" style="width: 20%; height: auto; margin-right: 5%;">
|
| 3 |
+
<img src="./images/images_alimama.png" alt="alimama" style="width: 20%; height: auto;">
|
| 4 |
+
</div>
|
| 5 |
+
|
| 6 |
+
EcomID aims to generate customized images from a single reference ID image, ensuring strong semantic consistency while being controlled by keypoints.
|
| 7 |
+
|
| 8 |
+
This repository provides the EcomID method and model, combining the strengths of PuLID and InstantID for better background consistency, facial keypoint control, and realistic facial representation with improved similarity.
|
| 9 |
+
|
| 10 |
+
# EcomID Overview
|
| 11 |
+
|
| 12 |
+
## EcomID Structure
|
| 13 |
+
<img src="./images/overflow.png" alt="EcomID structure" style="width: 100%; height: auto; margin-right: 5%;">
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
- **IP-Adapter of PuLID**: EcomID incorporates the ID-Encoder and cross-attention components from PuLID, trained with alignment loss.
|
| 17 |
+
This method effectively reduces the interference of ID embeddings on text embeddings within the cross-attention part, minimizing disruption to the underlying model's text-to-image capabilities.
|
| 18 |
+
- **InstantID’s IdentityNet Architecture**: Utilizing **a dataset of 2 million aesthetically pleasing portrait images**, IdentityNet enhances keypoint control, improving ID consistency and facial realism. During training, the IP-adapter is frozen, and only the IdentityNet is trained. Facial landmarks are used as conditional inputs, while face embeddings are integrated into IdentityNet via cross-attention.
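
The training split described above (frozen IP-adapter, trainable IdentityNet) can be sketched as a partition of parameter names. This is only an illustration: the `ip_adapter.` and `identity_net.` prefixes and every parameter name below are hypothetical and are not taken from the EcomID code. In a PyTorch setup, the frozen group would typically have `requires_grad` set to `False`, and only the trainable group would be passed to the optimizer.

```python
from typing import Dict, Iterable, List


def split_trainable(param_names: Iterable[str],
                    trainable_prefix: str = "identity_net.") -> Dict[str, List[str]]:
    """Partition parameter names into trainable and frozen groups by prefix."""
    groups: Dict[str, List[str]] = {"trainable": [], "frozen": []}
    for name in param_names:
        key = "trainable" if name.startswith(trainable_prefix) else "frozen"
        groups[key].append(name)
    return groups


# Hypothetical parameter names, for illustration only.
names = [
    "ip_adapter.id_encoder.proj.weight",
    "ip_adapter.cross_attn.to_k_ip.weight",
    "identity_net.down_blocks.0.attn.to_q.weight",
    "identity_net.cond_embedding.conv_in.weight",
]
groups = split_trainable(names)
```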
|
| 19 |
+
|
| 20 |
+
# Show Cases
|
| 21 |
+
## Comparison with Other Methods
|
| 22 |
+
### 1. Preserved Text-to-Image Capability
|
| 23 |
+
|
| 24 |
+
<table>
|
| 25 |
+
<tr>
|
| 26 |
+
<th style="width: 28%;">Prompt</th>
|
| 27 |
+
<th style="width: 24%;">Reference Image</th>
|
| 28 |
+
<th style="width: 24%;">EcomID</th>
|
| 29 |
+
<th style="width: 24%;">InstantID</th>
|
| 30 |
+
</tr>
|
| 31 |
+
<tr>
|
| 32 |
+
<td>girl, white skin, black hair, long wavy hair, <span style="color:red"><strong>in European style living room, Retro tone, decorations</strong></span>, depth of field.</td>
|
| 33 |
+
<td><img src="images/show_case/50.png" alt="Reference image" width="100%"></td>
|
| 34 |
+
<td><img src="images/show_case/49.png" alt="EcomID image" width="100%"></td>
|
| 35 |
+
<td><img src="images/show_case/48.png" alt="InstantID image" width="100%"></td>
|
| 36 |
+
</tr>
|
| 37 |
+
</table>
|
| 38 |
+
|
| 39 |
+
As shown above, EcomID ***preserves background generation abilities while minimizing stylization, greatly enhancing realism***.
|
| 40 |
+
The visualizations highlight more authentic portraits with improved background semantic consistency, showcasing EcomID's advantage in generating realistic images.
|
| 41 |
+
|
| 42 |
+
### 2. Improved Facial Control and Consistency
|
| 43 |
+
<table>
|
| 44 |
+
<tr>
|
| 45 |
+
<th style="width: 24%;">Prompt</th>
|
| 46 |
+
<th style="width: 19%;">Reference Image</th>
|
| 47 |
+
<th style="width: 19%;">EcomID</th>
|
| 48 |
+
<th style="width: 19%;">InstantID</th>
|
| 49 |
+
<th style="width: 19%;">PuLID</th>
|
| 50 |
+
</tr>
|
| 51 |
+
<tr>
|
| 52 |
+
<td>A close-up portrait of a man standing in the library, holding <span style="color:red"><strong>two smiling toddlers</strong></span> next to him.</td>
|
| 53 |
+
<td><img src="images/show_case/20.png" alt="Reference image" width="100%"></td>
|
| 54 |
+
<td><img src="images/show_case/17.png" alt="EcomID image" width="100%"></td>
|
| 55 |
+
<td><img src="images/show_case/18.png" alt="InstantID image" width="100%"></td>
|
| 56 |
+
<td><img src="images/show_case/19.png" alt="PuLID image" width="100%"></td>
|
| 57 |
+
</tr>
|
| 58 |
+
</table>
|
| 59 |
+
|
| 60 |
+
As shown above, EcomID employs keypoints as conditional inputs for training, ***allowing for precise adjustments of facial positions, sizes, and orientations***. This capability ensures that the generated portraits are more controllable while further enhancing facial similarity and the overall quality of the images.
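
To make the idea of keypoint control concrete, the sketch below adjusts the position, size, and orientation of a set of facial keypoints before they would be used as a condition. It assumes a five-point layout (two eyes, nose, two mouth corners) in pixel coordinates. The function name, the layout, and the coordinate values are illustrative assumptions, not part of the EcomID release.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def transform_keypoints(kps: List[Point], scale: float = 1.0,
                        angle_deg: float = 0.0,
                        shift: Point = (0.0, 0.0)) -> List[Point]:
    """Scale and rotate keypoints about their centroid, then translate them."""
    cx = sum(x for x, _ in kps) / len(kps)
    cy = sum(y for _, y in kps) / len(kps)
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = []
    for x, y in kps:
        # Offset from the centroid, scaled to change the face size.
        dx, dy = (x - cx) * scale, (y - cy) * scale
        # Rotate the offset, then move the whole face by `shift`.
        out.append((cx + dx * cos_a - dy * sin_a + shift[0],
                    cy + dx * sin_a + dy * cos_a + shift[1]))
    return out


# Hypothetical five-point layout (eyes, nose, mouth corners) in pixels.
face = [(412.0, 380.0), (612.0, 380.0), (512.0, 480.0),
        (432.0, 580.0), (592.0, 580.0)]
smaller_shifted = transform_keypoints(face, scale=0.5, shift=(100.0, -50.0))
```

Here `smaller_shifted` describes a face half the original size, moved right and up. Rendering such points into a condition image is a separate step that this sketch does not cover.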
|
| 61 |
+
|
| 62 |
+
### More Showcases
|
| 63 |
+
EcomID enhances portrait representation, delivering a more authentic and aesthetically pleasing appearance while ensuring semantic consistency and greater internal ID similarity (i.e., traits that do not vary with age, hairstyle, glasses, or other physical changes).
|
| 64 |
+
|
| 65 |
+
<table>
|
| 66 |
+
<tr>
|
| 67 |
+
<th style="width: 24%;">Prompt</th>
|
| 68 |
+
<th style="width: 19%;">Reference Image</th>
|
| 69 |
+
<th style="width: 19%;">EcomID</th>
|
| 70 |
+
<th style="width: 19%;">InstantID</th>
|
| 71 |
+
<th style="width: 19%;">PuLID</th>
|
| 72 |
+
</tr>
|
| 73 |
+
<tr>
|
| 74 |
+
<td>A close-up portrait of a <span style="color:red"><strong>little girl with double braids</strong></span>, wearing a white dress, standing on the beach during sunset.</td>
|
| 75 |
+
<td><img src="images/show_case/21.png" alt="Reference image" width="100%"></td>
|
| 76 |
+
<td><img src="images/show_case/22.png" alt="EcomID image" width="100%"></td>
|
| 77 |
+
<td><img src="images/show_case/23.png" alt="InstantID image" width="100%"></td>
|
| 78 |
+
<td><img src="images/show_case/24.png" alt="PuLID image" width="100%"></td>
|
| 79 |
+
</tr>
|
| 80 |
+
<tr>
|
| 81 |
+
<td>A close-up portrait of a <span style="color:red"><strong>very little girl</strong></span> with double braids, wearing <span style="color:red"><strong>a hat</strong></span> and white dress, standing on the beach during sunset.</td>
|
| 82 |
+
<td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
|
| 83 |
+
<td><img src="images/show_case/47.png" alt="EcomID image" width="100%"></td>
|
| 84 |
+
<td><img src="images/show_case/46.png" alt="InstantID image" width="100%"></td>
|
| 85 |
+
<td><img src="images/show_case/45.png" alt="PuLID image" width="100%"></td>
|
| 86 |
+
</tr>
|
| 87 |
+
<tr>
|
| 88 |
+
<td>A grizzled detective, <span style="color:red"><strong>fedora</strong></span> casting a shadow over his square jaw, a <span style="color:red"><strong>cigar dangling from his lips</strong></span>, his trench coat evocative of film noir, in a <span style="color:red"><strong>rainy alley</strong></span>.</td>
|
| 89 |
+
<td><img src="images/show_case/25.png" alt="Reference image" width="100%"></td>
|
| 90 |
+
<td><img src="images/show_case/26.png" alt="EcomID image" width="100%"></td>
|
| 91 |
+
<td><img src="images/show_case/27.png" alt="InstantID image" width="100%"></td>
|
| 92 |
+
<td><img src="images/show_case/28.png" alt="PuLID image" width="100%"></td>
|
| 93 |
+
</tr>
|
| 94 |
+
<tr>
|
| 95 |
+
<td>A smiling girl with <span style="color:red"><strong>bangs and long hair</strong></span> in a school uniform stands under cherry trees, holding a book.</td>
|
| 96 |
+
<td><img src="images/show_case/29.png" alt="Reference image" width="100%"></td>
|
| 97 |
+
<td><img src="images/show_case/30.png" alt="EcomID image" width="100%"></td>
|
| 98 |
+
<td><img src="images/show_case/31.png" alt="InstantID image" width="100%"></td>
|
| 99 |
+
<td><img src="images/show_case/32.png" alt="PuLID image" width="100%"></td>
|
| 100 |
+
</tr>
|
| 101 |
+
<tr>
|
| 102 |
+
<td>A <span style="color:red"><strong>very old</strong></span> witch, wearing a black cloak, with a pointed hat, holding a magic wand, against a background of a misty forest.</td>
|
| 103 |
+
<td><img src="images/show_case/33.png" alt="Reference image" width="100%"></td>
|
| 104 |
+
<td><img src="images/show_case/34.png" alt="EcomID image" width="100%"></td>
|
| 105 |
+
<td><img src="images/show_case/35.png" alt="InstantID image" width="100%"></td>
|
| 106 |
+
<td><img src="images/show_case/36.png" alt="PuLID image" width="100%"></td>
|
| 107 |
+
</tr>
|
| 108 |
+
<tr>
|
| 109 |
+
<td>A man clad in cyberpunk fashion: <span style="color:red"><strong>neon accents, reflective sunglasses,</strong></span> and a leather jacket with glowing circuit patterns. He stands stoically amidst a soaked cityscape.</td>
|
| 110 |
+
<td><img src="images/show_case/37.png" alt="Reference image" width="100%"></td>
|
| 111 |
+
<td><img src="images/show_case/38.png" alt="EcomID image" width="100%"></td>
|
| 112 |
+
<td><img src="images/show_case/39.png" alt="InstantID image" width="100%"></td>
|
| 113 |
+
<td><img src="images/show_case/40.png" alt="PuLID image" width="100%"></td>
|
| 114 |
+
</tr>
|
| 115 |
+
|
| 116 |
+
</table>
|
| 117 |
+
|
| 118 |
+
### More Base Models, Resolutions, and Styles
|
| 119 |
+
<table>
|
| 120 |
+
<tr>
|
| 121 |
+
<th style="width: 12%;">SDXL models</th>
|
| 122 |
+
<th style="width: 24%;">Prompt</th>
|
| 123 |
+
<th style="width: 16%;">Reference Image</th>
|
| 124 |
+
<th style="width: 16%;">EcomID</th>
|
| 125 |
+
<th style="width: 16%;">InstantID</th>
|
| 126 |
+
<th style="width: 16%;">PuLID</th>
|
| 127 |
+
</tr>
|
| 128 |
+
<tr>
|
| 129 |
+
<td>sd-xl-base-1.0</td>
|
| 130 |
+
<td>girl, solo, brown hair, holding a little teddy bear on her hands, wearing a school uniform, standing in the library, <span style="color:red"><strong>cartoon style</strong></span>.</td>
|
| 131 |
+
<td><img src="images/show_case/1.png" alt="Reference image" width="100%"></td>
|
| 132 |
+
<td><img src="images/show_case/2.png" alt="EcomID image" width="100%"></td>
|
| 133 |
+
<td><img src="images/show_case/3.png" alt="InstantID image" width="100%"></td>
|
| 134 |
+
<td><img src="images/show_case/4.png" alt="PuLID image" width="100%"></td>
|
| 135 |
+
</tr>
|
| 136 |
+
<tr>
|
| 137 |
+
<td>EcomXL</td>
|
| 138 |
+
<td>A close-up portrait of a <span style="color:red"><strong>very little girl</strong></span> with double braids, wearing <span style="color:red"><strong>a hat</strong></span> and white dress, standing on the beach during sunset.</td>
|
| 139 |
+
<td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
|
| 140 |
+
<td><img src="images/show_case/47.png" alt="EcomID image" width="100%"></td>
|
| 141 |
+
<td><img src="images/show_case/46.png" alt="InstantID image" width="100%"></td>
|
| 142 |
+
<td><img src="images/show_case/45.png" alt="PuLID image" width="100%"></td>
|
| 143 |
+
</tr>
|
| 144 |
+
<tr>
|
| 145 |
+
<td>DreamShaperXL</td>
|
| 146 |
+
<td>solo, looking_at_viewer, smile, brown_hair, upper_body, open_clothes, teeth, open_jacket, black_jacket, blurry_background, realistic</td>
|
| 147 |
+
<td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
|
| 148 |
+
<td><img src="images/show_case/6.png" alt="EcomID image" width="100%"></td>
|
| 149 |
+
<td><img src="images/show_case/7.png" alt="InstantID image" width="100%"></td>
|
| 150 |
+
<td><img src="images/show_case/8.png" alt="PuLID image" width="100%"></td>
|
| 151 |
+
</tr>
|
| 152 |
+
<tr>
|
| 153 |
+
<td>leosam_xl_v7</td>
|
| 154 |
+
<td>A close-up portrait of a girl, solo, dress, jewelry, beach and sea, pink_dress, realistic.</td>
|
| 155 |
+
<td><img src="images/show_case/9.png" alt="Reference image" width="100%"></td>
|
| 156 |
+
<td><img src="images/show_case/15.png" alt="EcomID image" width="100%"></td>
|
| 157 |
+
<td><img src="images/show_case/14.png" alt="InstantID image" width="100%"></td>
|
| 158 |
+
<td><img src="images/show_case/16.png" alt="PuLID image" width="100%"></td>
|
| 159 |
+
</tr>
|
| 160 |
+
|
| 161 |
+
</table>
|
| 162 |
+
|
| 163 |
+
### Notes
|
| 164 |
+
- Unless otherwise specified, the showcases are generated using the base model EcomXL, which is also highly compatible with various other SDXL-based models, such as [leosams-helloworld-xl](https://civitai.com/models/43977/leosams-helloworld-xl), [dreamshaper-xl](https://civitai.com/models/112902/dreamshaper-xl), [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and so on.
|
| 165 |
+
- It works very well with SDXL Turbo/Lightning, [EcomXL Inpainting ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_inpaint) and [EcomXL Softedge ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_softedge).
|
| 166 |
+
|
| 167 |
+
# How to use
|
| 168 |
+
|
| 169 |
+
## ComfyUI
|
| 170 |
+
|
| 171 |
+
- The EcomID_ComfyUI node has been released: [click here](https://code.alibaba-inc.com/ruxue.wrx/EcomID_ComfyUI)
|
| 172 |
+
|
| 173 |
+
# Training Details
|
| 174 |
+
|
| 175 |
+
The model is trained on 2M Taobao images in which human faces account for more than 3% of the image. Each image has a resolution greater than 800 and an aesthetic score above 5.5.
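
The three data-selection thresholds can be read as a simple per-image filter. The sketch below is one possible interpretation, not the actual data pipeline. The `ImageMeta` schema is hypothetical, the face proportion is assumed to be a fraction of the image, and the resolution threshold is assumed to apply to the shorter side in pixels.

```python
from dataclasses import dataclass


@dataclass
class ImageMeta:
    # All field names are hypothetical; the README does not define a schema.
    width: int
    height: int
    face_ratio: float       # assumed: fraction of the image covered by the face
    aesthetic_score: float


def keep_for_training(meta: ImageMeta, min_face_ratio: float = 0.03,
                      min_side: int = 800, min_aesthetic: float = 5.5) -> bool:
    """Return True when an image passes all three stated thresholds."""
    return (meta.face_ratio > min_face_ratio
            and min(meta.width, meta.height) > min_side  # assumed: shorter side
            and meta.aesthetic_score > min_aesthetic)


samples = [
    ImageMeta(1024, 1024, 0.12, 6.1),  # passes every threshold
    ImageMeta(1024, 640, 0.20, 6.5),   # shorter side is too small
    ImageMeta(1200, 1600, 0.01, 7.0),  # face proportion is too low
]
kept = [m for m in samples if keep_for_training(m)]
```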
|
| 176 |
+
|
| 177 |
+
Mixed precision: fp16
|
| 178 |
+
|
| 179 |
+
Learning rate: 1e-4
|
| 180 |
+
|
| 181 |
+
Batch size: 2
|
| 182 |
+
|
| 183 |
+
Image size: 1024x1024
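
For reference, the hyperparameters listed above can be collected into a single configuration object. This is a minimal sketch: the `TrainConfig` class and its field names are assumptions for illustration, and the README does not say whether the batch size is per device or global.

```python
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    # Values are taken from the list above; the field names are hypothetical.
    mixed_precision: str = "fp16"
    learning_rate: float = 1e-4
    batch_size: int = 2  # per-device vs. global is not stated in the README
    image_size: Tuple[int, int] = (1024, 1024)


config = TrainConfig()
config_dict = asdict(config)  # plain dict, convenient for logging
```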
|
diffusion_pytorch_model.safetensors
DELETED
|
@@ -1,3 +0,0 @@
|
|
| 1 |
-
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:0f21e607c9582b1df7a14f6b99597c4a6dda4cb37e4f6675db8985971e4b1cf2
|
| 3 |
-
size 5004167864
|