Update README.md
README.md
CHANGED
@@ -5,6 +5,25 @@ license: apache-2.0

# SDXL latent to image

It takes the 4ch latent and decodes it with the [WanDecoder3d module](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers/tree/main/vae).

-The VAE weights (from Wan2.2 5B) have not been modified. However, further training of both the encoder and the decoder in the AutoencoderKLWan would improve the sharpness of the image in text-to-image (t2i) tasks.

After a short warmup phase, the head of the WanDecoder3d was brought into the process.

During the warmup, the model learned the color space; later on, the imported/modified head improved the stability and sharpness of the image.

```python
import torch
from torchvision import transforms
from diffusers import AutoencoderKLWan

# WanXL is defined in this repository (import not shown); it maps the 4ch SDXL latent to a Wan latent.
if __name__ == '__main__':
    model = WanXL()
    vae = AutoencoderKLWan.from_pretrained('Wan-AI/Wan2.2-TI2V-5B-Diffusers', subfolder='vae')
    z = torch.randn(1, 4, 128, 128)  # SDXL latent (B, C, H, W)
    x = model(z)  # Wan latent (B, C, T, H, W)
    image = transforms.functional.to_pil_image(model.decode_by(vae, x).squeeze())
```

The SDXL latent was generated by this [model](https://huggingface.co/Laxhar/noobai-XL-Vpred-1.0/tree/main/vae).

As shown in the example, the target image size should preferably be 1024px, due to the lossy compression of the original encoded data.
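
For reference, the snippet below is a minimal sketch of how a real 4ch SDXL latent could be obtained from a 1024px image with the linked VAE via the diffusers `AutoencoderKL` class. The file name, preprocessing, the use of `latent_dist.sample()`, and loading that repository's `vae` subfolder directly with `from_pretrained` are illustrative assumptions rather than part of this repository.

```python
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

# Hypothetical example: encode an image into a 4ch SDXL latent with the linked VAE.
vae_sdxl = AutoencoderKL.from_pretrained('Laxhar/noobai-XL-Vpred-1.0', subfolder='vae')

image = Image.open('input.png').convert('RGB').resize((1024, 1024))
pixels = transforms.functional.to_tensor(image).unsqueeze(0) * 2.0 - 1.0  # scale to [-1, 1]

with torch.no_grad():
    z = vae_sdxl.encode(pixels).latent_dist.sample()  # (1, 4, 128, 128)
    # Depending on how WanXL was trained, scaling z by vae_sdxl.config.scaling_factor
    # may or may not be expected (assumption).

# z can then replace the torch.randn latent in the example above.
```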

## Datasets
- 12TPICS
- jlbaker361/flickr_humans