---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-TI2V-5B-Diffusers
---

# SDXL latent to image

This model takes a 4-channel SDXL latent and decodes it with the [WanDecoder3d module](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers/tree/main/vae).

After a short warmup phase, the head of the WanDecoder3d was made part of the pipeline.

During the warmup, the model learned the color space. Afterwards, the imported/modified head improved the stability of the decoded image.

```python
import torch
from torchvision import transforms
from diffusers import AutoencoderKLWan

from wan_xl import WanXL  # this repo's model class; adjust the import to your layout

if __name__ == '__main__':
    model = WanXL()
    vae = AutoencoderKLWan.from_pretrained('Wan-AI/Wan2.2-TI2V-5B-Diffusers', subfolder='vae')
    z = torch.randn(1, 4, 128, 128)  # SDXL latent, (B, C, H, W)
    x = model(z)                     # video-style latent, (B, C, T, H, W)
    # decode_by runs the Wan VAE decoder and returns an image tensor
    image = transforms.functional.to_pil_image(model.decode_by(vae, x).squeeze())
```

The SDXL latents were generated with this [VAE](https://huggingface.co/Laxhar/noobai-XL-Vpred-1.0/tree/main/vae).

As shown in the example, the preferred target image size is 1024px, owing to the lossy compression of the originally encoded data.
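The 1024px figure follows from the spatial scale of the latent: assuming the standard SDXL VAE, each spatial dimension is compressed by a factor of 8, so the 128x128 latent in the example above maps to a 1024x1024 image. A minimal sketch of that arithmetic:

```python
# SDXL's VAE downsamples each spatial dimension by 8 (standard SDXL assumption),
# so a 128x128 latent corresponds to a 1024x1024 decoded image.
latent_hw = 128
vae_scale_factor = 8
image_hw = latent_hw * vae_scale_factor
print(image_hw)  # 1024
```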

## Datasets

- 12TPICS
- jlbaker361/flickr_humans