# DC-AE-Lite
\[[github](https://github.com/dc-ai-projects/DC-Gen/tree/main)\]

Decoding is often the speed bottleneck in few-step latent diffusion models. We release DC-AE-Lite to address this problem. It shares the same encoder as DC-AE-f32c32-SANA-1.0 but uses a much smaller decoder. Without any retraining, it can be applied to diffusion models trained with DC-AE-f32c32-SANA-1.0.

## Demo
<p align="center">
  <img src="./assets/combined.gif"><br>
  <b> DC-AE-Lite vs DC-AE reconstruction visual quality </b>
</p>

<p align="center">
  <img src="./assets/dc-ae-lite.jpg"><br>
  <b> DC-AE-Lite achieves 1.8× faster decoding than DC-AE with similar reconstruction quality </b>
</p>



## Usage
```python
from diffusers import AutoencoderDC
from PIL import Image
import torch
import torchvision.transforms as transforms
from torchvision.utils import save_image

device = torch.device("cuda")
# Load DC-AE-Lite (same encoder as DC-AE-f32c32-SANA-1.0, smaller decoder)
dc_ae_lite = AutoencoderDC.from_pretrained("dc-ai/dc-ae-lite-f32c32-diffusers").to(device).eval()

# Center-crop to 1024x1024 and normalize pixels to [-1, 1]
transform = transforms.Compose([
    transforms.CenterCrop((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

image = Image.open("assets/fig/girl.png")

x = transform(image)[None].to(device)
with torch.no_grad():
    latent = dc_ae_lite.encode(x).latent
print(f"latent shape: {latent.shape}")

with torch.no_grad():
    y = dc_ae_lite.decode(latent).sample
# Map the output from [-1, 1] back to [0, 1] before saving
save_image(y * 0.5 + 0.5, "demo_dc_ae_lite.png")
```
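As a quick sanity check on the latent shape printed above: the `f32c32` suffix denotes a 32× spatial downsampling factor and 32 latent channels, so a 1024×1024 input produces a 32×32 latent grid. A minimal sketch of this arithmetic (the helper name below is ours, not part of the library):

```python
# f32c32: 32x spatial downsampling, 32 latent channels (from the model name)
def dc_ae_latent_shape(height, width, factor=32, channels=32):
    # Assumes height and width are divisible by the downsampling factor.
    return (1, channels, height // factor, width // factor)

print(dc_ae_latent_shape(1024, 1024))  # (1, 32, 32, 32)
```

Because DC-AE-Lite keeps the DC-AE-f32c32 encoder, its latents have exactly this layout, which is why decoders can be swapped without retraining the diffusion model.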