lavinal712/transfusion-vae


lavinal712 committed on Apr 7, 2025

Commit 799a0ee · verified · 1 parent: e6fc2f4

Update README.md

Files changed (1)

README.md +18 -2

README.md CHANGED

@@ -23,7 +23,23 @@ This model was trained for 7 epochs on ImageNet, with training parameters follow
 $$\mathcal{L}_{\mathrm{VAE}} = \mathcal{L}_1 + \mathcal{L}_{\mathrm{LPIPS}} + 0.5\mathcal{L}_{\mathrm{GAN}} + 0.2\mathcal{L}_{\mathrm{ID}} + 0.000001\mathcal{L}_{\mathrm{KL}}$$
-## Evaluation
 ImageNet 2012 (256x256, val, 50000 images)
@@ -34,6 +50,6 @@ ImageNet 2012 (256x256, val, 50000 images)
 Paper: [Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model](https://arxiv.org/abs/2408.11039)
-Dataset: [ImageNet](https://image-net.org/)
 Base Code: [lavinal712/AutoencoderKL](https://github.com/lavinal712/AutoencoderKL)

 $$\mathcal{L}_{\mathrm{VAE}} = \mathcal{L}_1 + \mathcal{L}_{\mathrm{LPIPS}} + 0.5\mathcal{L}_{\mathrm{GAN}} + 0.2\mathcal{L}_{\mathrm{ID}} + 0.000001\mathcal{L}_{\mathrm{KL}}$$
+## Evaluation
+ImageNet 2012 (256x256, val, 50000 images)
+| Model           | rFID  | PSNR   | SSIM  | LPIPS |
+|-----------------|-------|--------|-------|-------|
+| Transfusion-VAE | 0.408 | 28.723 | 0.845 | 0.081 |
+| SD-VAE          | 0.692 | 26.910 | 0.772 | 0.130 |
+COCO 2017 (256x256, val, 5000 images)
+| Model           | rFID  | PSNR   | SSIM  | LPIPS |
+|-----------------|-------|--------|-------|-------|
+| Transfusion-VAE | 2.749 | 28.556 | 0.855 | 0.078 |
+| SD-VAE          | 4.246 | 26.622 | 0.784 | 0.127 |
+## Evaluation (legacy)
 ImageNet 2012 (256x256, val, 50000 images)
 Paper: [Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model](https://arxiv.org/abs/2408.11039)
+Dataset: [ImageNet](https://image-net.org/), [COCO](https://cocodataset.org/), [FFHQ](https://github.com/NVlabs/ffhq-dataset)
 Base Code: [lavinal712/AutoencoderKL](https://github.com/lavinal712/AutoencoderKL)
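The training objective in the diff is a fixed weighted sum of five terms. The sketch below illustrates only that weighting, assuming each component loss has already been reduced to a scalar. The KL helper assumes the standard diagonal-Gaussian posterior used by AutoencoderKL-style models; the names `LOSS_WEIGHTS`, `gaussian_kl` and `vae_loss` are hypothetical and not taken from the repository, and the LPIPS, GAN and ID terms are not computed here.

```python
"""Sketch of the weighted VAE objective from the README formula.

Assumptions (not from the repository): component losses are plain scalars,
and the KL term is the diagonal-Gaussian KL against a standard normal.
All names here are hypothetical illustrations.
"""
import math
from typing import Dict, Sequence

# Weights read directly from:
# L_VAE = L1 + L_LPIPS + 0.5 L_GAN + 0.2 L_ID + 0.000001 L_KL
LOSS_WEIGHTS: Dict[str, float] = {
    "l1": 1.0,
    "lpips": 1.0,
    "gan": 0.5,
    "id": 0.2,
    "kl": 1e-6,
}


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def vae_loss(components: Dict[str, float]) -> float:
    """Weighted sum of the component losses; raises KeyError if one is missing."""
    return sum(weight * components[name] for name, weight in LOSS_WEIGHTS.items())


if __name__ == "__main__":
    # Two latent dims: (mu=0, logvar=0) contributes 0, (mu=1, logvar=0) contributes 1.
    kl = gaussian_kl(mu=[0.0, 1.0], logvar=[0.0, 0.0])
    total = vae_loss({"l1": 0.1, "lpips": 0.2, "gan": 0.4, "id": 0.5, "kl": kl})
    print(kl, total)
```

With these toy values the KL term is 0.5, but its weight of 1e-6 means it adds only 5e-7 to a total of about 0.6, which shows how lightly the latent is regularized relative to the reconstruction and adversarial terms.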
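The PSNR column in the evaluation tables is normally defined as 10·log10(MAX²/MSE), in decibels. The README does not state the pixel range or whether scores are averaged per image, so the sketch below assumes 8-bit pixels (MAX = 255) flattened into a list and shows the textbook formula only; `psnr` is a hypothetical helper, not the evaluation code used for the reported numbers.

```python
"""Textbook PSNR for one image, assuming 8-bit pixel values flattened to a list."""
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE)."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [0.0] * 4
    rec = [10.0] * 4  # every pixel off by 10 levels, so MSE = 100
    print(f"{psnr(ref, rec):.2f} dB")  # prints "28.13 dB"
```

A reconstruction that is off by 10 intensity levels on every pixel therefore scores about 28.13 dB, which gives a rough sense of scale for the 26 to 29 dB values in the tables.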