Update README.md
README.md
CHANGED
@@ -23,7 +23,23 @@ This model was trained for 7 epochs on ImageNet, with training parameters follow

 $$\mathcal{L}_{\mathrm{VAE}} = \mathcal{L}_1 + \mathcal{L}_{\mathrm{LPIPS}} + 0.5\mathcal{L}_{\mathrm{GAN}} + 0.2\mathcal{L}_{\mathrm{ID}} + 0.000001\mathcal{L}_{\mathrm{KL}}$$

-## Evaluation
+## Evaluation
+
+ImageNet 2012 (256x256, val, 50000 images)
+
+| Model           | rFID  | PSNR   | SSIM  | LPIPS |
+|-----------------|-------|--------|-------|-------|
+| Transfusion-VAE | 0.408 | 28.723 | 0.845 | 0.081 |
+| SD-VAE          | 0.692 | 26.910 | 0.772 | 0.130 |
+
+COCO 2017 (256x256, val, 5000 images)
+
+| Model           | rFID  | PSNR   | SSIM  | LPIPS |
+|-----------------|-------|--------|-------|-------|
+| Transfusion-VAE | 2.749 | 28.556 | 0.855 | 0.078 |
+| SD-VAE          | 4.246 | 26.622 | 0.784 | 0.127 |
+
+## Evaluation (legacy)

 ImageNet 2012 (256x256, val, 50000 images)

@@ -34,6 +50,6 @@ ImageNet 2012 (256x256, val, 50000 images)

 Paper: [Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model](https://arxiv.org/abs/2408.11039)

-Dataset: [ImageNet](https://image-net.org/)
+Dataset: [ImageNet](https://image-net.org/), [COCO](https://cocodataset.org/), [FFHQ](https://github.com/NVlabs/ffhq-dataset)

 Base Code: [lavinal712/AutoencoderKL](https://github.com/lavinal712/AutoencoderKL)
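
The loss formula that appears as context in this diff is a fixed weighted sum of five terms. As a rough illustration only, the weighting could be sketched in plain Python as below. The weights are taken from the formula; the dictionary keys and the function name `combine_vae_loss` are hypothetical labels for this sketch and are not taken from the linked base code, which may compute and schedule these terms differently.

```python
# Weights copied from the README formula:
# L_VAE = L_1 + L_LPIPS + 0.5 * L_GAN + 0.2 * L_ID + 0.000001 * L_KL
# The key names are hypothetical labels chosen for this sketch.
LOSS_WEIGHTS = {
    "l1": 1.0,
    "lpips": 1.0,
    "gan": 0.5,
    "id": 0.2,
    "kl": 1e-6,
}


def combine_vae_loss(terms, weights=LOSS_WEIGHTS):
    """Return the weighted sum of already-computed scalar loss terms.

    `terms` maps each key in `weights` to a number (or any object that
    supports multiplication by a float and addition, such as a tensor).
    """
    missing = sorted(set(weights) - set(terms))
    if missing:
        raise KeyError(f"missing loss terms: {missing}")
    total = 0.0
    for name, weight in weights.items():
        total = total + weight * terms[name]
    return total


if __name__ == "__main__":
    # Made-up values purely to exercise the function; not measured losses.
    example_terms = {"l1": 0.05, "lpips": 0.10, "gan": 0.30, "id": 0.20, "kl": 1500.0}
    print(combine_vae_loss(example_terms))
```

The very small KL weight means the KL term contributes little to the total unless its raw value is large, so the reconstruction and perceptual terms dominate the sum.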