Update Readme.md
README.md
@@ -1,3 +1,33 @@
----
-
-
+---
+language: en
+license: mit
+tags:
+- diffusion
+- autoencoder
+- feature-space
+- svg
+---
+
+# SVG: Latent Diffusion Model without Variational Autoencoder
+
+## Model Description
+
+SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.
+
+Key features:
+
+- Replaces the low-dimensional VAE latent space with a high-dimensional semantic feature space.
+- Includes a lightweight residual encoder for refining fine-grained details.
+- Enables strong generation and perception performance.
+
+
+## How to Use
+
+For code and instructions, see the GitHub repository:
+
+[https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)
+
+
+Official project page:
+
+[https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)