---
language: en
license: mit
tags:
- diffusion
- autoencoder
- feature-space
- svg
references:
- https://arxiv.org/abs/2510.15301
---

# SVG: Latent Diffusion Model without Variational Autoencoder

## Model Description

SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.

Key features:

- Replaces low-dimensional VAE latent space with high-dimensional semantic feature space.
- Includes a lightweight residual encoder for refining fine-grained details.
- Enables strong generation and perception performance.
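
To make the contrast between a low-dimensional VAE latent and a high-dimensional feature-space latent concrete, the following is a minimal shape-accounting sketch. Every number in it is an illustrative assumption rather than a value from this model card: an SD-style VAE with 8x downsampling and 4 channels, a ViT-S/16-style encoder with 384-dimensional patch tokens, and a hypothetical residual width of 8 channels. Joining the two branches by channel concatenation is likewise an assumption made for illustration; the helper names (`LatentGrid`, `encode_shape`, `concat_channels`) are hypothetical and not part of the SVG codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentGrid:
    """Shape of a spatial latent: height x width tokens, `channels` values each."""
    height: int
    width: int
    channels: int

    @property
    def tokens(self) -> int:
        # Number of spatial positions in the grid.
        return self.height * self.width

    @property
    def size(self) -> int:
        # Total number of scalar values in the latent.
        return self.tokens * self.channels


def encode_shape(image_size: int, downsample: int, channels: int) -> LatentGrid:
    """Latent shape produced for a square image by a given downsampling factor."""
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by downsample")
    side = image_size // downsample
    return LatentGrid(side, side, channels)


def concat_channels(a: LatentGrid, b: LatentGrid) -> LatentGrid:
    """Join two latents that share a spatial layout along the channel axis."""
    if (a.height, a.width) != (b.height, b.width):
        raise ValueError("grids must share the same spatial layout")
    return LatentGrid(a.height, a.width, a.channels + b.channels)


# Assumed, illustrative configurations (not taken from the SVG release).
vae = encode_shape(256, downsample=8, channels=4)          # SD-style VAE latent
semantic = encode_shape(256, downsample=16, channels=384)  # ViT-S/16-style features
residual = encode_shape(256, downsample=16, channels=8)    # hypothetical residual width
svg_like = concat_channels(semantic, residual)

print("VAE latent:          ", vae, "values:", vae.size)
print("Feature-space latent:", svg_like, "values:", svg_like.size)
```

Under these assumptions the feature-space latent has fewer spatial tokens than the VAE latent but far more channels per token, which is the sense in which it is "high-dimensional".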

## How to Use

For code and instructions, see the GitHub repository:

[https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)

Official project page:

[https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)

arXiv paper:

[https://arxiv.org/abs/2510.15301](https://arxiv.org/abs/2510.15301)