---
language: en
license: mit
tags:
- diffusion
- autoencoder
- feature-space
- svg
references:
- https://arxiv.org/abs/2510.15301
---
# SVG: Latent Diffusion Model without Variational Autoencoder
## Model Description
SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.
Key features:
- Replaces low-dimensional VAE latent space with high-dimensional semantic feature space.
- Includes a lightweight residual encoder for refining fine-grained details.
- Enables strong generation and perception performance.
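
The latent construction described above can be pictured with a toy sketch. This is not the repository's implementation: the patch size, feature widths, the `LinearStandIn` class, and the choice to combine the two branches by channel-wise concatenation are all assumptions made for illustration, and random linear maps stand in for the frozen self-supervised encoder and the residual encoder.

```python
import numpy as np

PATCH = 16      # assumed patch size (illustrative)
SEM_DIM = 384   # assumed width of the semantic features (illustrative)
RES_DIM = 8     # assumed width of the residual features (illustrative)


def patchify(image, patch=PATCH):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)


class LinearStandIn:
    """Hypothetical stand-in for an encoder: one fixed random projection."""

    def __init__(self, in_dim, out_dim, seed):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)

    def __call__(self, patches):
        return patches @ self.weight


def encode(image, semantic_encoder, residual_encoder):
    """Build per-patch latents from a semantic branch and a residual branch."""
    patches = patchify(image)
    semantic = semantic_encoder(patches)   # high-dimensional semantic features
    residual = residual_encoder(patches)   # small branch for fine-grained detail
    # Assumption: the branches are joined along the channel axis.
    return np.concatenate([semantic, residual], axis=-1)


in_dim = PATCH * PATCH * 3
image = np.random.default_rng(0).random((224, 224, 3))
latent = encode(
    image,
    LinearStandIn(in_dim, SEM_DIM, seed=1),
    LinearStandIn(in_dim, RES_DIM, seed=2),
)
print(latent.shape)  # (196, 392): 14 x 14 patches, 384 + 8 channels
```

In this sketch the diffusion model would operate on `latent` rather than on a VAE code. For the actual encoders, dimensions, and training setup, refer to the GitHub repository linked below.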
## How to Use
For code and usage instructions, see the GitHub repository:
[https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)
Official project page:
[https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)
arXiv paper:
[https://arxiv.org/abs/2510.15301](https://arxiv.org/abs/2510.15301)