---
language: en
license: mit
tags:
- diffusion
- autoencoder
- feature-space
- svg
---
# SVG: Latent Diffusion Model without Variational Autoencoder

## Model Description

SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). According to the authors, this design improves generation quality and transferability to downstream tasks while keeping efficiency comparable to standard VAE-based latent diffusion models.

Key features:

- Replaces the low-dimensional VAE latent space with a high-dimensional semantic feature space.
- Adds a lightweight residual encoder to refine fine-grained details.
- Delivers strong performance on both generation and perception tasks.
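
The two feature streams listed above can be pictured with a toy shape sketch. This is not the SVG implementation. It assumes a ViT-style backbone with 16x16 patches, assumes the residual features are concatenated per token with the frozen semantic features, and replaces both encoders with fixed random projections. The names `frozen_semantic_encoder`, `residual_encoder`, `encode`, and every size constant are hypothetical, chosen only to make the shapes concrete.

```python
import numpy as np

# Hypothetical sizes for illustration only; not taken from the SVG release.
IMAGE_SIZE = 256            # input resolution (H = W)
PATCH = 16                  # assumed ViT patch size
SEM_DIM = 384               # assumed width of the frozen semantic features
RES_DIM = 8                 # assumed width of the residual features
VAE_FACTOR, VAE_CH = 8, 4   # a typical VAE latent: 8x downsampling, 4 channels

rng = np.random.default_rng(0)
PATCH_PIXELS = PATCH * PATCH * 3
# Random projections standing in for real networks (assumption):
W_SEM = rng.standard_normal((PATCH_PIXELS, SEM_DIM))  # plays the frozen backbone
W_RES = rng.standard_normal((PATCH_PIXELS, RES_DIM))  # plays the (assumed trainable) residual encoder


def patchify(image):
    """Split an (H, W, 3) image into an (H/P, W/P, P*P*3) grid of flattened patches."""
    h, w, c = image.shape
    grid = image.reshape(h // PATCH, PATCH, w // PATCH, PATCH, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(h // PATCH, w // PATCH, -1)


def frozen_semantic_encoder(image):
    """Hypothetical stand-in for a frozen self-supervised backbone such as DINOv3."""
    return patchify(image) @ W_SEM


def residual_encoder(image):
    """Hypothetical stand-in for the lightweight residual encoder."""
    return patchify(image) @ W_RES


def encode(image):
    """Build the diffusion latent by concatenating both feature streams per token."""
    return np.concatenate(
        [frozen_semantic_encoder(image), residual_encoder(image)], axis=-1
    )


image = rng.standard_normal((IMAGE_SIZE, IMAGE_SIZE, 3))
latent = encode(image)
vae_size = (IMAGE_SIZE // VAE_FACTOR) ** 2 * VAE_CH

print("feature latent shape:", latent.shape)  # (16, 16, 392)
print("values per image:", latent.size, "vs VAE latent:", vae_size)  # 100352 vs 4096
```

Under these assumed sizes the feature latent holds roughly 24.5 times as many values per image as a typical 4-channel VAE latent, which is the "high-dimensional" trade-off the feature list refers to.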

## How to Use

For code and instructions, see the GitHub repository:
[https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)

Official project page:
[https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)