---
language: en
license: mit
pipeline_tag: image-to-image
tags:
- diffusion
- autoencoder
- feature-space
- svg
---

# SVG: Latent Diffusion Model without Variational Autoencoder

SVG is a novel latent diffusion model framework that replaces the traditional Variational Autoencoder (VAE) latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based models.

## Resources

- **Paper:** [Latent Diffusion Model without Variational Autoencoder](https://huggingface.co/papers/2510.15301)
- **Project Page:** [https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)
- **GitHub Repository:** [https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)

## Model Description

SVG constructs a feature space with clear semantic discriminability by leveraging frozen DINO features, while a lightweight residual branch captures fine-grained details for high-fidelity reconstruction. Diffusion models are trained directly on this semantically structured latent space to facilitate more efficient learning.

**Key features:**

- Replaces low-dimensional VAE latent space with high-dimensional semantic feature space.
- Includes a lightweight residual encoder for refining fine-grained details.
- Enables accelerated diffusion training and supports few-step sampling.
- Improves generative quality while preserving semantic and discriminative capabilities.

## Usage

For full instructions on training and evaluation, please refer to the official [GitHub repository](https://github.com/shiml20/SVG).

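
As a conceptual aid for the latent space described under Model Description, the toy sketch below shows one way a frozen semantic branch and a lightweight residual branch could be joined into a single per-token latent. It is not the repository's code: the function names, the toy dimensions, the random stand-in features, and the choice of channel-wise concatenation as the combination rule are all assumptions made for illustration.

```python
import random


def semantic_features(num_tokens, dim, seed=0):
    """Hypothetical stand-in for a frozen self-supervised encoder (e.g., DINOv3).

    Returns one `dim`-dimensional vector per patch token. A real encoder would
    compute these from pixels; random values are used here for illustration.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]


def residual_features(num_tokens, dim, seed=1):
    """Hypothetical stand-in for the lightweight residual encoder.

    Returns a small per-token vector meant to carry fine-grained detail.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]


def build_latent(semantic, residual):
    """Join both branches per token along the channel axis (assumed rule)."""
    if len(semantic) != len(residual):
        raise ValueError("both branches must produce the same number of tokens")
    return [s + r for s, r in zip(semantic, residual)]


# Toy sizes chosen for readability; they are not taken from the paper.
NUM_TOKENS, SEMANTIC_DIM, RESIDUAL_DIM = 4, 8, 2

semantic = semantic_features(NUM_TOKENS, SEMANTIC_DIM)
residual = residual_features(NUM_TOKENS, RESIDUAL_DIM)
latent = build_latent(semantic, residual)

# Each token now has SEMANTIC_DIM + RESIDUAL_DIM channels.
print(len(latent), len(latent[0]))  # prints: 4 10
```

In this sketch the semantic channels pass through unchanged and the residual channels are appended after them, so the diffusion model would see both in one latent. The actual model code and dimensions are in the official repository.
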

### Installation

```bash
# Run from a checkout of the SVG repository, which provides requirements.txt
conda create -n svg python=3.10 -y
conda activate svg
pip install -r requirements.txt
```

### Generation

To generate images using a trained model:

```bash
# Update ckpt_path in sample_svg.py with your checkpoint
python sample_svg.py
```

## Citation

If you find this work useful for your research, please cite:

```bibtex
@misc{shi2025latentdiffusionmodelvariational,
      title={Latent Diffusion Model without Variational Autoencoder},
      author={Minglei Shi and Haolin Wang and Wenzhao Zheng and Ziyang Yuan and Xiaoshi Wu and Xintao Wang and Pengfei Wan and Jie Zhou and Jiwen Lu},
      year={2025},
      eprint={2510.15301},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.15301},
}
```