nielsr (HF Staff) committed on
Commit 8c69a41 · verified · 1 Parent(s): c9f46f1

Add pipeline tag and improve model card metadata

Hi! I'm Niels from the Hugging Face community team. I've updated the model card to include the `pipeline_tag: image-to-image` metadata, which helps users find the model through the Hub's task filters. I've also refined the metadata and organized the project links for better clarity.
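The change is confined to the YAML front matter between the `---` delimiters at the top of README.md. The sketch below reads that block from the old and new versions of the card and checks whether a simple task filter would match `image-to-image`. It assumes only the flat `key: value` / `- item` layout used in this card. Both `parse_front_matter` and `matches_task` are hypothetical helpers written for illustration. They do not describe how the Hub indexes model cards, and real tooling would use a proper YAML parser (or the model-card utilities in `huggingface_hub`).

```python
def parse_front_matter(readme_text: str) -> dict:
    """Return the metadata in a model card's leading `---` block.

    Hypothetical helper: it only understands the flat subset used in this
    README (`key: value` pairs and `- item` lists). It is not the Hub's parser.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at all
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter: metadata ends here
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item under the most recent key, e.g. the entries of `tags:`
            items = meta.get(current_key)
            if isinstance(items, list):
                items.append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        current_key = key.strip()
        # A bare `key:` opens a list; otherwise keep the scalar value
        meta[current_key] = value.strip() or []
    return meta


def matches_task(meta: dict, task: str) -> bool:
    """Hypothetical stand-in for a task filter: match on `pipeline_tag` only."""
    return meta.get("pipeline_tag") == task


# Front matter before this commit (no pipeline_tag, has `references`)
OLD_CARD = """---
language: en
license: mit
tags:
- diffusion
- autoencoder
- feature-space
- svg
references:
- https://arxiv.org/abs/2510.15301
---
"""

# Front matter after this commit
NEW_CARD = """---
language: en
license: mit
pipeline_tag: image-to-image
tags:
- diffusion
- autoencoder
- feature-space
- svg
---
"""

old_meta = parse_front_matter(OLD_CARD)
new_meta = parse_front_matter(NEW_CARD)
print(matches_task(old_meta, "image-to-image"))  # False
print(matches_task(new_meta, "image-to-image"))  # True
print(new_meta["tags"])  # ['diffusion', 'autoencoder', 'feature-space', 'svg']
```

In this sketch the old card is skipped because it has no `pipeline_tag` key, which mirrors the motivation given in the commit message for adding it.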

Files changed (1)
  1. README.md +40 -15
README.md CHANGED
@@ -1,39 +1,64 @@
  ---
  language: en
  license: mit
  tags:
  - diffusion
  - autoencoder
  - feature-space
  - svg
- references:
- - https://arxiv.org/abs/2510.15301
  ---
 
  # SVG: Latent Diffusion Model without Variational Autoencoder
 
- ## Model Description
 
- SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.
 
- Key features:
 
- - Replaces low-dimensional VAE latent space with high-dimensional semantic feature space.
- - Includes a lightweight residual encoder for refining fine-grained details.
- - Enables strong generation and perception performance.
 
 
- ## How to Use
 
- For code, and instructions, see the GitHub repository:
 
- [https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)
 
 
- Official project page:
 
- [https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)
 
- Arxiv paper:
 
- [https://arxiv.org/abs/2510.15301](https://arxiv.org/abs/2510.15301)

  ---
  language: en
  license: mit
+ pipeline_tag: image-to-image
  tags:
  - diffusion
  - autoencoder
  - feature-space
  - svg
  ---
 
  # SVG: Latent Diffusion Model without Variational Autoencoder
 
+ SVG is a novel latent diffusion model framework that replaces the traditional Variational Autoencoder (VAE) latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based models.
 
+ ## Resources
 
+ - **Paper:** [Latent Diffusion Model without Variational Autoencoder](https://huggingface.co/papers/2510.15301)
+ - **Project Page:** [https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)
+ - **GitHub Repository:** [https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)
 
+ ## Model Description
 
+ SVG constructs a feature space with clear semantic discriminability by leveraging frozen DINO features, while a lightweight residual branch captures fine-grained details for high-fidelity reconstruction. Diffusion models are trained directly on this semantically structured latent space to facilitate more efficient learning.
 
+ **Key features:**
+ - Replaces low-dimensional VAE latent space with high-dimensional semantic feature space.
+ - Includes a lightweight residual encoder for refining fine-grained details.
+ - Enables accelerated diffusion training and supports few-step sampling.
+ - Improves generative quality while preserving semantic and discriminative capabilities.
 
+ ## Usage
 
+ For full instructions on training and evaluation, please refer to the official [GitHub repository](https://github.com/shiml20/SVG).
 
+ ### Installation
+ ```bash
+ conda create -n svg python=3.10 -y
+ conda activate svg
+ pip install -r requirements.txt
+ ```
 
+ ### Generation
+ To generate images using a trained model:
+ ```bash
+ # Update ckpt_path in sample_svg.py with your checkpoint
+ python sample_svg.py
+ ```
 
+ ## Citation
 
+ If you find this work useful for your research, please cite:
 
+ ```bibtex
+ @misc{shi2025latentdiffusionmodelvariational,
+ title={Latent Diffusion Model without Variational Autoencoder},
+ author={Minglei Shi and Haolin Wang and Wenzhao Zheng and Ziyang Yuan and Xiaoshi Wu and Xintao Wang and Pengfei Wan and Jie Zhou and Jiwen Lu},
+ year={2025},
+ eprint={2510.15301},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2510.15301},
+ }
+ ```