AniAggarwal committed 75e41de (verified · 0 parents)

Initial commit.
.gitattributes ADDED
@@ -0,0 +1,37 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Gigi_3_512.png filter=lfs diff=lfs merge=lfs -text
+ Gigi_3_512.png_uplift_sd1.5vae-2.png filter=lfs diff=lfs merge=lfs -text
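The `.gitattributes` rules above route every matching path through the Git LFS filter. The sketch below is our own approximation of how git picks them, not git's matcher: slash-free patterns are compared with the file name at any directory depth, while `saved_model/**/*` is anchored at the repository root and covers everything below `saved_model/`. `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names; for an authoritative answer, run `git check-attr filter -- <path>` in a clone.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The 37 patterns from the .gitattributes above; each one sets filter=lfs.
LFS_PATTERNS = """
*.7z *.arrow *.bin *.bz2 *.ckpt *.ftz *.gz *.h5 *.joblib *.lfs.* *.mlmodel
*.model *.msgpack *.npy *.npz *.onnx *.ot *.parquet *.pb *.pickle *.pkl *.pt
*.pth *.rar *.safetensors saved_model/**/* *.tar.* *.tar *.tflite *.tgz *.wasm
*.xz *.zip *.zst *tfevents* Gigi_3_512.png Gigi_3_512.png_uplift_sd1.5vae-2.png
""".split()


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: would `path` (repo-relative, '/'-separated) go to LFS?"""
    name = PurePosixPath(path).name
    for pattern in LFS_PATTERNS:
        if "/" in pattern:
            # Patterns containing '/' are anchored at the repository root.
            # Only the 'dir/**/*' form used above is handled: any path under dir/.
            if pattern.endswith("/**/*") and path.startswith(pattern[: -len("**/*")]):
                return True
        elif fnmatchcase(name, pattern):
            # Slash-free patterns match the file name at any directory depth.
            return True
    return False
```

Under this approximation, `uplift_sd1.5vae.safetensors` and both example PNGs are stored in LFS, while `README.md` stays an ordinary git blob.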
Gigi_3_512.png ADDED

Git LFS Details

  • SHA256: 81d2d3fe5000cd5de8cb8c0ffb9846ee1591ae43120820964a7941022859f12b
  • Pointer size: 131 Bytes
  • Size of remote file: 473 kB
Gigi_3_512.png_uplift_sd1.5vae-2.png ADDED

Git LFS Details

  • SHA256: e89751686fbe883b4ac88743ed34f5eec918298194bc5e2a1ca873eec4ff3819
  • Pointer size: 132 Bytes
  • Size of remote file: 7.35 MB
README.md ADDED
@@ -0,0 +1,129 @@
+ ---
+ license: mit
+ library_name: pytorch
+ tags:
+ - feature-upsampling
+ - pixel-dense-features
+ - computer-vision
+ - stable-diffusion
+ - vae
+ - image-upsampling
+ - uplift
+ datasets:
+ - unsplash/lite
+ ---
+
+ # UPLiFT for Stable Diffusion 1.5 VAE
+
+ | Input Image | UPLiFT Upsampled Output |
+ |:-----------:|:-----------------------:|
+ | ![Input](Gigi_3_512.png) | ![UPLiFT Output](Gigi_3_512.png_uplift_sd1.5vae-2.png) |
+
+ This is the official pretrained **UPLiFT** (Efficient Pixel-Dense Feature Upsampling with Local Attenders) model for the **Stable Diffusion 1.5 VAE** encoder.
+
+ UPLiFT is a lightweight method to upscale features from pretrained vision backbones to create pixel-dense feature maps. When applied to the SD 1.5 VAE, it enables high-quality image upsampling by operating in the VAE's latent space.
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | **Backbone** | Stable Diffusion 1.5 VAE (`stable-diffusion-v1-5/stable-diffusion-v1-5`) |
+ | **Latent Channels** | 4 |
+ | **Patch Size** | 8 |
+ | **Upsampling Factor** | 2x per iteration |
+ | **Local Attender Size** | N=17 |
+ | **Training Dataset** | Unsplash-Lite |
+ | **Training Image Size** | 1024x1024 |
+ | **License** | MIT |
+
+ ## Links
+
+ - **Paper**: Coming soon
+ - **GitHub**: [https://github.com/mwalmer-umd/UPLiFT](https://github.com/mwalmer-umd/UPLiFT)
+ - **Project Website**: [https://www.cs.umd.edu/~mwalmer/uplift/](https://www.cs.umd.edu/~mwalmer/uplift/)
+
+ ## Installation
+
+ ```bash
+ pip install 'uplift[sd-vae] @ git+https://github.com/mwalmer-umd/UPLiFT.git'
+ ```
+
+ ## Quick Start
+
+ ```python
+ import torch
+ from PIL import Image
+
+ # Load model (weights auto-download from Hugging Face)
+ model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_sd15_vae')
+
+ # Run inference to upsample the image; convert('RGB') turns grayscale or RGBA files into 3-channel input
+ image = Image.open('your_image.jpg').convert('RGB')
+ upsampled_image = model(image)
+ ```
+
+ ## Usage Options
+
+ ### Adjust Upsampling Iterations
+
+ Control the number of iterative 2x upsampling steps with `iters` (default: 2 for the VAE model):
+
+ ```python
+ # Each iteration doubles the latent resolution; fewer iterations use less memory
+ model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_sd15_vae', iters=2)
+ ```
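The Model Details table gives a patch size of 8 and a 2x upsampling factor per iteration, so the effect of `iters` on resolution can be worked out directly. The sketch below is our own bookkeeping, not part of the UPLiFT API (`uplift_sizes` is a hypothetical name). It assumes input sides divisible by 8 and that the upsampled latent is decoded by the SD 1.5 VAE decoder, which maps each latent cell back to an 8x8 pixel patch.

```python
def uplift_sizes(height: int, width: int, iters: int = 2,
                 patch: int = 8, factor: int = 2) -> dict:
    """Hypothetical size bookkeeping for iterative latent upsampling.

    Assumes height and width are multiples of `patch` (8 for the SD 1.5 VAE).
    """
    if height % patch or width % patch:
        raise ValueError(f"height and width must be multiples of {patch}")
    lat_h, lat_w = height // patch, width // patch  # VAE latent grid
    scale = factor ** iters                         # total upsampling after `iters` steps
    return {
        "latent_in": (lat_h, lat_w),
        "latent_out": (lat_h * scale, lat_w * scale),
        # Assumption: decoding maps each latent cell back to a patch x patch block.
        "image_out": (lat_h * scale * patch, lat_w * scale * patch),
        # The number of latent elements grows with scale**2.
        "latent_growth": scale ** 2,
    }
```

For a 512x512 input, `iters=2` gives a 256x256 latent and, under the decoding assumption, a 2048x2048 image; `iters=1` halves each side and needs a quarter as many latent elements, which is consistent with the lower memory use noted above.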
+
+ ### Raw UPLiFT Model (Without Backbone)
+
+ Load only the UPLiFT upsampling module, without the SD VAE:
+
+ ```python
+ model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_sd15_vae',
+                        include_extractor=False)
+ ```
+
+ **Note:** We do not recommend running the model this way: extracting latents from a Diffusers pipeline VAE yourself adds steps where feature handling can go wrong. Loading the model with the backbone included handles the features correctly.
+
+ ## Architecture
+
+ This UPLiFT variant is designed specifically for VAE latent upsampling and includes:
+
+ 1. **Encoder**: Processes the input image with a series of convolutional blocks to create dense representations that guide feature upsampling
+ 2. **Decoder**: Upsamples latent features, concatenating noise channels for stochastic refinement
+ 3. **Local Attender**: A local-neighborhood attention pooling module that keeps the upsampled features semantically consistent with the original features
+ 4. **Refiner**: An additional 12-layer refinement block with noise injection that improves output quality
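The Local Attender is described here only at a high level. As an illustration of local attention pooling in general (not the released UPLiFT code), each high-resolution pixel can take a softmax-weighted average of low-resolution features drawn from a small, fixed set of neighbour offsets, so every output vector is a convex combination of existing features; this is one plausible way such a module stays consistent with the original features. The function name, tensor layouts, and the 3x3 offset set are our assumptions: the table states N=17 but not which 17 offsets are used.

```python
import numpy as np


def local_attention_pool(low_feats: np.ndarray, logits: np.ndarray,
                         offsets: list) -> np.ndarray:
    """Illustrative local attention pooling (hypothetical, not UPLiFT's code).

    low_feats: (C, h, w) low-resolution features.
    logits:    (N, H, W) scores for N neighbours at each output pixel, H = s*h, W = s*w.
    offsets:   N (dy, dx) neighbour offsets in low-resolution coordinates.
    """
    C, h, w = low_feats.shape
    N, H, W = logits.shape
    if N != len(offsets) or H % h or W % w:
        raise ValueError("shape mismatch between features, logits, and offsets")
    sy, sx = H // h, W // w
    # Softmax over the neighbour axis (max subtracted for numerical stability).
    weights = np.exp(logits - logits.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)
    # Low-resolution cell that each output row / column falls into.
    ys = np.arange(H) // sy
    xs = np.arange(W) // sx
    out = np.zeros((C, H, W), dtype=np.result_type(low_feats, weights))
    for k, (dy, dx) in enumerate(offsets):
        # Neighbour coordinates, clamped at the image border.
        ny = np.clip(ys + dy, 0, h - 1)
        nx = np.clip(xs + dx, 0, w - 1)
        gathered = low_feats[:, ny[:, None], nx[None, :]]  # (C, H, W)
        out += weights[k] * gathered
    return out


# Hypothetical 3x3 neighbourhood; the real N=17 layout is not given in this card.
OFFSETS_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
```

With a single `(0, 0)` offset the pooling reduces to nearest-neighbour upsampling. In a trained model the logits would come from a learned branch; here they are plain inputs so the pooling step can be inspected on its own.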
+
+ Key differences from ViT-based UPLiFT models:
+ - Uses layer normalization instead of batch normalization
+ - Includes noise channel concatenation (4 channels) in decoder and refiner
+ - Features a dedicated refiner module for enhanced image quality
+ - Trained with latent-space noise augmentation
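Noise channel concatenation, listed above for the decoder and refiner, means appending freshly sampled random channels to a feature map before it enters a block, giving the block a source of stochastic detail. A minimal NumPy sketch, assuming standard-normal noise and a (C, H, W) layout; `concat_noise` is a hypothetical helper, and the noise distribution and exact placement UPLiFT uses are not specified in this card.

```python
import numpy as np


def concat_noise(feats, noise_channels=4, rng=None):
    """Append standard-normal noise channels to a (C, H, W) feature map.

    Hypothetical helper: the noise distribution is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, height, width = feats.shape
    noise = rng.standard_normal((noise_channels, height, width)).astype(feats.dtype)
    # Channel axis first, so the result has C + noise_channels channels.
    return np.concatenate([feats, noise], axis=0)


latent = np.zeros((4, 64, 64), dtype=np.float32)  # e.g. an SD 1.5 latent for a 512x512 image
x = concat_noise(latent, rng=np.random.default_rng(0))
print(x.shape)  # (8, 64, 64)
```

A block receiving `x` therefore needs an input width of C + 4 channels; passing a seeded generator makes the noise, and hence the block's output, reproducible.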
+
+ ## Intended Use
+
+ This model is designed for:
+
+ - High-quality image upsampling using Stable Diffusion's VAE
+ - Super-resolution tasks
+ - Enhancing image resolution while preserving detail
+ - Research on diffusion model components
+
+ ## Limitations
+
+ - Optimized specifically for the Stable Diffusion 1.5 VAE; may not work with other VAE architectures
+ - Output quality depends on the characteristics of the input image
+ - Requires more computation than simpler upsampling methods
+ - Best results are achieved on images that match the training distribution (natural photographs)
+
+ ## Citation
+
+ If you use UPLiFT in your research, please cite our paper.
+
+ [citation coming soon]
+
+ ## Acknowledgements
+
+ This work builds upon:
+ - [Stable Diffusion](https://github.com/CompVis/stable-diffusion) by Stability AI and CompVis
+ - [Diffusers](https://github.com/huggingface/diffusers) by Hugging Face
+ - [Unsplash](https://unsplash.com/) for the training dataset
uplift_sd1.5vae.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e20bc63c759d36cf43942bdef1b7e248e5874e1af38c7883c806804adffc1cc2
+ size 213963468
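The three lines above are a Git LFS pointer, not the weights themselves: `oid` is the SHA-256 of the real file and `size` is its length in bytes (213,963,468 bytes, about 214 MB). After downloading `uplift_sd1.5vae.safetensors`, a check along these lines confirms the file matches the pointer. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the parser only covers the `version`/`oid`/`size` keys shown here.

```python
import hashlib

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e20bc63c759d36cf43942bdef1b7e248e5874e1af38c7883c806804adffc1cc2
size 213963468
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into (algo, digest, size)."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return algo, digest, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Stream the file at `path` and compare its size and hash with the pointer."""
    algo, digest, size = parse_lfs_pointer(pointer_text)
    h = hashlib.new(algo)
    n = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # read in chunks to bound memory use
            h.update(chunk)
            n += len(chunk)
    return n == size and h.hexdigest() == digest
```

For example, `matches_pointer("uplift_sd1.5vae.safetensors", POINTER)` should return `True` for an intact download; the local path is an assumption about where the file was saved.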