Initial commit.


Files changed (5)

.gitattributes +37 -0
Gigi_3_512.png +3 -0
Gigi_3_512.png_uplift_sd1.5vae-2.png +3 -0
README.md +129 -0
uplift_sd1.5vae.safetensors +3 -0

.gitattributes ADDED

	@@ -0,0 +1,37 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+Gigi_3_512.png filter=lfs diff=lfs merge=lfs -text
+Gigi_3_512.png_uplift_sd1.5vae-2.png filter=lfs diff=lfs merge=lfs -text
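Note on the rules above: each line pairs a path pattern with `filter=lfs`, so any matching file is stored through Git LFS instead of directly in the repository. The sketch below approximates that lookup for slash-free patterns only, using Python's `fnmatch` on the file's basename. This is an assumption made for illustration: Git's own attribute matching has further rules (for example for `saved_model/**/*`), and the helper names `lfs_patterns` and `is_lfs_tracked` are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A few of the rules from the .gitattributes hunk above.
GITATTRIBUTES = """\
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Gigi_3_512.png filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text):
    """Return the pattern of every rule that sets filter=lfs."""
    patterns = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(path, patterns):
    """Approximate check: slash-free patterns are matched against the basename.

    Patterns containing "/" are skipped, since fnmatch does not model
    Git's directory-aware matching.
    """
    name = PurePosixPath(path).name
    return any("/" not in p and fnmatch(name, p) for p in patterns)


patterns = lfs_patterns(GITATTRIBUTES)
print(is_lfs_tracked("uplift_sd1.5vae.safetensors", patterns))
print(is_lfs_tracked("README.md", patterns))
```

With these rules the weights file and the two sample images go through LFS, while `README.md` and `.gitattributes` itself are stored as regular Git blobs.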

Gigi_3_512.png ADDED

Git LFS Details

SHA256: 81d2d3fe5000cd5de8cb8c0ffb9846ee1591ae43120820964a7941022859f12b
Pointer size: 131 Bytes
Size of remote file: 473 kB

Gigi_3_512.png_uplift_sd1.5vae-2.png ADDED

Git LFS Details

SHA256: e89751686fbe883b4ac88743ed34f5eec918298194bc5e2a1ca873eec4ff3819
Pointer size: 132 Bytes
Size of remote file: 7.35 MB
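
The two pointer sizes (131 and 132 bytes) differ only because the larger file's byte count has one more digit. A minimal sketch, assuming the standard three-line Git LFS pointer layout that also appears at the end of this page for the weights file; the exact byte counts passed in below are hypothetical stand-ins, since the page only reports rounded sizes (473 kB and 7.35 MB):

```python
def lfs_pointer_text(sha256_hex, size_bytes):
    """Build the text of a Git LFS pointer file (version, oid, size lines)."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


def lfs_pointer_size(size_bytes):
    """Pointer length in bytes; only the digit count of the size varies."""
    placeholder_digest = "0" * 64  # a SHA-256 hex digest is always 64 characters
    return len(lfs_pointer_text(placeholder_digest, size_bytes).encode("utf-8"))


# Hypothetical exact sizes that are consistent with the rounded values above.
print(lfs_pointer_size(473_000))    # six-digit size
print(lfs_pointer_size(7_350_000))  # seven-digit size
```

The fixed part of the pointer is 125 bytes, so a six-digit size gives 131 bytes and a seven-digit size gives 132 bytes.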

README.md ADDED

	@@ -0,0 +1,129 @@

+---
+license: mit
+library_name: pytorch
+tags:
+  - feature-upsampling
+  - pixel-dense-features
+  - computer-vision
+  - stable-diffusion
+  - vae
+  - image-upsampling
+  - uplift
+datasets:
+  - unsplash/lite
+---
+# UPLiFT for Stable Diffusion 1.5 VAE
+| Input Image | UPLiFT Upsampled Output |
+|:-----------:|:-----------------------:|
+| ![Input](Gigi_3_512.png) | ![UPLiFT Output](Gigi_3_512.png_uplift_sd1.5vae-2.png) |
+
+This is the official pretrained **UPLiFT** (Efficient Pixel-Dense Feature Upsampling with Local Attenders) model for the **Stable Diffusion 1.5 VAE** encoder.
+
+UPLiFT is a lightweight method to upscale features from pretrained vision backbones to create pixel-dense feature maps. When applied to the SD 1.5 VAE, it enables high-quality image upsampling by operating in the VAE's latent space.
+## Model Details
+| Property | Value |
+|----------|-------|
+| **Backbone** | Stable Diffusion 1.5 VAE (`stable-diffusion-v1-5/stable-diffusion-v1-5`) |
+| **Latent Channels** | 4 |
+| **Patch Size** | 8 |
+| **Upsampling Factor** | 2x per iteration |
+| **Local Attender Size** | N=17 |
+| **Training Dataset** | Unsplash-Lite |
+| **Training Image Size** | 1024x1024 |
+| **License** | MIT |
+## Links
+- **Paper**: [Coming Soon]
+- **GitHub**: [https://github.com/mwalmer-umd/UPLiFT](https://github.com/mwalmer-umd/UPLiFT)
+- **Project Website**: [https://www.cs.umd.edu/~mwalmer/uplift/](https://www.cs.umd.edu/~mwalmer/uplift/)
+## Installation
+```bash
+pip install 'uplift[sd-vae] @ git+https://github.com/mwalmer-umd/UPLiFT.git'
+```
+## Quick Start
+```python
+import torch
+from PIL import Image
+# Load model (weights auto-download from HuggingFace)
+model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_sd15_vae')
+# Run inference - upsamples the image
+image = Image.open('your_image.jpg')
+upsampled_image = model(image)
+```
+## Usage Options
+### Adjust Upsampling Iterations
+Control the number of iterative upsampling steps (default: 2 for VAE):
+```python
+# iters=2 is the default for the VAE model; fewer iterations = lower memory usage
+model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_sd15_vae', iters=2)
+```
+### Raw UPLiFT Model (Without Backbone)
+Load only the UPLiFT upsampling module without the SD VAE:
+```python
+model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_sd15_vae',
+                       include_extractor=False)
+```
+**Note:** We do not recommend running the model this way. Extracting features from a Diffusers pipeline VAE and passing them in manually adds complexity and can introduce feature-handling errors. Loading the model with the backbone included handles the features correctly.
+## Architecture
+This UPLiFT variant is specifically designed for VAE latent upsampling and includes:
+1. **Encoder**: Processes the input image with a series of convolutional blocks to create dense representations to guide feature upsampling
+2. **Decoder**: Upsamples latent features with noise channel concatenation for stochastic refinement
+3. **Local Attender**: A local-neighborhood-based attention pooling module that maintains semantic consistency with the original features
+4. **Refiner**: An additional 12-layer refinement block with noise injection that enhances output quality
+
+Key differences from ViT-based UPLiFT models:
+- Uses layer normalization instead of batch normalization
+- Includes noise channel concatenation (4 channels) in decoder and refiner
+- Features a dedicated refiner module for enhanced image quality
+- Trained with latent-space noise augmentation
+## Intended Use
+This model is designed for:
+- High-quality image upsampling using Stable Diffusion's VAE
+- Super-resolution tasks
+- Enhancing image resolution while preserving details
+- Research on diffusion model components
+## Limitations
+- Optimized specifically for Stable Diffusion 1.5 VAE; may not work with other VAE architectures
+- Output quality depends on the input image characteristics
+- Requires more computation than simpler upsampling methods
+- Best results achieved with images that match the training distribution (natural photographs)
+## Citation
+If you use UPLiFT in your research, please cite our paper.
+[citation coming soon]
+## Acknowledgements
+This work builds upon:
+- [Stable Diffusion](https://github.com/CompVis/stable-diffusion) by Stability AI and CompVis
+- [Diffusers](https://github.com/huggingface/diffusers) by Hugging Face
+- [Unsplash](https://unsplash.com/) for the training dataset
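
Note on the Model Details table in the README above: the listed values (patch size 8, 4 latent channels, 2x upsampling per iteration, default of 2 iterations) imply the following latent shapes. This is a rough sketch of the arithmetic only, not the UPLiFT implementation. The function names are hypothetical, and the final pixel size assumes that the VAE decoder maps each latent position back to an 8x8 pixel block, which the README does not state explicitly.

```python
def uplift_latent_shapes(height, width, iters=2, patch_size=8,
                         latent_channels=4, factor=2):
    """List the (channels, height, width) latent shape before and after each iteration."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of the patch size")
    h, w = height // patch_size, width // patch_size
    shapes = [(latent_channels, h, w)]
    for _ in range(iters):
        h, w = h * factor, w * factor  # each iteration doubles both sides
        shapes.append((latent_channels, h, w))
    return shapes


def decoded_size(latent_shape, patch_size=8):
    """Pixel size after decoding, ASSUMING 8x spatial expansion in the VAE decoder."""
    _, h, w = latent_shape
    return (h * patch_size, w * patch_size)


shapes = uplift_latent_shapes(512, 512)
print(shapes)
print(decoded_size(shapes[-1]))
```

Under these assumptions a 512x512 input yields a 64x64 latent, two iterations bring it to 256x256, and decoding would give a 2048x2048 image, that is, 4x per side overall.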

uplift_sd1.5vae.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e20bc63c759d36cf43942bdef1b7e248e5874e1af38c7883c806804adffc1cc2
+size 213963468
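
Note on the pointer above: the `oid` and `size` fields can be used to check a downloaded copy of the weights. A minimal sketch using only the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` and the local path in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path

# Pointer contents as committed for uplift_sd1.5vae.safetensors.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e20bc63c759d36cf43942bdef1b7e248e5874e1af38c7883c806804adffc1cc2
size 213963468
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so a ~214 MB file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
# Usage (hypothetical local path):
# matches_pointer("uplift_sd1.5vae.safetensors", pointer)
```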