shekkari21 committed on
Commit 7442006 · 1 Parent(s): 358335e

removed a few files

Files changed (3)
  1. DEPLOYMENT_GUIDE.md +0 -161
  2. README copy.md +0 -235
  3. README.md +235 -13
DEPLOYMENT_GUIDE.md DELETED
# Hugging Face Space Deployment Guide

This guide walks you through deploying your ResShift super-resolution model to Hugging Face Spaces.

## Prerequisites

1. A Hugging Face account (sign up at https://huggingface.co)
2. Git installed on your machine
3. Your trained model checkpoint

## Step 1: Create a New Space

1. Go to https://huggingface.co/spaces
2. Click **"Create new Space"**
3. Fill in the details:
   - **Space name**: e.g., `resshift-super-resolution`
   - **SDK**: Select **"Gradio"**
   - **Hardware**: Choose **"GPU"** (recommended for faster inference)
   - **Visibility**: Public or Private
4. Click **"Create Space"**

## Step 2: Clone the Space Repository

After you create the Space, Hugging Face provides a Git URL. Clone it:

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME
```

## Step 3: Copy Required Files

Copy the following files from your project to the Space repository:

### Essential Files
```bash
# From your DiffusionSR directory
cp app.py YOUR_SPACE_NAME/
cp requirements.txt YOUR_SPACE_NAME/
cp SPACE_README.md YOUR_SPACE_NAME/README.md

# Copy source code
cp -r src/ YOUR_SPACE_NAME/

# Copy model checkpoint
mkdir -p YOUR_SPACE_NAME/checkpoints/ckpts
cp checkpoints/ckpts/model_3200.pth YOUR_SPACE_NAME/checkpoints/ckpts/

# Copy VQGAN weights
mkdir -p YOUR_SPACE_NAME/pretrained_weights
cp pretrained_weights/autoencoder_vq_f4.pth YOUR_SPACE_NAME/pretrained_weights/
```

### Important Notes
- **Model size**: Checkpoints can be large (200-500 MB). Hugging Face Spaces supports files up to 10 GB.
- **Git LFS**: Large files such as checkpoints need Git LFS; set it up before adding the `.pth` files:
```bash
git lfs install
git lfs track "*.pth"
git add .gitattributes
```

## Step 4: Update app.py (if needed)

If your checkpoint path differs, update `app.py`:

```python
# In app.py, line ~25, update the checkpoint path:
checkpoint_path = "checkpoints/ckpts/model_3200.pth"  # Change to your checkpoint name
```

## Step 5: Commit and Push

```bash
cd YOUR_SPACE_NAME
git add .
git commit -m "Initial commit: ResShift Super-Resolution app"
git push
```

## Step 6: Wait for Build

Hugging Face will automatically:
1. Install dependencies from `requirements.txt`
2. Run `app.py`
3. Make your app available at `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`

The build usually takes 5-10 minutes.

## Step 7: Test Your App

Once the build completes:
1. Visit your Space URL
2. Upload a test image
3. Verify that super-resolution works correctly

## Troubleshooting

### Build Fails
- Check the **Logs** tab in your Space for error messages
- Verify that all dependencies are listed in `requirements.txt`
- Ensure file paths are correct

### Model Not Loading
- Check that the checkpoint path in `app.py` matches your file structure
- Verify that the checkpoint file was uploaded correctly
- Check the logs for specific error messages

### Out of Memory
- Reduce the batch size at inference
- Use CPU instead of GPU (slower, but needs less memory)
- Consider using a smaller model checkpoint

### Slow Inference
- Enable GPU in the Space settings
- Reduce the number of diffusion steps (modify `T` in the config)
- Use AMP (automatic mixed precision); see the sketch below
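One minimal AMP sketch (illustrative only; `model` and `lr` are hypothetical stand-ins for the loaded network and the low-resolution input tensor, not names from this repo's `app.py`):

```python
import torch

# Run inference under autocast so convolutions and matmuls use fp16 where safe.
# `model` and `lr` are placeholders; adapt to the actual objects in app.py.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    sr = model(lr.cuda())  # in a diffusion sampler this wraps each step's forward pass
```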
## Alternative: Upload via Web Interface

If you prefer not to use Git:

1. Go to your Space page
2. Click the **"Files and versions"** tab
3. Click **"Add file"** → **"Upload files"**
4. Upload all required files
5. The Space will rebuild automatically
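You can also upload programmatically with the `huggingface_hub` client. A sketch, assuming you are logged in via `huggingface-cli login` (the repo id is a placeholder):

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up the token saved by `huggingface-cli login`
api.upload_folder(
    folder_path=".",                          # directory containing app.py, src/, checkpoints/, ...
    repo_id="YOUR_USERNAME/YOUR_SPACE_NAME",  # placeholder Space id
    repo_type="space",
)
```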
## Updating Your Space

To update your Space with new changes:

```bash
cd YOUR_SPACE_NAME
# Make your changes
git add .
git commit -m "Update: description of changes"
git push
```
## Sharing Your Space

Once deployed, you can:
- Share the Space URL with others
- Embed it in websites using an iframe
- Use it via its API, if enabled (see the sketch below)
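Gradio Spaces expose a client API. A minimal sketch; the Space id and endpoint name are placeholders, so check the Space's "Use via API" panel for the real signature:

```python
from gradio_client import Client, handle_file

# Placeholders: substitute your Space id; the endpoint name may differ per app.
client = Client("YOUR_USERNAME/YOUR_SPACE_NAME")
result = client.predict(handle_file("low_res.png"), api_name="/predict")
print(result)  # typically a path to the upscaled output image
```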
## Next Steps

1. **Add Examples**: Add example images to showcase your model
2. **Improve UI**: Customize the Gradio interface
3. **Add Documentation**: Update the README with more details
4. **Monitor Usage**: Check the Space metrics to see usage

## Support

If you encounter issues:
- Check the Hugging Face Spaces documentation: https://huggingface.co/docs/hub/spaces
- Review the Space logs for error messages
- Ask for help in the Hugging Face forums
README copy.md DELETED
(The deleted file's contents were identical to the new README.md shown below.)
README.md CHANGED
The previous README.md contained only the Space configuration front matter and a pointer to the configuration reference:

---
title: DiffusionSR
emoji: 📉
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Image super resolution through residual diffusion
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

It was replaced with the full project README below.
# DiffusionSR

A **from-scratch implementation** of the [ResShift](https://arxiv.org/abs/2307.12348) paper: an efficient diffusion-based super-resolution model that uses a U-Net architecture with Swin Transformer blocks to enhance low-resolution images. This implementation combines the power of diffusion models with transformer-based attention for high-quality image super-resolution.

## Overview

This project is a complete from-scratch implementation of ResShift, a diffusion model for single image super-resolution (SISR) that reduces the number of diffusion steps required by shifting the residual between high-resolution and low-resolution images. The model architecture consists of:

- **Encoder**: 4-stage encoder with residual blocks and time embeddings
- **Bottleneck**: Swin Transformer blocks for global feature modeling
- **Decoder**: 4-stage decoder with skip connections from the encoder
- **Noise Schedule**: ResShift schedule (15 timesteps) for the diffusion process

## Features

- **ResShift Implementation**: Complete from-scratch implementation of the ResShift paper
- **Efficient Diffusion**: Residual shifting reduces the required diffusion steps
- **U-Net Architecture**: Encoder-decoder structure with skip connections
- **Swin Transformer**: Window-based attention in the bottleneck
- **Time Conditioning**: Sinusoidal time embeddings for diffusion timesteps
- **DIV2K Dataset**: Trained on the DIV2K high-quality image dataset
- **Comprehensive Evaluation**: Metrics include PSNR, SSIM, and LPIPS

## Requirements

- Python >= 3.11
- PyTorch >= 2.9.1
- [uv](https://github.com/astral-sh/uv) (Python package manager)

## Installation

### 1. Clone the Repository

```bash
git clone <repository-url>
cd DiffusionSR
```

### 2. Install uv (if not already installed)

```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or using pip
pip install uv
```

### 3. Create a Virtual Environment and Install Dependencies

```bash
# Create the virtual environment
uv venv

# Activate it
# On macOS/Linux:
source .venv/bin/activate

# On Windows:
# .venv\Scripts\activate

# Install project dependencies
uv pip install -e .
```

Alternatively, use uv's sync command:

```bash
uv sync
```

## Dataset Setup

The model expects the DIV2K dataset in the following structure:

```
data/
├── DIV2K_train_HR/             # High-resolution training images
└── DIV2K_train_LR_bicubic/
    └── X4/                     # Low-resolution images (4x downsampled)
```

### Download the DIV2K Dataset

1. Download the DIV2K dataset from the [official website](https://data.vision.ee.ethz.ch/cvl/DIV2K/)
2. Extract the files into the `data/` directory
3. Ensure the directory structure matches the layout above

**Note**: Update the paths in `src/data.py` (lines 75-76) to match your dataset location:

```python
train_dataset = SRDataset(
    dir_HR='path/to/DIV2K_train_HR',
    dir_LR='path/to/DIV2K_train_LR_bicubic/X4',
    scale=4,
    patch_size=256
)
```

## Usage

### Training

To train the model, run:

```bash
python src/train.py
```

The training script will:
- Load the dataset using the `SRDataset` class
- Initialize the `FullUNET` model
- Train using the ResShift noise schedule
- Save training progress and loss values

### Training Configuration

Current training parameters (in `src/train.py`):
- **Batch size**: 4
- **Learning rate**: 1e-4
- **Optimizer**: Adam (betas: 0.9, 0.999)
- **Loss function**: MSE loss
- **Gradient clipping**: 1.0
- **Training steps**: 150
- **Scale factor**: 4x
- **Patch size**: 256x256

You can modify these parameters directly in `src/train.py` to suit your needs; the sketch below shows the loop these settings describe.
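As a rough sketch of the training loop implied by this configuration (illustrative; `model`, `loader`, and `add_noise` are hypothetical stand-ins, and the actual `train.py` will differ):

```python
import torch
import torch.nn as nn

# `model`, `loader`, and `add_noise` stand in for the repo's FullUNET,
# DataLoader over SRDataset, and the ResShift forward-noising step.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
criterion = nn.MSELoss()

for step, (hr, lr) in enumerate(loader):           # batch size 4, 256x256 HR patches
    if step >= 150:                                # training steps: 150
        break
    t = torch.randint(0, 15, (hr.size(0),))        # 15 ResShift timesteps
    noisy, target = add_noise(hr, lr, t)           # schedule from src/noiseControl.py
    loss = criterion(model(noisy, lr, t), target)  # MSE loss
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping: 1.0
    optimizer.step()
```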
### Evaluation

Model performance is evaluated using the following metrics:

- **PSNR (Peak Signal-to-Noise Ratio)**: Measures the ratio between the maximum possible power of a signal and the power of the corrupting noise. Higher PSNR indicates better reconstruction quality.

- **SSIM (Structural Similarity Index Measure)**: Assesses the similarity between two images based on luminance, contrast, and structure. SSIM ranges from -1 to 1, with values closer to 1 indicating greater similarity to the ground truth.

- **LPIPS (Learned Perceptual Image Patch Similarity)**: Evaluates perceptual similarity using deep network features. Lower LPIPS indicates images that are perceptually closer to the reference.

To run evaluation (once implemented), use:

```bash
python src/test.py
```
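For reference, PSNR is simple enough to compute directly. A minimal sketch for images scaled to [0, 1] (`sr` and `hr` are illustrative tensor names):

```python
import torch

def psnr(sr: torch.Tensor, hr: torch.Tensor, max_val: float = 1.0) -> float:
    """PSNR in dB: 10 * log10(MAX^2 / MSE). Assumes inputs lie in [0, max_val]."""
    mse = torch.mean((sr - hr) ** 2)
    return (10 * torch.log10(max_val ** 2 / mse)).item()
```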
## Project Structure

```
DiffusionSR/
├── data/                # Dataset directory (not tracked in git)
│   ├── DIV2K_train_HR/
│   └── DIV2K_train_LR_bicubic/
├── src/
│   ├── config.py        # Configuration file
│   ├── data.py          # Dataset class and data loading
│   ├── model.py         # U-Net model architecture
│   ├── noiseControl.py  # ResShift noise schedule
│   ├── train.py         # Training script
│   └── test.py          # Testing script (to be implemented)
├── pyproject.toml       # Project dependencies and metadata
├── uv.lock              # Locked dependency versions
└── README.md            # This file
```

## Model Architecture

### Encoder
- **Initial Conv**: 3 → 64 channels
- **Stage 1**: 64 → 128 channels, 256×256 → 128×128
- **Stage 2**: 128 → 256 channels, 128×128 → 64×64
- **Stage 3**: 256 → 512 channels, 64×64 → 32×32
- **Stage 4**: 512 channels (no downsampling)

### Bottleneck
- Residual blocks with Swin Transformer blocks
- Window size: 7×7
- Shifted window attention for global context

### Decoder
- **Stage 1**: 512 → 256 channels, 32×32 → 64×64
- **Stage 2**: 256 → 128 channels, 64×64 → 128×128
- **Stage 3**: 128 → 64 channels, 128×128 → 256×256
- **Stage 4**: 64 → 64 channels
- **Final Conv**: 64 → 3 channels (RGB output)
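To make the per-stage pattern concrete, here is a minimal residual block with additive time conditioning (an illustrative sketch, not the exact block in `src/model.py`):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block conditioned on a time embedding (illustrative sketch)."""
    def __init__(self, in_ch: int, out_ch: int, t_dim: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.t_proj = nn.Linear(t_dim, out_ch)            # project time embedding to channels
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        h = self.act(self.conv1(x))
        h = h + self.t_proj(t_emb)[:, :, None, None]      # broadcast over H and W
        h = self.act(self.conv2(h))
        return h + self.skip(x)                           # residual connection
```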
## Key Components

### ResShift Noise Schedule
The model implements the ResShift noise schedule as described in the original paper, defined in `src/noiseControl.py`:
- 15 timesteps (0-14)
- Parameters: `eta1=0.001`, `etaT=0.999`, `p=0.8`
- Shifts the residual between the HR and LR images over the diffusion process; a sketch of the schedule follows
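Under our reading of the paper, √η_t forms a geometric sequence whose spacing is warped by the exponent `p` (a sketch only; the indexing and exact form in `src/noiseControl.py` may differ):

```python
import math
import torch

def resshift_etas(T: int = 15, eta1: float = 0.001, etaT: float = 0.999, p: float = 0.8) -> torch.Tensor:
    """Sketch of the ResShift eta schedule: sqrt(eta_t) grows geometrically
    from sqrt(eta1) to sqrt(etaT); p warps how the timesteps are spaced."""
    t = torch.arange(T, dtype=torch.float64)         # timesteps 0..T-1
    beta = ((t / (T - 1)) ** p) * (T - 1)            # warped positions in [0, T-1]
    b0 = math.exp(math.log(etaT / eta1) / (2 * (T - 1)))
    sqrt_eta = math.sqrt(eta1) * (b0 ** beta)        # endpoints: sqrt(eta1) and sqrt(etaT)
    return (sqrt_eta ** 2).float()
```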
### Time Embeddings
Sinusoidal embeddings condition the model on the diffusion timestep, analogous to positional encodings in transformers; see the sketch below.
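A standard construction (minimal sketch; assumes `dim` is even, and the repo's implementation may differ in details):

```python
import math
import torch

def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal timestep embedding of shape (len(t), dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]       # (batch, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```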
### Data Augmentation
The dataset pipeline includes:
- Random cropping (aligned between HR and LR; see the sketch below)
- Random horizontal/vertical flips
- Random 180° rotation
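Aligned cropping just means the HR window is the LR window scaled by the SR factor. One plausible implementation (a sketch; `SRDataset` in `src/data.py` may do this differently):

```python
import random
from PIL import Image

def paired_random_crop(hr: Image.Image, lr: Image.Image, patch: int = 256, scale: int = 4):
    """Crop aligned patches: the HR window equals the LR window times `scale`."""
    lp = patch // scale                                  # LR patch size, e.g. 64 for 256/4
    x = random.randint(0, lr.width - lp)
    y = random.randint(0, lr.height - lp)
    lr_crop = lr.crop((x, y, x + lp, y + lp))
    hr_crop = hr.crop((x * scale, y * scale, (x + lp) * scale, (y + lp) * scale))
    return hr_crop, lr_crop
```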
## Development

### Adding New Features

1. Model modifications: edit `src/model.py`
2. Training changes: modify `src/train.py`
3. Data pipeline: update `src/data.py`
4. Configuration: add settings to `src/config.py`

## License

[Add your license here]

## Citation

If you use this code in your research, please cite the original ResShift paper:

```bibtex
@article{yue2023resshift,
  title={ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting},
  author={Yue, Zongsheng and Wang, Jianyi and Loy, Chen Change},
  journal={arXiv preprint arXiv:2307.12348},
  year={2023}
}
```

## Acknowledgments

- **ResShift Authors**: Zongsheng Yue, Jianyi Wang, and Chen Change Loy for their foundational work on efficient diffusion-based super-resolution
- The DIV2K dataset providers
- The PyTorch community
- The Swin Transformer architecture for inspiration