Spaces: Running

Commit 7442006 · 1 Parent(s): 358335e

removed few files

Browse files:
- DEPLOYMENT_GUIDE.md +0 -161
- README copy.md +0 -235
- README.md +235 -13
DEPLOYMENT_GUIDE.md
DELETED
@@ -1,161 +0,0 @@

# Hugging Face Space Deployment Guide

This guide will help you deploy your ResShift Super-Resolution model to Hugging Face Spaces.

## Prerequisites

1. Hugging Face account (sign up at https://huggingface.co)
2. Git installed on your machine
3. Your trained model checkpoint

## Step 1: Create a New Space

1. Go to https://huggingface.co/spaces
2. Click **"Create new Space"**
3. Fill in the details:
   - **Space name**: e.g., `resshift-super-resolution`
   - **SDK**: Select **"Gradio"**
   - **Hardware**: Choose **"GPU"** (recommended for faster inference)
   - **Visibility**: Public or Private
4. Click **"Create Space"**

## Step 2: Clone the Space Repository

After creating the space, Hugging Face will provide you with a Git URL. Clone it:

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME
```

## Step 3: Copy Required Files

Copy the following files from your project to the Space repository:

### Essential Files:

```bash
# From your DiffusionSR directory
cp app.py YOUR_SPACE_NAME/
cp requirements.txt YOUR_SPACE_NAME/
cp SPACE_README.md YOUR_SPACE_NAME/README.md

# Copy source code
cp -r src/ YOUR_SPACE_NAME/

# Copy model checkpoint
mkdir -p YOUR_SPACE_NAME/checkpoints/ckpts
cp checkpoints/ckpts/model_3200.pth YOUR_SPACE_NAME/checkpoints/ckpts/

# Copy VQGAN weights
mkdir -p YOUR_SPACE_NAME/pretrained_weights
cp pretrained_weights/autoencoder_vq_f4.pth YOUR_SPACE_NAME/pretrained_weights/
```

### Important Notes:

- **Model Size**: Checkpoints can be large (200-500MB). Hugging Face Spaces supports files up to 10GB.
- **Git LFS**: For large files, you may need Git LFS:

```bash
git lfs install
git lfs track "*.pth"
git add .gitattributes
```

## Step 4: Update app.py (if needed)

If your checkpoint path is different, update `app.py`:

```python
# In app.py, line ~25, update the checkpoint path:
checkpoint_path = "checkpoints/ckpts/model_3200.pth"  # Change to your checkpoint name
```
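For orientation, the script the Space runs is a standard Gradio app. A minimal sketch of that shape follows; the function name and body are placeholders standing in for the project's actual `app.py`, not its real code:

```python
# Hypothetical minimal app.py shape; run_resshift is a placeholder for
# the project's real model-loading and ResShift sampling code.
import gradio as gr
from PIL import Image

checkpoint_path = "checkpoints/ckpts/model_3200.pth"

def run_resshift(image: Image.Image) -> Image.Image:
    # Placeholder: load the checkpoint once, run sampling, return the SR image.
    return image

demo = gr.Interface(
    fn=run_resshift,
    inputs=gr.Image(type="pil"),
    outputs=gr.Image(type="pil"),
    title="ResShift Super-Resolution",
)

if __name__ == "__main__":
    demo.launch()
```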
## Step 5: Commit and Push

```bash
cd YOUR_SPACE_NAME
git add .
git commit -m "Initial commit: ResShift Super-Resolution app"
git push
```

## Step 6: Wait for Build

Hugging Face will automatically:
1. Install dependencies from `requirements.txt`
2. Run `app.py`
3. Make your app available at: `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`

The build process usually takes 5-10 minutes.

## Step 7: Test Your App

Once the build completes:
1. Visit your Space URL
2. Upload a test image
3. Verify the super-resolution works correctly

## Troubleshooting

### Build Fails
- Check the **Logs** tab in your Space for error messages
- Verify all dependencies are in `requirements.txt`
- Ensure file paths are correct

### Model Not Loading
- Check that the checkpoint path in `app.py` matches your file structure
- Verify the checkpoint file was uploaded correctly
- Check logs for specific error messages

### Out of Memory
- Reduce batch size in inference
- Use CPU instead of GPU (slower but uses less memory)
- Consider using a smaller model checkpoint

### Slow Inference
- Enable GPU in Space settings
- Reduce the number of diffusion steps (modify `T` in config)
- Use AMP (automatic mixed precision); see the sketch below
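As a rough illustration of the AMP option, inference can be wrapped in PyTorch's autocast context. This is the generic pattern, not code from this repository; `model` and `lr_image` below are stand-ins:

```python
import torch

# Generic mixed-precision inference pattern (assumes a CUDA device).
model = torch.nn.Conv2d(3, 3, 3, padding=1).cuda().eval()  # stand-in model
lr_image = torch.rand(1, 3, 64, 64, device="cuda")         # stand-in input

with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    sr_image = model(lr_image)  # convs/matmuls run in float16 where safe
```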
## Alternative: Upload via Web Interface

If you prefer not to use Git:

1. Go to your Space page
2. Click the **"Files and versions"** tab
3. Click **"Add file"** → **"Upload files"**
4. Upload all required files
5. The Space will rebuild automatically

## Updating Your Space

To update your Space with new changes:

```bash
cd YOUR_SPACE_NAME
# Make your changes
git add .
git commit -m "Update: description of changes"
git push
```

## Sharing Your Space

Once deployed, you can:
- Share the Space URL with others
- Embed it in websites using an iframe
- Use it via API (if enabled); see the sketch below
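For the API route, Gradio Spaces can typically be called with the `gradio_client` package. A hedged sketch, assuming the app exposes a single image-to-image endpoint (the Space name and `api_name` depend on your actual deployment):

```python
# Sketch of calling the Space programmatically; Space name and api_name
# are assumptions that depend on how app.py defines its interface.
from gradio_client import Client, handle_file

client = Client("YOUR_USERNAME/YOUR_SPACE_NAME")
result = client.predict(
    handle_file("low_res.png"),  # path to a local test image
    api_name="/predict",         # default endpoint name for gr.Interface
)
print(result)  # path to the super-resolved output image
```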
## Next Steps

1. **Add Examples**: Add example images to showcase your model
2. **Improve UI**: Customize the Gradio interface
3. **Add Documentation**: Update README with more details
4. **Monitor Usage**: Check Space metrics to see usage

## Support

If you encounter issues:
- Check the Hugging Face Spaces documentation: https://huggingface.co/docs/hub/spaces
- Review Space logs for error messages
- Ask for help in the Hugging Face forums
README copy.md
DELETED
@@ -1,235 +0,0 @@

*(The 235 deleted lines of `README copy.md` are identical to the 235 lines added to `README.md` below.)*
README.md
CHANGED
@@ -1,13 +1,235 @@

*(13 blank lines removed; the added content follows.)*
# DiffusionSR

A **from-scratch implementation** of the [ResShift](https://arxiv.org/abs/2307.12348) paper: an efficient diffusion-based super-resolution model that uses a U-Net architecture with Swin Transformer blocks to enhance low-resolution images. This implementation combines the power of diffusion models with transformer-based attention mechanisms for high-quality image super-resolution.

## Overview

This project is a complete from-scratch implementation of ResShift, a diffusion model for single image super-resolution (SISR) that efficiently reduces the number of diffusion steps required by shifting the residual between high-resolution and low-resolution images. The model architecture consists of:

- **Encoder**: 4-stage encoder with residual blocks and time embeddings
- **Bottleneck**: Swin Transformer blocks for global feature modeling
- **Decoder**: 4-stage decoder with skip connections from the encoder
- **Noise Schedule**: ResShift schedule (15 timesteps) for the diffusion process

## Features

- **ResShift Implementation**: Complete from-scratch implementation of the ResShift paper
- **Efficient Diffusion**: Residual shifting mechanism reduces required diffusion steps
- **U-Net Architecture**: Encoder-decoder structure with skip connections
- **Swin Transformer**: Window-based attention mechanism in the bottleneck
- **Time Conditioning**: Sinusoidal time embeddings for diffusion timesteps
- **DIV2K Dataset**: Trained on the DIV2K high-quality image dataset
- **Comprehensive Evaluation**: Metrics include PSNR, SSIM, and LPIPS

## Requirements

- Python >= 3.11
- PyTorch >= 2.9.1
- [uv](https://github.com/astral-sh/uv) (Python package manager)

## Installation

### 1. Clone the Repository

```bash
git clone <repository-url>
cd DiffusionSR
```

### 2. Install uv (if not already installed)

```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or using pip
pip install uv
```

### 3. Create Virtual Environment and Install Dependencies

```bash
# Create the virtual environment
uv venv

# Activate the virtual environment
# On macOS/Linux:
source .venv/bin/activate

# On Windows:
# .venv\Scripts\activate

# Install project dependencies
uv pip install -e .
```

Alternatively, you can use uv's sync command:

```bash
uv sync
```

## Dataset Setup

The model expects the DIV2K dataset in the following structure:

```
data/
├── DIV2K_train_HR/           # High-resolution training images
└── DIV2K_train_LR_bicubic/
    └── X4/                   # Low-resolution images (4x downsampled)
```

### Download DIV2K Dataset

1. Download the DIV2K dataset from the [official website](https://data.vision.ee.ethz.ch/cvl/DIV2K/)
2. Extract the files to the `data/` directory
3. Ensure the directory structure matches the above

**Note**: Update the paths in `src/data.py` (lines 75-76) to match your dataset location:

```python
train_dataset = SRDataset(
    dir_HR='path/to/DIV2K_train_HR',
    dir_LR='path/to/DIV2K_train_LR_bicubic/X4',
    scale=4,
    patch_size=256
)
```

## Usage

### Training

To train the model, run:

```bash
python src/train.py
```

The training script will:
- Load the dataset using the `SRDataset` class
- Initialize the `FullUNET` model
- Train using the ResShift noise schedule
- Save training progress and loss values

### Training Configuration

Current training parameters (in `src/train.py`):
- **Batch size**: 4
- **Learning rate**: 1e-4
- **Optimizer**: Adam (betas: 0.9, 0.999)
- **Loss function**: MSE Loss
- **Gradient clipping**: 1.0
- **Training steps**: 150
- **Scale factor**: 4x
- **Patch size**: 256x256

You can modify these parameters directly in `src/train.py` to suit your needs.
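For concreteness, here is a minimal sketch of a single optimization step under these settings, with a stand-in model and random tensors; the real loop lives in `src/train.py` and conditions `FullUNET` on LR inputs and timesteps:

```python
import torch
from torch import nn

# Stand-in model and batch; the real script uses FullUNET and SRDataset.
model = nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
criterion = nn.MSELoss()

noisy_input = torch.rand(4, 3, 256, 256)  # batch size 4, 256x256 patches
target = torch.rand(4, 3, 256, 256)

optimizer.zero_grad()
loss = criterion(model(noisy_input), target)     # MSE loss
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip at 1.0
optimizer.step()
```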
### Evaluation

The model performance is evaluated using the following metrics:

- **PSNR (Peak Signal-to-Noise Ratio)**: Measures the ratio between the maximum possible power of a signal and the power of corrupting noise. Higher PSNR values indicate better image quality reconstruction.

- **SSIM (Structural Similarity Index Measure)**: Assesses the similarity between two images based on luminance, contrast, and structure. SSIM values range from -1 to 1, with higher values (closer to 1) indicating greater similarity to the ground truth.

- **LPIPS (Learned Perceptual Image Patch Similarity)**: Evaluates perceptual similarity between images using deep network features. Lower LPIPS values indicate images that are more perceptually similar to the reference image.
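All three metrics can be computed, for example, with `torchmetrics`; that library is an assumption here, not necessarily one of this project's dependencies:

```python
import torch
from torchmetrics.image import (
    LearnedPerceptualImagePatchSimilarity,
    PeakSignalNoiseRatio,
    StructuralSimilarityIndexMeasure,
)

sr = torch.rand(1, 3, 256, 256)  # stand-in super-resolved output in [0, 1]
hr = torch.rand(1, 3, 256, 256)  # stand-in ground-truth HR image in [0, 1]

psnr = PeakSignalNoiseRatio(data_range=1.0)(sr, hr)              # higher is better
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)(sr, hr)  # closer to 1 is better
# normalize=True tells LPIPS the inputs are in [0, 1] rather than [-1, 1]
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)(sr, hr)
print(f"PSNR {psnr:.2f} dB, SSIM {ssim:.4f}, LPIPS {lpips:.4f}")
```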
To run evaluation (once implemented), use:

```bash
python src/test.py
```

## Project Structure

```
DiffusionSR/
├── data/                  # Dataset directory (not tracked in git)
│   ├── DIV2K_train_HR/
│   └── DIV2K_train_LR_bicubic/
├── src/
│   ├── config.py          # Configuration file
│   ├── data.py            # Dataset class and data loading
│   ├── model.py           # U-Net model architecture
│   ├── noiseControl.py    # ResShift noise schedule
│   ├── train.py           # Training script
│   └── test.py            # Testing script (to be implemented)
├── pyproject.toml         # Project dependencies and metadata
├── uv.lock                # Locked dependency versions
└── README.md              # This file
```

## Model Architecture

### Encoder
- **Initial Conv**: 3 → 64 channels
- **Stage 1**: 64 → 128 channels, 256×256 → 128×128
- **Stage 2**: 128 → 256 channels, 128×128 → 64×64
- **Stage 3**: 256 → 512 channels, 64×64 → 32×32
- **Stage 4**: 512 channels (no downsampling)

### Bottleneck
- Residual blocks with Swin Transformer blocks
- Window size: 7×7
- Shifted window attention for global context

### Decoder
- **Stage 1**: 512 → 256 channels, 32×32 → 64×64
- **Stage 2**: 256 → 128 channels, 64×64 → 128×128
- **Stage 3**: 128 → 64 channels, 128×128 → 256×256
- **Stage 4**: 64 → 64 channels
- **Final Conv**: 64 → 3 channels (RGB output)

## Key Components

### ResShift Noise Schedule
The model implements the ResShift noise schedule as described in the original paper, defined in `src/noiseControl.py`:
- 15 timesteps (0-14)
- Parameters: `eta1=0.001`, `etaT=0.999`, `p=0.8`
- Efficiently shifts the residual between HR and LR images during the diffusion process
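A minimal sketch of the schedule, assuming the geometric construction from the ResShift paper; the actual `src/noiseControl.py` may implement it differently:

```python
import numpy as np

def resshift_sqrt_eta(T=15, eta1=0.001, etaT=0.999, p=0.8):
    """Geometric sqrt(eta_t) schedule in the form used by the ResShift paper
    (assumed form): sqrt(eta_t) interpolates from sqrt(eta1) to sqrt(etaT),
    with the exponent warped by p."""
    t = np.arange(T, dtype=np.float64)   # timesteps 0 .. T-1
    power = (t / (T - 1)) ** p           # warped interpolation weight in [0, 1]
    return np.sqrt(eta1) * np.sqrt(etaT / eta1) ** power

eta = resshift_sqrt_eta() ** 2           # eta_0 = eta1, eta_14 = etaT
print(eta.round(4))
```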
### Time Embeddings
Sinusoidal embeddings are used to condition the model on diffusion timesteps, similar to positional encodings in transformers.
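This is the standard transformer-style construction; a generic sketch (dimension names here are illustrative, not the project's exact code):

```python
import math
import torch

def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal embeddings for integer timesteps: shape (B,) -> (B, dim)."""
    half = dim // 2
    # Geometrically spaced frequencies, as in positional encodings
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]                    # (B, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # (B, dim)

emb = timestep_embedding(torch.arange(15), dim=128)  # one embedding per ResShift step
print(emb.shape)  # torch.Size([15, 128])
```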
### Data Augmentation
The dataset includes:
- Random cropping (aligned between HR and LR)
- Random horizontal/vertical flips
- Random 180° rotation
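The key detail is that each random choice must be applied identically to an HR patch and its LR counterpart. A minimal sketch of that pattern on tensors (illustrative; the real logic lives in `src/data.py`):

```python
import random
import torch

def paired_augment(hr: torch.Tensor, lr: torch.Tensor):
    """Apply the same random flips/rotation to an HR/LR pair of (C, H, W) tensors."""
    if random.random() < 0.5:  # horizontal flip
        hr, lr = hr.flip(-1), lr.flip(-1)
    if random.random() < 0.5:  # vertical flip
        hr, lr = hr.flip(-2), lr.flip(-2)
    if random.random() < 0.5:  # 180-degree rotation
        hr, lr = hr.rot90(2, dims=(-2, -1)), lr.rot90(2, dims=(-2, -1))
    return hr, lr

hr, lr = paired_augment(torch.rand(3, 256, 256), torch.rand(3, 64, 64))
```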
## Development

### Adding New Features

1. Model modifications: Edit `src/model.py`
2. Training changes: Modify `src/train.py`
3. Data pipeline: Update `src/data.py`
4. Configuration: Add settings to `src/config.py`

## License

[Add your license here]

## Citation

If you use this code in your research, please cite the original ResShift paper:

```bibtex
@article{yue2023resshift,
  title={ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting},
  author={Yue, Zongsheng and Wang, Jianyi and Loy, Chen Change},
  journal={arXiv preprint arXiv:2307.12348},
  year={2023}
}
```

## Acknowledgments

- **ResShift Authors**: Zongsheng Yue, Jianyi Wang, and Chen Change Loy for their foundational work on efficient diffusion-based super-resolution
- DIV2K dataset providers
- PyTorch community
- Swin Transformer architecture inspiration