Stripformer TTA Ensembling (Image Deblurring)
This repository provides a PyTorch implementation of an image deblurring model based on the official Stripformer architecture (ECCV 2022), trained on the GoPro Motion Deblurring Dataset.
The released model includes a full Test-Time Augmentation (TTA) ensembling pipeline and post-processing enhancements, improving robustness on real-world motion blur.
📘 Conceptual Walkthrough (Medium Article)
This model is accompanied by a conceptual article on modern image deblurring, written to explain, step by step, how and why neural networks learn to undo blur.
The article walks through the full deblurring pipeline—from feature extraction and spatial reasoning to coarse-to-fine refinement and artifact suppression—and uses Stripformer + TTA as a concrete reference throughout to bridge theory with practice.
👉 Read the full explanation here:
https://medium.com/@varunpatels2004/how-neural-network-learns-to-undo-the-blur-4751bbf86f29
Architecture and References
- Model Architecture: Stripformer (ECCV 2022, official design)
- Paper: https://arxiv.org/abs/2204.04627
- Official GitHub Repository:
https://github.com/pp00704831/Stripformer-ECCV-2022-
The architecture follows an encoder–transformer–decoder design with
strip-based self-attention and residual learning (output + input).
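As an illustration of the residual formulation, the sketch below wraps a placeholder backbone so that the network's prediction is added back onto the blurry input. This is a minimal sketch, not the official code; `backbone` stands in for the actual Stripformer encoder–transformer–decoder stack.

```python
import torch
import torch.nn as nn

class ResidualDeblur(nn.Module):
    """Illustrative residual-learning wrapper (output = prediction + input)."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # placeholder for the encoder–transformer–decoder stack

    def forward(self, blurry: torch.Tensor) -> torch.Tensor:
        # The network learns the sharp-minus-blurry residual rather than
        # predicting the sharp image directly.
        return self.backbone(blurry) + blurry
```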
Model Overview
- Task: Image Deblurring (Image-to-Image)
- Framework: PyTorch
- Training Data: GoPro Motion Deblurring Dataset
- Inference: Single-image inference with full TTA
- Checkpoint format: PyTorch state_dict (a loading sketch follows this list)
- License: Apache-2.0
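A minimal loading sketch is shown below. The import path and class name are assumptions based on the official repository; adjust them to match your local copy of the Stripformer code.

```python
import torch
# Assumption: the Stripformer class definition comes from the official repository
# code placed alongside this checkpoint; adjust the import path to your layout.
from models.Stripformer import Stripformer

model = Stripformer()
state_dict = torch.load("best_model.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```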
Training Details
- Dataset: GoPro Motion Deblurring Dataset
- Supervision: Paired blurry–sharp images
- Blur Type: Realistic motion blur
- Image Domain: Natural RGB images
The model was trained using the official Stripformer architecture without architectural modification.
Inference Pipeline
The provided inference.py reproduces the exact inference pipeline used
during evaluation.
Test-Time Augmentation (TTA)
The following augmentations are applied:
- Identity
- Horizontal flip
- Vertical flip
- Horizontal + vertical flip
- Transpose
Each prediction is de-augmented and averaged to produce the final output.
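A minimal sketch of this ensembling step, assuming `model` is the loaded network and `image` is a `(1, 3, H, W)` tensor; the exact ordering and implementation details in `inference.py` may differ.

```python
import torch

def tta_deblur(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """image: (1, 3, H, W) tensor in the model's expected value range."""
    # (augment, de-augment) pairs; every transform used here is its own inverse.
    transforms = [
        (lambda x: x,                            lambda x: x),                            # identity
        (lambda x: torch.flip(x, dims=[-1]),     lambda x: torch.flip(x, dims=[-1])),     # horizontal flip
        (lambda x: torch.flip(x, dims=[-2]),     lambda x: torch.flip(x, dims=[-2])),     # vertical flip
        (lambda x: torch.flip(x, dims=[-2, -1]), lambda x: torch.flip(x, dims=[-2, -1])), # both flips
        (lambda x: x.transpose(-2, -1),          lambda x: x.transpose(-2, -1)),          # transpose (swaps H and W)
    ]
    outputs = []
    with torch.no_grad():
        for aug, deaug in transforms:
            # Predict on the augmented input, then undo the augmentation.
            outputs.append(deaug(model(aug(image))))
    # Average the de-augmented predictions to form the final ensemble output.
    return torch.stack(outputs).mean(dim=0)
```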
Post-Processing
- High-frequency detail enhancement
- Adaptive CLAHE (contrast normalization)
These steps improve perceptual sharpness and stability for strong motion blur.
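A hedged sketch of these two steps using OpenCV (unsharp masking for high-frequency detail, then CLAHE on the luminance channel). The kernel size, clip limit, and tile grid below are illustrative values, not necessarily the ones used in `inference.py`.

```python
import cv2
import numpy as np

def post_process(img_bgr: np.ndarray) -> np.ndarray:
    """img_bgr: uint8 BGR image produced by the network."""
    # High-frequency detail enhancement via unsharp masking.
    blurred = cv2.GaussianBlur(img_bgr, (0, 0), sigmaX=2.0)
    sharpened = cv2.addWeighted(img_bgr, 1.5, blurred, -0.5, 0)

    # Adaptive CLAHE on the L channel only, normalizing local contrast
    # without shifting colours.
    lab = cv2.cvtColor(sharpened, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l = clahe.apply(l)
    return cv2.cvtColor(cv2.merge([l, a, b]), cv2.COLOR_LAB2BGR)
```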
Evaluation Summary (Limited)
A focused evaluation was conducted on 10 extreme-blur samples.
| Metric | Observation |
|---|---|
| Best base PSNR | ~31.71 dB |
| TTA PSNR gain | +0.14 to +0.33 dB |
| Avg SSIM improvement | +0.0039 |
| Consistency | Improvement on all tested samples |
These results illustrate consistent but modest gains from TTA.
This is not a full benchmark evaluation.
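For reference, per-image PSNR and SSIM of this kind can be computed with scikit-image as below; this is a sketch of the metric computation, not the exact evaluation script.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compare(restored: np.ndarray, sharp: np.ndarray):
    """restored, sharp: uint8 RGB arrays of identical shape."""
    psnr = peak_signal_noise_ratio(sharp, restored, data_range=255)
    ssim = structural_similarity(sharp, restored, channel_axis=-1, data_range=255)
    return psnr, ssim
```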
Qualitative Results
The following examples compare:
- Blurred input
- Base Stripformer output
- Stripformer + full TTA
- Ground-truth sharp image
Example 1
Example 2
Repository Structure
| File | Description |
|---|---|
| `best_model.pth` | Trained model weights (state_dict) |
| `inference.py` | Full inference pipeline with TTA |
| `requirements.txt` | Python dependencies |
| `config.json` | Model metadata |
| `.gitattributes` | Git LFS configuration |
Installation
```bash
pip install -r requirements.txt
```