---
license: mit
library_name: pytorch
tags:
- image-restoration
- multi-frame
- deformable-convolution
- temporal-fusion
- feature-level-alignment
- super-resolution
- astronomy
- satellite-images-refactoring
- photography
- denoise
- deblur
- novel-architecture
- from-scratch
- efficient
- research
language:
- en
pipeline_tag: image-to-image
---

# MFIR - Multi-Frame Image Restoration

A PyTorch model for multi-frame image restoration through temporal fusion and feature-level alignment. MFIR aligns and fuses features from multiple degraded frames to produce a single high-quality restored image.

## Model Description

MFIR takes 2-16 degraded frames of the same scene and combines them into a single high-quality output. Unlike single-image restoration methods, which struggle with heavily degraded inputs, MFIR leverages complementary information across multiple frames: each frame captures slightly different details, and the model learns to extract and merge the best parts of each.

### Architecture

```
Input Frames (B, N, 3, H, W)
          │
          ▼
┌─────────────────────┐
│   Shared Encoder    │  ResNet-style feature extraction
└─────────────────────┘
          │
          ▼
┌─────────────────────┐
│     Deformable      │  Align frames using learned offsets
│     Alignment       │  (3-layer cascade)
└─────────────────────┘
          │
          ▼
┌─────────────────────┐
│ Temporal Attention  │  Multi-head attention fusion
│       Fusion        │  (4 heads)
└─────────────────────┘
          │
          ▼
┌─────────────────────┐
│       Decoder       │  PixelShuffle upsampling
└─────────────────────┘
          │
          ▼
Output (B, 3, H, W)
```

### Key Components

| Component | Description |
|-----------|-------------|
| **Shared Encoder** | Multi-scale feature extraction with residual blocks. 4x spatial downsampling. |
| **Deformable Alignment** | Cascaded deformable convolutions (3 layers) that align each frame to the reference. More robust than optical flow for degraded inputs. |
| **Temporal Attention Fusion** | Multi-head attention (4 heads) where the reference frame is the query and all frames are keys/values. Learns per-pixel frame contributions. |
| **Decoder** | Progressive upsampling with PixelShuffle (2 stages, 4x total). |

## Usage

### Installation

```bash
pip install torch torchvision huggingface_hub
```

### Inference

```python
import torch
from huggingface_hub import hf_hub_download

# Download checkpoint
checkpoint_path = hf_hub_download(
    repo_id="marduk-ra/MFIR",
    filename="temporal_fusion_model.pth",
)

# Load model
ckpt = torch.load(checkpoint_path, map_location="cuda", weights_only=False)

# Model architecture code available at:
# https://github.com/marduk-ra/MFIR
from model import FeatureFusionModel, FeatureFusionConfig

config = FeatureFusionConfig.from_dict(ckpt["config"])
model = FeatureFusionModel(config)
model.load_state_dict(ckpt["state_dict"])
model.eval()

# Inference
# frames: (batch, num_frames, 3, height, width) tensor in [0, 1]
with torch.no_grad():
    result = model(frames, ref_idx=0)
output = result["output"]  # (batch, 3, height, width)
```

### Web Demo

Try the model directly in your browser:

🚀 **[Hugging Face Space Demo](https://huggingface.co/spaces/marduk-ra/MFIR)**

## Model Details

| Parameter | Value |
|-----------|-------|
| Input Channels | 3 (RGB) |
| Output Channels | 3 (RGB) |
| Max Frames | 16 |
| Min Frames | 2 |
| Encoder Channels | [64, 128, 256] |
| Deformable Groups | 8 |
| Deformable Layers | 3 |
| Attention Heads | 4 |
| Fusion Type | Attention |
| Parameters | ~10M |
| Checkpoint Size | 42 MB |

## Example

**Input Frames (5 degraded images):**
*5 degraded input frames are fused into a single high-quality output.*
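To assemble an input like this, the degraded captures just need to be stacked into the `(batch, num_frames, 3, height, width)` tensor the model expects. A minimal sketch, using synthetic frames in place of real image files (`stack_frames` is an illustrative helper, not part of the released code):

```python
import torch

def stack_frames(frames):
    """Stack per-frame tensors into the (B, N, 3, H, W) batch MFIR expects.

    `frames` is a list of (3, H, W) float tensors already scaled to [0, 1],
    e.g. produced by torchvision.transforms.functional.to_tensor per image.
    """
    if not 2 <= len(frames) <= 16:
        raise ValueError("MFIR expects between 2 and 16 frames")
    # (N, 3, H, W) -> add a batch dimension -> (1, N, 3, H, W)
    return torch.stack(frames, dim=0).unsqueeze(0)

# Five synthetic 256x256 "captures" of the same scene
frames = [torch.rand(3, 256, 256) for _ in range(5)]
batch = stack_frames(frames)
print(batch.shape)  # torch.Size([1, 5, 3, 256, 256])
```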
The model works best when:
- Frames have slight variations (different noise patterns, blur, etc.)
- Frames are roughly aligned (same scene)
- Input resolution matches training resolution
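The temporal attention fusion stage described in the Key Components table can be sketched as follows. This is an illustrative re-implementation of the idea (the reference frame's feature vector queries all frames' feature vectors at each spatial location), not the released module; the class name, channel count, and shapes are assumptions:

```python
import torch
import torch.nn as nn

class TemporalAttentionFusion(nn.Module):
    """Per pixel, the reference frame is the query and all N frames are
    keys/values, so each frame's contribution is weighted per location."""

    def __init__(self, channels=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, feats, ref_idx=0):
        # feats: (B, N, C, h, w) aligned frame features
        B, N, C, h, w = feats.shape
        # Move spatial locations into the batch dimension: (B*h*w, N, C)
        kv = feats.permute(0, 3, 4, 1, 2).reshape(B * h * w, N, C)
        q = kv[:, ref_idx:ref_idx + 1, :]        # (B*h*w, 1, C)
        fused, _ = self.attn(q, kv, kv)          # (B*h*w, 1, C)
        return fused.reshape(B, h, w, C).permute(0, 3, 1, 2)  # (B, C, h, w)

fusion = TemporalAttentionFusion(channels=256, heads=4)
feats = torch.rand(1, 5, 256, 8, 8)  # 5 frames of encoded features
fused = fusion(feats, ref_idx=0)
print(fused.shape)  # torch.Size([1, 256, 8, 8])
```

Attending over the frame axis rather than the full spatial grid keeps the attention cost linear in image size, which is what makes per-pixel frame weighting practical at high resolution.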
## Training
The model was trained on a custom dataset with the following specifications:
**Dataset:**
- 16,000 high-resolution source images
- Each image was used to generate 8 degraded input frames
- Multi-scale training: 128, 256, 512, and 1024 pixel resolutions
**Degradation Pipeline:**
- Random spatial shifts (simulating camera shake)
- Motion blur with varying kernel sizes and directions
- Gaussian and Poisson noise with random intensity
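A minimal sketch of a degradation pipeline of this shape; the parameter values (shift range, kernel size, noise level) are illustrative, not the actual training settings:

```python
import torch
import torch.nn.functional as F

def degrade(frame, max_shift=4, kernel_size=7, sigma=0.02):
    """Random spatial shift -> motion blur -> Gaussian + Poisson noise,
    applied to a (3, H, W) tensor in [0, 1]."""
    # Random spatial shift (simulated camera shake)
    dx, dy = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
    frame = torch.roll(frame, shifts=(dy, dx), dims=(-2, -1))

    # Motion blur: a normalized line kernel, applied per channel
    kernel = torch.zeros(kernel_size, kernel_size)
    if torch.rand(1).item() < 0.5:
        kernel[kernel_size // 2, :] = 1.0   # horizontal streak
    else:
        kernel[:, kernel_size // 2] = 1.0   # vertical streak
    kernel /= kernel.sum()
    weight = kernel.repeat(3, 1, 1, 1)      # (3, 1, k, k), one per channel
    frame = F.conv2d(frame.unsqueeze(0), weight,
                     padding=kernel_size // 2, groups=3).squeeze(0)

    # Gaussian (read) noise plus Poisson (shot) noise
    frame = frame + sigma * torch.randn_like(frame)
    frame = torch.poisson(frame.clamp(0, 1) * 255.0) / 255.0
    return frame.clamp(0.0, 1.0)

clean = torch.rand(3, 128, 128)
noisy_frames = torch.stack([degrade(clean) for _ in range(8)])
print(noisy_frames.shape)  # torch.Size([8, 3, 128, 128])
```

Because the shift, blur direction, and noise are drawn independently per call, the 8 frames generated from one source image carry complementary information, which is exactly what the fusion stage is trained to exploit.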
**Training Configuration:**
- Total epochs: 150 (progressive training)
- Optimizer: AdamW
- Loss: L1 + Perceptual (VGG) + SSIM + Color Correction
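A sketch of a combined loss of this shape, with the VGG perceptual term omitted for brevity, a uniform-window SSIM in place of the usual Gaussian-window formulation, and a per-channel mean match standing in for the color-correction term; the weights are illustrative, not the training values:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, window=11, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified SSIM over (B, 3, H, W) tensors in [0, 1]."""
    mu_x = F.avg_pool2d(x, window, stride=1)
    mu_y = F.avg_pool2d(y, window, stride=1)
    var_x = F.avg_pool2d(x * x, window, stride=1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, stride=1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, stride=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * cov + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return (num / den).mean()

def restoration_loss(pred, target, w_l1=1.0, w_ssim=0.5, w_color=0.1):
    l1 = F.l1_loss(pred, target)
    ssim_term = 1.0 - ssim(pred, target)               # SSIM as a loss
    # Color term: match per-channel means of prediction and target
    color = F.l1_loss(pred.mean(dim=(-2, -1)), target.mean(dim=(-2, -1)))
    return w_l1 * l1 + w_ssim * ssim_term + w_color * color

pred = torch.rand(1, 3, 64, 64)
target = torch.rand(1, 3, 64, 64)
loss = restoration_loss(pred, target)  # scalar; 0 when pred == target
```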
## Limitations
- Requires multiple frames of the same scene
- Performance depends on frame quality variation
- GPU recommended for real-time processing
## Citation
```bibtex
@software{karaarslan2026mfir,
  author = {Karaarslan, Veli},
  title  = {MFIR: Multi-Frame Image Restoration},
  year   = {2026},
  url    = {https://github.com/allcodernet/MFIR}
}
```
## License
MIT License - see [LICENSE](https://github.com/allcodernet/MFIR/blob/main/LICENSE)
## Author
**Veli Karaarslan** - 2026
## Links
- [GitHub Repository](https://github.com/allcodernet/MFIR)
- [Hugging Face Space](https://huggingface.co/spaces/marduk-ra/MFIR)