---
license: mit
library_name: pytorch
tags:
- image-restoration
- multi-frame
- deformable-convolution
- temporal-fusion
- feature-level-alignment
- super-resolution
- astronomy
- satellite-imagery
- photography
- denoise
- deblur
- novel-architecture
- from-scratch
- efficient
- research
language:
- en
pipeline_tag: image-to-image
---

# MFIR - Multi-Frame Image Restoration

A PyTorch model for multi-frame image restoration through temporal fusion and feature-level alignment. MFIR aligns and fuses features from multiple degraded frames to produce a high-quality restored image.

## Model Description

MFIR takes 2-16 degraded frames of the same scene and combines them into a single high-quality output. Unlike single-image restoration methods, which struggle with heavily degraded inputs, MFIR leverages complementary information across frames: each frame captures slightly different details, and the model learns to extract and merge the best parts of each.

### Architecture

```
Input Frames (B, N, 3, H, W)
           │
           ▼
┌─────────────────────┐
│   Shared Encoder    │  ResNet-style feature extraction
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│     Deformable      │  Align frames using learned offsets
│      Alignment      │  (3-layer cascade)
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│ Temporal Attention  │  Multi-head attention fusion
│       Fusion        │  (4 heads)
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│       Decoder       │  PixelShuffle upsampling
└─────────────────────┘
           │
           ▼
Output (B, 3, H, W)
```

### Key Components

| Component | Description |
|-----------|-------------|
| **Shared Encoder** | Multi-scale feature extraction with residual blocks; 4x spatial downsampling. |
| **Deformable Alignment** | Cascaded deformable convolutions (3 layers) that align each frame to the reference; more robust than optical flow on heavily degraded inputs. |
| **Temporal Attention Fusion** | Multi-head attention (4 heads) in which the reference frame supplies the query and all frames supply keys/values, learning per-pixel frame contributions (sketched below). |
| **Decoder** | Progressive upsampling with PixelShuffle (2 stages, 4x total). |
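
To make the fusion step concrete, here is a minimal, self-contained sketch of per-pixel temporal attention. It is illustrative rather than the exact MFIR implementation: every pixel location is treated as its own attention problem, with the reference frame as the query and all N aligned frames as keys/values.

```python
import torch
import torch.nn as nn

class TemporalAttentionFusion(nn.Module):
    """Fuse N aligned feature maps; the reference frame attends to all frames."""

    def __init__(self, channels: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, feats: torch.Tensor, ref_idx: int = 0) -> torch.Tensor:
        # feats: (B, N, C, H, W) aligned frame features
        B, N, C, H, W = feats.shape
        # Flatten spatial dims so each pixel is an independent attention "batch":
        # query = reference frame only, key/value = all N frames.
        x = feats.permute(0, 3, 4, 1, 2).reshape(B * H * W, N, C)
        q = x[:, ref_idx : ref_idx + 1, :]           # (B*H*W, 1, C)
        fused, _ = self.attn(q, x, x)                # (B*H*W, 1, C)
        return fused.reshape(B, H, W, C).permute(0, 3, 1, 2)  # (B, C, H, W)

# Example: fuse features from 5 aligned frames
feats = torch.randn(2, 5, 256, 32, 32)
fused = TemporalAttentionFusion(channels=256, num_heads=4)(feats, ref_idx=0)
print(fused.shape)  # torch.Size([2, 256, 32, 32])
```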

## Usage

### Installation

```bash
pip install torch torchvision huggingface_hub
```

### Inference

```python
import torch
from huggingface_hub import hf_hub_download

# Download checkpoint
checkpoint_path = hf_hub_download(
    repo_id="marduk-ra/MFIR",
    filename="temporal_fusion_model.pth"
)

# Load the checkpoint on whatever device is available
device = "cuda" if torch.cuda.is_available() else "cpu"
ckpt = torch.load(checkpoint_path, map_location=device, weights_only=False)

# Model architecture code available at:
# https://github.com/marduk-ra/MFIR
from model import FeatureFusionModel, FeatureFusionConfig

config = FeatureFusionConfig.from_dict(ckpt["config"])
model = FeatureFusionModel(config)
model.load_state_dict(ckpt["state_dict"])
model.to(device)
model.eval()

# Inference
# frames: (batch, num_frames, 3, height, width) tensor in [0, 1]
with torch.no_grad():
    result = model(frames, ref_idx=0)
    output = result["output"]  # (batch, 3, height, width)
```
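
If your frames live on disk, something along these lines builds the expected input tensor (the file paths are illustrative, and RGB images of identical size are assumed; `model` and `device` come from the snippet above):

```python
from torchvision.io import read_image, ImageReadMode

# Stack N frames of the same scene into a (1, N, 3, H, W) batch in [0, 1]
paths = [f"frames/frame_{i}.png" for i in range(5)]  # hypothetical paths
frames = torch.stack(
    [read_image(p, mode=ImageReadMode.RGB).float() / 255.0 for p in paths]
)
frames = frames.unsqueeze(0).to(device)  # (1, 5, 3, H, W)

with torch.no_grad():
    output = model(frames, ref_idx=0)["output"].clamp(0, 1)
```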

### Web Demo

Try the model directly in your browser:

**[Hugging Face Space Demo](https://huggingface.co/spaces/marduk-ra/MFIR)**

## Model Details

| Parameter | Value |
|-----------|-------|
| Input Channels | 3 (RGB) |
| Output Channels | 3 (RGB) |
| Max Frames | 16 |
| Min Frames | 2 |
| Encoder Channels | [64, 128, 256] |
| Deformable Groups | 8 |
| Deformable Layers | 3 |
| Attention Heads | 4 |
| Fusion Type | Attention |
| Parameters | ~10M |
| Checkpoint Size | 42 MB |
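
For orientation, the table above would map onto the checkpoint's stored config roughly as follows. The field names here are hypothetical; the authoritative definition is `FeatureFusionConfig` in the GitHub repo, and in practice the config is restored via `FeatureFusionConfig.from_dict(ckpt["config"])` as shown earlier.

```python
# Hypothetical field names, shown only to relate the table to the config
config_dict = {
    "in_channels": 3,
    "out_channels": 3,
    "min_frames": 2,
    "max_frames": 16,
    "encoder_channels": [64, 128, 256],
    "deformable_groups": 8,
    "deformable_layers": 3,
    "attention_heads": 4,
    "fusion_type": "attention",
}
```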

## Example

**Input Frames (5 degraded images):**

<p>
  <img src="photos/inputs/input1.png" width="150" />
  <img src="photos/inputs/input2.png" width="150" />
  <img src="photos/inputs/input3.png" width="150" />
  <img src="photos/inputs/input4.png" width="150" />
  <img src="photos/inputs/input5.png" width="150" />
</p>

**Output (restored):**

<img src="photos/output.png" width="400" />

*5 degraded input frames are fused into a single high-quality output.*

The model works best when:
- Frames have slight variations (different noise patterns, blur, etc.)
- Frames are roughly aligned (same scene)
- Input resolution matches the training resolution

## Training

The model was trained on a custom dataset with the following specifications:

**Dataset:**
- 16,000 high-resolution source images
- Each image was used to generate 8 degraded input frames
- Multi-scale training: 128, 256, 512, and 1024 pixel resolutions

**Degradation Pipeline** (sketched after this list):
- Random spatial shifts (simulating camera shake)
- Motion blur with varying kernel sizes and directions
- Gaussian and Poisson noise with random intensity
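
A minimal sketch of such a pipeline, assuming (3, H, W) tensors in [0, 1]; the shift range, kernel sizes, and noise levels below are illustrative, not the actual training values:

```python
import torch
import torch.nn.functional as F

def degrade(clean: torch.Tensor) -> torch.Tensor:
    # clean: (3, H, W) in [0, 1]
    # 1) Random spatial shift (camera shake)
    dy, dx = (int(s) for s in torch.randint(-4, 5, (2,)))
    img = torch.roll(clean, shifts=(dy, dx), dims=(1, 2))
    # 2) Motion blur: 1D averaging kernel, horizontal or vertical for brevity
    k = int(torch.randint(3, 10, (1,)))
    kernel = torch.ones(1, 1, 1, k) / k
    if torch.rand(1) < 0.5:
        kernel = kernel.transpose(2, 3)
    img = F.conv2d(img.unsqueeze(1), kernel, padding="same").squeeze(1)
    # 3) Poisson (shot) noise, then Gaussian noise with random strength
    img = torch.poisson(img.clamp(0, 1) * 255.0) / 255.0
    img = img + torch.randn_like(img) * (0.01 + 0.04 * float(torch.rand(1)))
    return img.clamp(0, 1)

clean = torch.rand(3, 128, 128)  # stand-in for a high-resolution source crop
frames = torch.stack([degrade(clean) for _ in range(8)])  # (8, 3, 128, 128)
```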

**Training Configuration:**
- Total epochs: 150 (progressive training)
- Optimizer: AdamW
- Loss: L1 + Perceptual (VGG) + SSIM + Color Correction (sketched below)
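
One way such a composite loss could be combined is shown below. The weights and the `vgg_features`/`ssim` callables are placeholders: the card lists the terms but not their coefficients or exact formulations.

```python
import torch.nn.functional as F

def composite_loss(pred, target, vgg_features, ssim):
    """pred/target: (B, 3, H, W); vgg_features and ssim are user-supplied callables."""
    l1 = F.l1_loss(pred, target)
    perceptual = F.l1_loss(vgg_features(pred), vgg_features(target))
    structural = 1.0 - ssim(pred, target)  # SSIM similarity in [0, 1]
    # Color term: match per-channel global statistics
    color = F.l1_loss(pred.mean(dim=(2, 3)), target.mean(dim=(2, 3)))
    return l1 + 0.1 * perceptual + 0.2 * structural + 0.1 * color
```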

## Limitations

- Requires multiple frames of the same scene
- Restoration quality depends on variation across frames; near-identical frames contribute little complementary information
- GPU recommended for real-time processing

## Citation

```bibtex
@software{karaarslan2026mfir,
  author = {Karaarslan, Veli},
  title = {MFIR: Multi-Frame Image Restoration},
  year = {2026},
  url = {https://github.com/allcodernet/MFIR}
}
```

## License

MIT License - see [LICENSE](https://github.com/allcodernet/MFIR/blob/main/LICENSE)

## Author

**Veli Karaarslan** - 2026

## Links

- [GitHub Repository](https://github.com/allcodernet/MFIR)
- [Hugging Face Space](https://huggingface.co/spaces/marduk-ra/MFIR)