---
license: mit
library_name: pytorch
tags:
- image-restoration
- multi-frame
- deformable-convolution
- temporal-fusion
- feature-level-alignment
- super-resolution
- astronomy
- satellite-images-refactoring
- photography
- denoise
- deblur
- novel-architecture
- from-scratch
- efficient
- research
language:
- en
pipeline_tag: image-to-image
---

# MFIR - Multi-Frame Image Restoration

A PyTorch model for multi-frame image restoration through temporal fusion and feature-level alignment. MFIR aligns and fuses features from multiple degraded frames to produce a single high-quality restored image.

## Model Description

MFIR takes 2-16 degraded frames of the same scene and combines them into a single high-quality output. Unlike single-image restoration methods, which struggle with heavily degraded inputs, MFIR leverages complementary information across frames: each frame captures slightly different details, and the model learns to extract and merge the best parts of each.

### Architecture

```
Input Frames (B, N, 3, H, W)
        │
        ▼
┌─────────────────────┐
│   Shared Encoder    │  ResNet-style feature extraction
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│     Deformable      │  Align frames using learned offsets
│     Alignment       │  (3-layer cascade)
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│ Temporal Attention  │  Multi-head attention fusion
│       Fusion        │  (4 heads)
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│       Decoder       │  PixelShuffle upsampling
└─────────────────────┘
        │
        ▼
Output (B, 3, H, W)
```

### Key Components

| Component | Description |
|-----------|-------------|
| **Shared Encoder** | Multi-scale feature extraction with residual blocks; 4x spatial downsampling. |
| **Deformable Alignment** | Cascaded deformable convolutions (3 layers) that align each frame to the reference. More robust than optical flow for degraded inputs. |
| **Temporal Attention Fusion** | Multi-head attention (4 heads) in which the reference frame provides the query and all frames provide keys/values. Learns per-pixel frame contributions. |
| **Decoder** | Progressive upsampling with PixelShuffle (2 stages, 4x total). |

## Usage

### Installation

```bash
pip install torch torchvision huggingface_hub
```

### Inference

```python
import torch
from huggingface_hub import hf_hub_download

# Download checkpoint
checkpoint_path = hf_hub_download(
    repo_id="marduk-ra/MFIR",
    filename="temporal_fusion_model.pth",
)

# Load model
ckpt = torch.load(checkpoint_path, map_location="cuda", weights_only=False)

# Model architecture code available at:
# https://github.com/marduk-ra/MFIR
from model import FeatureFusionModel, FeatureFusionConfig

config = FeatureFusionConfig.from_dict(ckpt["config"])
model = FeatureFusionModel(config)
model.load_state_dict(ckpt["state_dict"])
model.eval()

# Inference
# frames: (batch, num_frames, 3, height, width) tensor in [0, 1]
with torch.no_grad():
    result = model(frames, ref_idx=0)
output = result["output"]  # (batch, 3, height, width)
```

### Web Demo

Try the model directly in your browser:

🚀 **[Hugging Face Space Demo](https://huggingface.co/spaces/marduk-ra/MFIR)**

## Model Details

| Parameter | Value |
|-----------|-------|
| Input Channels | 3 (RGB) |
| Output Channels | 3 (RGB) |
| Max Frames | 16 |
| Min Frames | 2 |
| Encoder Channels | [64, 128, 256] |
| Deformable Groups | 8 |
| Deformable Layers | 3 |
| Attention Heads | 4 |
| Fusion Type | Attention |
| Parameters | ~10M |
| Checkpoint Size | 42 MB |

## Example

**Input Frames (5 degraded images):**
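Frames like these are stacked into the `(batch, num_frames, channels, height, width)` layout the inference code above expects. A minimal sketch (the random stand-in tensors and 256x256 size are placeholders; in practice you would load your degraded captures from disk):

```python
import torch

# Five degraded captures of the same scene; stand-in random tensors
# in [0, 1] with shape (3, H, W). In practice, load real frames here.
H, W = 256, 256
frames = [torch.rand(3, H, W) for _ in range(5)]

# Stack to (num_frames, 3, H, W), then add a batch axis,
# giving the (1, 5, 3, H, W) tensor passed to model(frames, ref_idx=0).
batch = torch.stack(frames).unsqueeze(0)
print(batch.shape)  # torch.Size([1, 5, 3, 256, 256])
```

With this layout, `ref_idx=0` selects the first of the five frames as the alignment reference.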

**Output (restored):**

*5 degraded input frames are fused into a single high-quality output.*

The model works best when:

- Frames have slight variations (different noise patterns, blur, etc.)
- Frames are roughly aligned (same scene)
- Input resolution matches the training resolution

## Training

The model was trained on a custom dataset with the following specifications:

**Dataset:**

- 16,000 high-resolution source images
- Each image was used to generate 8 degraded input frames
- Multi-scale training: 128, 256, 512, and 1024 pixel resolutions

**Degradation Pipeline:**

- Random spatial shifts (simulating camera shake)
- Motion blur with varying kernel sizes and directions
- Gaussian and Poisson noise with random intensity

**Training Configuration:**

- Total epochs: 150 (progressive training)
- Optimizer: AdamW
- Loss: L1 + Perceptual (VGG) + SSIM + Color Correction

## Limitations

- Requires multiple frames of the same scene
- Performance depends on frame quality variation
- GPU recommended for real-time processing

## Citation

```bibtex
@software{karaarslan2026mfir,
  author = {Karaarslan, Veli},
  title  = {MFIR: Multi-Frame Image Restoration},
  year   = {2026},
  url    = {https://github.com/allcodernet/MFIR}
}
```

## License

MIT License - see [LICENSE](https://github.com/allcodernet/MFIR/blob/main/LICENSE)

## Author

**Veli Karaarslan** - 2026

## Links

- [GitHub Repository](https://github.com/allcodernet/MFIR)
- [Hugging Face Space](https://huggingface.co/spaces/marduk-ra/MFIR)
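As a closing sketch, the degradation pipeline described in the Training section (random shifts, directional motion blur, Gaussian and Poisson noise) can be approximated in plain PyTorch. All ranges and parameters below are illustrative assumptions, not the released training settings:

```python
import torch
import torch.nn.functional as F

def degrade(clean: torch.Tensor) -> torch.Tensor:
    """Apply one random degradation to a (3, H, W) image in [0, 1].

    Illustrative re-creation of the pipeline described above;
    the actual training parameters are not published here.
    """
    img = clean.unsqueeze(0)  # (1, 3, H, W)

    # 1. Random spatial shift (camera shake), up to +/-4 px
    dx, dy = torch.randint(-4, 5, (2,)).tolist()
    img = torch.roll(img, shifts=(dy, dx), dims=(2, 3))

    # 2. Motion blur: a line kernel of random length and direction
    k = 2 * torch.randint(2, 5, (1,)).item() + 1   # odd size in {5, 7, 9}
    kernel = torch.zeros(k, k)
    if torch.rand(1).item() < 0.5:
        kernel[k // 2, :] = 1.0                    # horizontal streak
    else:
        kernel[:, k // 2] = 1.0                    # vertical streak
    kernel /= kernel.sum()
    kernel = kernel.expand(3, 1, k, k)             # depthwise kernel
    img = F.conv2d(img, kernel, padding=k // 2, groups=3)

    # 3. Gaussian + Poisson noise with random intensity
    sigma = 0.02 + 0.03 * torch.rand(1).item()
    img = img + sigma * torch.randn_like(img)
    img = torch.poisson(img.clamp(0, 1) * 255.0) / 255.0

    return img.squeeze(0).clamp(0, 1)

# 8 degraded frames from one clean source, as in the dataset description
clean = torch.rand(3, 64, 64)
frames = torch.stack([degrade(clean) for _ in range(8)])
print(frames.shape)  # torch.Size([8, 3, 64, 64])
```

Each call draws fresh shift, blur, and noise parameters, so the 8 frames carry the complementary information the fusion model relies on.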