---
language:
  - en
tags:
  - image-fusion
  - infrared-visible-fusion
  - multi-focus-fusion
  - multi-exposure-fusion
  - diffusion
  - transformer
  - multimodal
  - text-guided
pipeline_tag: image-to-image
library_name: transformers
license: mit
datasets:
  - custom
model_name: DiTFuse
---

# DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)

This repository provides the official pretrained weights for DiTFuse. The project code is available on GitHub:

👉 GitHub: https://github.com/Henry-Lee-real/DiTFuse

DiTFuse supports multiple fusion tasks—including infrared–visible fusion, multi-focus fusion, multi-exposure fusion, and instruction-driven controllable fusion / segmentation—all within a single unified model.


## 📌 Available Model Versions

### 🔹 V1 — Stronger Zero-Shot Generalization

- Designed for stronger zero-shot fusion capability.
- Performs robustly on unseen fusion scenarios.
- Recommended when your use case emphasizes cross-dataset generalization.

### 🔹 V2 — Full Capability Version (Paper Model)

- The main model used in the DiTFuse paper.
- Provides the most comprehensive capabilities:
  - Full instruction-following control
  - Joint fusion and segmentation
  - Better fidelity and controllability
  - Stronger alignment with text prompts
- Recommended for research reproduction, benchmarking, and controllable image fusion tasks.
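
To fetch one of these checkpoints programmatically, the standard `huggingface_hub` download API can be used. The sketch below is illustrative only: the `repo_id` and checkpoint filenames are assumptions, not confirmed names from this repository — consult the GitHub README linked above for the actual file layout before use.

```python
def checkpoint_filename(version: str = "v2") -> str:
    """Return the (assumed) checkpoint filename for a DiTFuse version.

    The filenames here are hypothetical placeholders; substitute the real
    names listed in the repository's "Files" tab.
    """
    checkpoints = {
        "v1": "ditfuse_v1.pth",  # stronger zero-shot generalization
        "v2": "ditfuse_v2.pth",  # full-capability paper model
    }
    try:
        return checkpoints[version]
    except KeyError:
        raise ValueError(
            f"unknown version {version!r}; choose from {sorted(checkpoints)}"
        )


def download_checkpoint(repo_id: str, version: str = "v2") -> str:
    """Download the chosen checkpoint from the Hub and return its local path.

    `repo_id` must be the actual Hub repository id (e.g. "<user>/DiTFuse");
    it is left as a parameter because it is not stated in this card.
    """
    # Imported lazily so the version-selection helper works without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=checkpoint_filename(version))
```

Usage would then be a single call, e.g. `path = download_checkpoint("<user>/DiTFuse", "v1")`, after which the returned local path can be passed to the loading code in the GitHub project.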