---
language:
- en
tags:
- image-fusion
- infrared-visible-fusion
- multi-focus-fusion
- multi-exposure-fusion
- diffusion
- transformer
- multimodal
- text-guided
pipeline_tag: image-to-image
library_name: transformers
license: mit
datasets:
- custom
model_name: DiTFuse
---

# **DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)**

This repository provides the **official pretrained weights** for **DiTFuse**. The project code is available on GitHub:

👉 **GitHub:** [https://github.com/Henry-Lee-real/DiTFuse](https://github.com/Henry-Lee-real/DiTFuse)

DiTFuse supports multiple fusion tasks, including **infrared–visible fusion, multi-focus fusion, multi-exposure fusion**, and **instruction-driven controllable fusion / segmentation**, all within a single unified model.

---

## **📌 Available Model Versions**

### **🔹 V1 — Stronger Zero-Shot Generalization**

* Designed for stronger **zero-shot fusion capability**.
* Performs robustly on unseen fusion scenarios.
* Recommended if your use case emphasizes **cross-dataset generalization**.

### **🔹 V2 — Full Capability Version (Paper Model)**

* This is the **main model used in the DiTFuse paper**.
* Provides the **most comprehensive capabilities**:
  * Full instruction-following control
  * Joint fusion + segmentation
  * Better fidelity and controllability
  * Stronger alignment with text prompts
* Recommended for **research reproduction**, **benchmarking**, and **controllable image fusion tasks**.
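As a starting point, the checkpoints can be fetched with the standard `huggingface_hub` client. This is a minimal sketch, not an official usage snippet: the repo id `Henry-Lee-real/DiTFuse` and the per-version subfolder names (`v1/`, `v2/`) are assumptions — check this repository's file listing and the GitHub README for the actual layout and loading code.

```python
# Hypothetical sketch for downloading DiTFuse weights from the Hugging Face Hub.
# The repo id and the "v1"/"v2" subfolder layout are assumptions, not confirmed
# by the model card -- verify them against the repository's file listing.
from huggingface_hub import snapshot_download


def fetch_ditfuse(version: str = "v2") -> str:
    """Download the chosen DiTFuse checkpoint and return the local cache path."""
    if version not in {"v1", "v2"}:
        raise ValueError(f"unknown DiTFuse version: {version!r}")
    return snapshot_download(
        repo_id="Henry-Lee-real/DiTFuse",    # assumed repo id
        allow_patterns=[f"{version}/*"],     # assumed per-version subfolders
    )
```

Per the notes above, `version="v2"` (the paper model) is the sensible default for reproduction and benchmarking, while `version="v1"` targets zero-shot, cross-dataset use.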