---
language:
- en
tags:
- image-fusion
- infrared-visible-fusion
- multi-focus-fusion
- multi-exposure-fusion
- diffusion
- transformer
- multimodal
- text-guided
pipeline_tag: image-to-image
library_name: transformers
license: mit
datasets:
- custom
model_name: DiTFuse
---
# **DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)**
This repository provides the **official pretrained weights** for **DiTFuse**.
The project code is available on GitHub:
👉 **GitHub:** [https://github.com/Henry-Lee-real/DiTFuse](https://github.com/Henry-Lee-real/DiTFuse)
DiTFuse supports multiple fusion tasks—including **infrared–visible fusion, multi-focus fusion, multi-exposure fusion**, and **instruction-driven controllable fusion / segmentation**—all within a single unified model.
---
## **📌 Available Model Versions**
### **🔹 V1 — Stronger Zero-Shot Generalization**
* Offers stronger **zero-shot fusion capability**.
* Performs robustly on unseen fusion scenarios.
* Recommended if your use case emphasizes **cross-dataset generalization**.
### **🔹 V2 — Full Capability Version (Paper Model)**
* This is the **main model used in the DiTFuse paper**.
* Provides the **most comprehensive capabilities**:
  * Full instruction-following control
  * Joint fusion + segmentation
  * Better fidelity and controllability
  * Stronger alignment with text prompts
* Recommended for **research reproduction**, **benchmarking**, and **controllable image fusion tasks**.
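## **📥 Downloading the Weights**

A minimal sketch for fetching the checkpoints with the `huggingface_hub` client. The repo id `lijiayangCS/DiTFuse` is an assumption for illustration; substitute this repository's actual id, and see the GitHub project above for how to load the weights into the model.

```python
def download_ditfuse(repo_id: str = "lijiayangCS/DiTFuse") -> str:
    """Download all files from the model repo and return the local directory.

    Requires the ``huggingface_hub`` package (``pip install huggingface_hub``).
    The default ``repo_id`` is an assumption; replace it with the real one.
    """
    from huggingface_hub import snapshot_download  # imported lazily

    # snapshot_download caches files locally and returns the snapshot path
    return snapshot_download(repo_id=repo_id)


# Example (requires network access):
# local_dir = download_ditfuse()
# print(local_dir)
```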