---
language:
- en
tags:
- image-fusion
- infrared-visible-fusion
- multi-focus-fusion
- multi-exposure-fusion
- diffusion
- transformer
- multimodal
- text-guided
pipeline_tag: image-to-image
library_name: transformers
license: mit
datasets:
- custom
model_name: DiTFuse
---
# **DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)**
This repository provides the **official pretrained weights** for **DiTFuse**.
The project code is available on GitHub:
👉 **GitHub:** [https://github.com/Henry-Lee-real/DiTFuse](https://github.com/Henry-Lee-real/DiTFuse)
DiTFuse supports multiple fusion tasks—including **infrared–visible fusion, multi-focus fusion, multi-exposure fusion**, and **instruction-driven controllable fusion / segmentation**—all within a single unified model.
---
## **📌 Available Model Versions**
### **🔹 V1 — Stronger Zero-Shot Generalization**
* Designed for stronger **zero-shot fusion capability**.
* Generalizes robustly to unseen fusion scenarios.
* Recommended when your use case emphasizes **cross-dataset generalization**.
### **🔹 V2 — Full Capability Version (Paper Model)**
* This is the **main model used in the DiTFuse paper**.
* Provides the **most comprehensive capabilities**:
* Full instruction-following control
* Joint fusion + segmentation
* Better fidelity and controllability
* Stronger alignment with text prompts
* Recommended for **research reproduction**, **benchmarking**, and **controllable image fusion tasks**.
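To use either version, the weights can be fetched programmatically. Below is a minimal sketch using `huggingface_hub`; the repo id `Henry-Lee-real/DiTFuse` and the checkpoint filenames are assumptions — check this repository's "Files and versions" tab for the actual names.

```python
def checkpoint_filename(version: str) -> str:
    """Map a model version ("v1" or "v2") to its weight filename.

    The filenames here are hypothetical placeholders -- verify them
    against the files actually published in the repository.
    """
    names = {"v1": "ditfuse_v1.pth", "v2": "ditfuse_v2.pth"}
    if version not in names:
        raise ValueError(f"unknown DiTFuse version: {version!r}")
    return names[version]


def fetch_weights(version: str = "v2") -> str:
    """Download the chosen DiTFuse checkpoint and return its local cache path."""
    # Lazy import so the helper above stays usable without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="Henry-Lee-real/DiTFuse",  # assumed repo id
        filename=checkpoint_filename(version),
    )
```

V2 is the sensible default here, since it is the paper model; pass `version="v1"` when cross-dataset generalization matters more than instruction-following fidelity.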