---
language:
- en
tags:
- image-fusion
- infrared-visible-fusion
- multi-focus-fusion
- multi-exposure-fusion
- diffusion
- transformer
- multimodal
- text-guided
pipeline_tag: image-to-image
library_name: transformers
license: mit
datasets:
- custom
model_name: DiTFuse
---

# **DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)**

This repository provides the **official pretrained weights** for **DiTFuse**.
The project code is available on GitHub:

👉 **GitHub:** [https://github.com/Henry-Lee-real/DiTFuse](https://github.com/Henry-Lee-real/DiTFuse)

DiTFuse supports multiple fusion tasks, including **infrared–visible fusion, multi-focus fusion, multi-exposure fusion**, and **instruction-driven controllable fusion / segmentation**, all within a single unified model.
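
A minimal sketch for fetching the weights programmatically is shown below. It uses `huggingface_hub`; the repository ID is a placeholder and should be replaced with the ID shown at the top of this model card.

```python
# Minimal sketch: download the DiTFuse weights from the Hugging Face Hub.
# The repo ID below is an assumed placeholder -- replace it with the ID of
# this model card if it differs.
from huggingface_hub import snapshot_download

REPO_ID = "Henry-Lee-real/DiTFuse"  # placeholder

local_dir = snapshot_download(repo_id=REPO_ID)
print(f"Weights downloaded to: {local_dir}")
```

For loading the weights and running fusion or instruction-driven inference, follow the scripts in the GitHub repository linked above.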

---

## **📌 Available Model Versions**

### **🔹 V1 — Stronger Zero-Shot Generalization**

* Designed for stronger **zero-shot fusion capability**.
* Performs robustly on unseen fusion scenarios.
* Recommended if your use case emphasizes **cross-dataset generalization**.

### **🔹 V2 — Full Capability Version (Paper Model)**

* This is the **main model used in the DiTFuse paper**.
* Provides the **most comprehensive capabilities**:
  * Full instruction-following control
  * Joint fusion + segmentation
  * Better fidelity and controllability
  * Stronger alignment with text prompts
* Recommended for **research reproduction**, **benchmarking**, and **controllable image fusion tasks** (see the download sketch below for selecting a version).
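
If both versions are stored in this repository, you can restrict the download to the one you need. The sketch below assumes the V1 and V2 checkpoints live in `v1/` and `v2/` subfolders; these folder names are hypothetical, so check the "Files and versions" tab for the actual layout.

```python
# Sketch: download only one model version. The "v1/" and "v2/" subfolder
# names are assumed -- check the repo's file listing for the real paths.
from huggingface_hub import snapshot_download

REPO_ID = "Henry-Lee-real/DiTFuse"  # placeholder, adjust to the actual repo ID

# V2: the full-capability model used in the paper
v2_dir = snapshot_download(repo_id=REPO_ID, allow_patterns=["v2/*"])

# V1: stronger zero-shot generalization
# v1_dir = snapshot_download(repo_id=REPO_ID, allow_patterns=["v1/*"])
```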