---
language:
- en
tags:
- image-fusion
- infrared-visible-fusion
- multi-focus-fusion
- multi-exposure-fusion
- diffusion
- transformer
- multimodal
- text-guided
pipeline_tag: image-to-image
library_name: transformers
license: mit
datasets:
- custom
model_name: DiTFuse
---

# **DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)**

This repository provides the **official pretrained weights** for **DiTFuse**. The project code is available on GitHub:

👉 **GitHub:** [https://github.com/Henry-Lee-real/DiTFuse](https://github.com/Henry-Lee-real/DiTFuse)

DiTFuse supports multiple fusion tasks, including **infrared–visible fusion, multi-focus fusion, multi-exposure fusion**, and **instruction-driven controllable fusion / segmentation**, all within a single unified model.

---

## **📌 Available Model Versions**

### **🔹 V1 — Stronger Zero-Shot Generalization**

* Designed for stronger **zero-shot fusion capability**.
* Performs robustly on unseen fusion scenarios.
* Recommended if your use case emphasizes **cross-dataset generalization**.

### **🔹 V2 — Full Capability Version (Paper Model)**

* This is the **main model used in the DiTFuse paper**.
* Provides the **most comprehensive capabilities**:
  * Full instruction-following control
  * Joint fusion + segmentation
  * Better fidelity and controllability
  * Stronger alignment with text prompts
* Recommended for **research reproduction**, **benchmarking**, and **controllable image fusion tasks**.
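As a starting point, the checkpoints can be fetched with the standard `huggingface_hub` client. This is a minimal sketch, not an official usage snippet: the repo id `Henry-Lee-real/DiTFuse` and the per-version subfolder names (`v1/`, `v2/`) are assumptions — check this repository's file listing and the GitHub README for the actual layout and loading code.

```python
# Hypothetical sketch for downloading DiTFuse weights from the Hugging Face Hub.
# The repo id and the "v1"/"v2" subfolder layout are assumptions, not confirmed
# by the model card -- verify them against the repository's file listing.
from huggingface_hub import snapshot_download


def fetch_ditfuse(version: str = "v2") -> str:
    """Download the chosen DiTFuse checkpoint and return the local cache path."""
    if version not in {"v1", "v2"}:
        raise ValueError(f"unknown DiTFuse version: {version!r}")
    return snapshot_download(
        repo_id="Henry-Lee-real/DiTFuse",    # assumed repo id
        allow_patterns=[f"{version}/*"],     # assumed per-version subfolders
    )
```

Per the notes above, `version="v2"` (the paper model) is the sensible default for reproduction and benchmarking, while `version="v1"` targets zero-shot, cross-dataset use.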