---
language:
- en
tags:
- image-fusion
- infrared-visible-fusion
- multi-focus-fusion
- multi-exposure-fusion
- diffusion
- transformer
- multimodal
- text-guided
pipeline_tag: image-to-image
library_name: transformers
license: mit
datasets:
- custom
model_name: DiTFuse
---

# **DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)**

This repository provides the **official pretrained weights** for **DiTFuse**.
The project code is available on GitHub:

👉 **GitHub:** [https://github.com/Henry-Lee-real/DiTFuse](https://github.com/Henry-Lee-real/DiTFuse)

DiTFuse supports multiple fusion tasks, including **infrared–visible fusion, multi-focus fusion, multi-exposure fusion**, and **instruction-driven controllable fusion / segmentation**, all within a single unified model.
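
A minimal sketch for fetching the weights programmatically is shown below. It uses `huggingface_hub`; the repository ID is a placeholder and should be replaced with the ID shown at the top of this model card.

```python
# Minimal sketch: download the DiTFuse weights from the Hugging Face Hub.
# The repo ID below is an assumed placeholder -- replace it with the ID of
# this model card if it differs.
from huggingface_hub import snapshot_download

REPO_ID = "Henry-Lee-real/DiTFuse"  # placeholder

local_dir = snapshot_download(repo_id=REPO_ID)
print(f"Weights downloaded to: {local_dir}")
```

For loading the weights and running fusion or instruction-driven inference, follow the scripts in the GitHub repository linked above.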

---

## **📌 Available Model Versions**

### **🔹 V1 — Stronger Zero-Shot Generalization**

* Designed for stronger **zero-shot fusion capability**.
* Performs robustly on unseen fusion scenarios.
* Recommended if your use case emphasizes **cross-dataset generalization**.

### **🔹 V2 — Full Capability Version (Paper Model)**

* This is the **main model used in the DiTFuse paper**.
* Provides the **most comprehensive capabilities**:
  * Full instruction-following control
  * Joint fusion + segmentation
  * Better fidelity and controllability
  * Stronger alignment with text prompts
* Recommended for **research reproduction**, **benchmarking**, and **controllable image fusion tasks** (see the download sketch below for selecting a version).
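
If both versions are stored in this repository, you can restrict the download to the one you need. The sketch below assumes the V1 and V2 checkpoints live in `v1/` and `v2/` subfolders; these folder names are hypothetical, so check the "Files and versions" tab for the actual layout.

```python
# Sketch: download only one model version. The "v1/" and "v2/" subfolder
# names are assumed -- check the repo's file listing for the real paths.
from huggingface_hub import snapshot_download

REPO_ID = "Henry-Lee-real/DiTFuse"  # placeholder, adjust to the actual repo ID

# V2: the full-capability model used in the paper
v2_dir = snapshot_download(repo_id=REPO_ID, allow_patterns=["v2/*"])

# V1: stronger zero-shot generalization
# v1_dir = snapshot_download(repo_id=REPO_ID, allow_patterns=["v1/*"])
```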