---
language:
  - en
tags:
  - image-fusion
  - infrared-visible-fusion
  - multi-focus-fusion
  - multi-exposure-fusion
  - diffusion
  - transformer
  - multimodal
  - text-guided
pipeline_tag: image-to-image
library_name: transformers
license: mit
datasets:
  - custom
model_name: DiTFuse
---

# DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)

This repository provides the official pretrained weights for DiTFuse. The project code is available on GitHub:

👉 GitHub: https://github.com/Henry-Lee-real/DiTFuse

DiTFuse supports multiple fusion tasks—including infrared–visible fusion, multi-focus fusion, multi-exposure fusion, and instruction-driven controllable fusion / segmentation—all within a single unified model.


## 📌 Available Model Versions

### 🔹 V1 — Stronger Zero-Shot Generalization

- Designed for stronger zero-shot fusion capability.
- Performs robustly on unseen fusion scenarios.
- Recommended when your use case emphasizes cross-dataset generalization.

### 🔹 V2 — Full Capability Version (Paper Model)

- The main model used in the DiTFuse paper.
- Provides the most comprehensive capabilities:
  - Full instruction-following control
  - Joint fusion and segmentation
  - Better fidelity and controllability
  - Stronger alignment with text prompts
- Recommended for research reproduction, benchmarking, and controllable image fusion tasks.
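
To fetch one of these checkpoints programmatically, the standard `huggingface_hub` download API can be used. The sketch below is illustrative only: the `repo_id` and checkpoint filenames are assumptions, not confirmed names from this repository — consult the GitHub README linked above for the actual file layout before use.

```python
def checkpoint_filename(version: str = "v2") -> str:
    """Return the (assumed) checkpoint filename for a DiTFuse version.

    The filenames here are hypothetical placeholders; substitute the real
    names listed in the repository's "Files" tab.
    """
    checkpoints = {
        "v1": "ditfuse_v1.pth",  # stronger zero-shot generalization
        "v2": "ditfuse_v2.pth",  # full-capability paper model
    }
    try:
        return checkpoints[version]
    except KeyError:
        raise ValueError(
            f"unknown version {version!r}; choose from {sorted(checkpoints)}"
        )


def download_checkpoint(repo_id: str, version: str = "v2") -> str:
    """Download the chosen checkpoint from the Hub and return its local path.

    `repo_id` must be the actual Hub repository id (e.g. "<user>/DiTFuse");
    it is left as a parameter because it is not stated in this card.
    """
    # Imported lazily so the version-selection helper works without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=checkpoint_filename(version))
```

Usage would then be a single call, e.g. `path = download_checkpoint("<user>/DiTFuse", "v1")`, after which the returned local path can be passed to the loading code in the GitHub project.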