---
pipeline_tag: image-segmentation
datasets:
  - ronniejiangC/MM-RIS
arxiv: 2509.12710
tags:
  - referring-image-segmentation
  - image-fusion
  - multimodal
---

# RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation

This repository contains the model weights for RIS-FUSION, a cascaded framework presented in the paper *RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation*.

RIS-FUSION unifies text-driven infrared and visible image fusion with referring image segmentation (RIS) through joint optimization. The framework addresses the lack of goal-aligned supervision in existing methods by observing that RIS and text-driven fusion share a common objective: highlighting the object referred to by the text. At its core is the LangGatedFusion module, which injects textual features into the fusion backbone to enhance semantic alignment.
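The card describes LangGatedFusion only at a high level. As a rough illustration of what "injecting textual features into the fusion backbone" can mean, here is a minimal NumPy sketch of one plausible gating scheme: a pooled text embedding is projected to per-channel gates that weight the infrared and visible feature maps. All names, shapes, and the projection `w_text` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lang_gated_fusion(ir_feat, vis_feat, text_feat, w_text):
    """Hypothetical text-gated fusion of two modality feature maps.

    ir_feat, vis_feat: (C, H, W) feature maps from infrared / visible branches.
    text_feat:         (D,) pooled text embedding.
    w_text:            (C, D) projection matrix (illustrative parameter).
    """
    # Project the text embedding to one gate value per channel.
    gate = sigmoid(w_text @ text_feat)   # shape (C,)
    gate = gate[:, None, None]           # broadcast over H and W
    # Convex combination of the two modalities, steered by the text.
    return gate * ir_feat + (1.0 - gate) * vis_feat

rng = np.random.default_rng(0)
fused = lang_gated_fusion(
    rng.standard_normal((8, 4, 4)),   # infrared features
    rng.standard_normal((8, 4, 4)),   # visible features
    rng.standard_normal(16),          # text embedding
    rng.standard_normal((8, 16)),     # text-to-gate projection
)
print(fused.shape)  # (8, 4, 4)
```

Because the gates depend on the text embedding, changing the referring expression changes which channels lean toward the infrared versus the visible features, which is the intuition behind text-driven fusion; the real module operates inside the fusion backbone and is trained jointly with the RIS head.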

## Sample Usage

To evaluate the model using the official implementation, you can use the following command provided in the GitHub repository:

```bash
python test.py \
  --ckpt ./ckpts/risfusion/model_best_lavt.pth \
  --test_parquet ./data/mm_ris_test.parquet \
  --out_dir ./your_output_dir \
  --bert_tokenizer ./bert/pretrained_weights/bert-base-uncased \
  --ck_bert ./bert/pretrained_weights/bert-base-uncased
```

For detailed installation and training instructions, please refer to the official GitHub repository.

## Citation

If you find this work useful, please consider citing the paper:

```bibtex
@article{RIS-FUSION2025,
  title   = {RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation},
  author  = {Ma, Siju and Gong, Changsiyu and Fan, Xiaofeng and Ma, Yong and Jiang, Chengjie},
  journal = {arXiv preprint arXiv:2509.12710},
  year    = {2025}
}
```