---
pipeline_tag: image-segmentation
datasets:
- ronniejiangC/MM-RIS
arxiv: 2509.12710
tags:
- referring-image-segmentation
- image-fusion
- multimodal
---
# RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation
This repository contains the model weights for RIS-FUSION, a cascaded framework presented in the paper *RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation*.
RIS-FUSION unifies text-driven infrared and visible image fusion with referring image segmentation (RIS) through joint optimization. The framework addresses the lack of goal-aligned supervision in existing methods by observing that RIS and text-driven fusion share a common objective: highlighting the object referred to by the text. At its core is the LangGatedFusion module, which injects textual features into the fusion backbone to enhance semantic alignment.
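To make the idea concrete, here is a minimal NumPy sketch of language-gated fusion in the spirit of the LangGatedFusion module described above. All names, shapes, and the projection matrix are illustrative assumptions, not the paper's actual implementation: a pooled sentence embedding is projected to per-channel gates that re-weight the additively fused infrared and visible features.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lang_gated_fusion(ir_feat, vis_feat, text_feat, w_text):
    """Hypothetical sketch of text-gated fusion (not the official module).

    ir_feat, vis_feat: (C, H, W) feature maps from the fusion backbone.
    text_feat: (D,) pooled sentence embedding of the referring expression.
    w_text: (D, C) assumed projection from text space to channel gates.
    """
    # Project the sentence embedding to one sigmoid gate per channel.
    gate = sigmoid(text_feat @ w_text)          # (C,), values in (0, 1)
    # Plain additive fusion, then channel-wise re-weighting by the text
    # gate, so channels relevant to the referred object are emphasized.
    fused = ir_feat + vis_feat                  # (C, H, W)
    return gate[:, None, None] * fused

# Toy shapes for illustration only.
C, H, W, D = 8, 4, 4, 16
ir = rng.standard_normal((C, H, W))
vis = rng.standard_normal((C, H, W))
txt = rng.standard_normal(D)
w = rng.standard_normal((D, C)) * 0.1
out = lang_gated_fusion(ir, vis, txt, w)
print(out.shape)  # (8, 4, 4)
```

Because the gate depends only on the text, the same image pair yields different fused features for different referring expressions, which is what lets the downstream RIS loss supervise the fusion stage.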
## Resources
- **Paper:** [arXiv:2509.12710](https://arxiv.org/abs/2509.12710)
- **GitHub Repository:** [SijuMa2003/RIS-FUSION](https://github.com/SijuMa2003/RIS-FUSION)
- **Dataset (MM-RIS):** [ronniejiangC/MM-RIS on Hugging Face](https://huggingface.co/datasets/ronniejiangC/MM-RIS)
## Sample Usage
To evaluate the model using the official implementation, you can use the following command provided in the GitHub repository:
```bash
python test.py \
    --ckpt ./ckpts/risfusion/model_best_lavt.pth \
    --test_parquet ./data/mm_ris_test.parquet \
    --out_dir ./your_output_dir \
    --bert_tokenizer ./bert/pretrained_weights/bert-base-uncased \
    --ck_bert ./bert/pretrained_weights/bert-base-uncased
```
For detailed installation and training instructions, please refer to the official GitHub repository.
## Citation
If you find this work useful, please consider citing the paper:
```bibtex
@article{RIS-FUSION2025,
  title   = {RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation},
  author  = {Ma, Siju and Gong, Changsiyu and Fan, Xiaofeng and Ma, Yong and Jiang, Chengjie},
  journal = {arXiv preprint arXiv:2509.12710},
  year    = {2025}
}
```