Add model card for RIS-FUSION and metadata #1
opened by nielsr (HF Staff)

README.md (ADDED)
---
pipeline_tag: image-segmentation
datasets:
- ronniejiangC/MM-RIS
arxiv: 2509.12710
tags:
- referring-image-segmentation
- image-fusion
- multimodal
---

# RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation

This repository contains the model weights for **RIS-FUSION**, a cascaded framework presented in the paper [RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation](https://huggingface.co/papers/2509.12710).

RIS-FUSION unifies text-driven infrared and visible image fusion with referring image segmentation (RIS) through joint optimization. The framework addresses the lack of goal-aligned supervision in existing methods by observing that RIS and text-driven fusion share a common objective: highlighting the object referred to by the text. At its core is the *LangGatedFusion* module, which injects textual features into the fusion backbone to enhance semantic alignment.
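
The card only describes LangGatedFusion at a high level, so the sketch below is a rough, illustrative example of how a text-gated fusion block of this kind could inject a sentence embedding into the fusion backbone. It is an assumption for illustration only (the class name `LangGatedFusionSketch`, the dimensions, and the gating scheme are not the released module); see the GitHub repository for the actual implementation.

```python
import torch
import torch.nn as nn

class LangGatedFusionSketch(nn.Module):
    """Hypothetical text-gated fusion block (illustration, not the official module).

    A sentence embedding modulates a per-pixel gate that weights infrared vs.
    visible features before merging, so regions named in the text are emphasized.
    """

    def __init__(self, img_dim: int = 256, text_dim: int = 768):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, img_dim)   # project sentence embedding to image feature dim
        self.gate = nn.Sequential(                      # predict a per-pixel gate in [0, 1]
            nn.Conv2d(img_dim * 3, img_dim, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(img_dim, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.out_conv = nn.Conv2d(img_dim * 2, img_dim, kernel_size=3, padding=1)

    def forward(self, feat_ir, feat_vis, text_emb):
        # feat_ir, feat_vis: (B, C, H, W); text_emb: (B, text_dim), e.g. a BERT [CLS] embedding
        b, c, h, w = feat_ir.shape
        txt = self.text_proj(text_emb).view(b, c, 1, 1).expand(b, c, h, w)
        gate = self.gate(torch.cat([feat_ir, feat_vis, txt], dim=1))
        fused = torch.cat([gate * feat_ir, (1.0 - gate) * feat_vis], dim=1)
        return self.out_conv(fused)


if __name__ == "__main__":
    block = LangGatedFusionSketch()
    ir, vis = torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32)
    text = torch.randn(1, 768)
    print(block(ir, vis, text).shape)  # torch.Size([1, 256, 32, 32])
```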

## Resources

- **Paper**: [arXiv:2509.12710](https://huggingface.co/papers/2509.12710)
- **GitHub Repository**: [SijuMa2003/RIS-FUSION](https://github.com/SijuMa2003/RIS-FUSION)
- **Dataset (MM-RIS)**: [MM-RIS on Hugging Face](https://huggingface.co/datasets/ronniejiangC/MM-RIS)

## Sample Usage

To evaluate the model using the official implementation, you can use the following command provided in the GitHub repository:

```bash
python test.py \
  --ckpt ./ckpts/risfusion/model_best_lavt.pth \
  --test_parquet ./data/mm_ris_test.parquet \
  --out_dir ./your_output_dir \
  --bert_tokenizer ./bert/pretrained_weights/bert-base-uncased \
  --ck_bert ./bert/pretrained_weights/bert-base-uncased
```

For detailed installation and training instructions, please refer to the [official GitHub repository](https://github.com/SijuMa2003/RIS-FUSION).
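
If you do not already have the MM-RIS parquet files locally, one way to fetch them is with `huggingface_hub`, as sketched below. The local directory and file layout are assumptions, so adjust `--test_parquet` to point at the downloaded test split.

```python
# Minimal sketch: download the MM-RIS dataset files from the Hugging Face Hub.
# The file layout inside the repo may differ from what test.py expects.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="ronniejiangC/MM-RIS",
    repo_type="dataset",
    local_dir="./data/MM-RIS",
)
print("MM-RIS files downloaded to:", local_path)
```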

## Citation

If you find this work useful, please consider citing the paper:

```bibtex
@article{RIS-FUSION2025,
  title   = {RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation},
  author  = {Ma, Siju and Gong, Changsiyu and Fan, Xiaofeng and Ma, Yong and Jiang, Chengjie},
  journal = {arXiv preprint arXiv:2509.12710},
  year    = {2025}
}
```