zhangtao committed: Update README.md
---
license: apache-2.0
---

# UNIP

This repository contains the official pre-trained checkpoints of the paper "[UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation](https://arxiv.org/abs/2502.02257)".

## 📖 Introduction

In this work, we first benchmark the infrared semantic segmentation performance of various pre-training methods and reveal several phenomena that differ from the RGB domain. Next, our layerwise analysis of pre-trained attention maps shows that: (1) there are three typical attention patterns (local, hybrid, and global); (2) pre-training tasks notably influence the pattern distribution across layers; (3) the hybrid pattern is crucial for semantic segmentation because it attends to both nearby and foreground elements; (4) texture bias impedes model generalization in infrared tasks. Building on these insights, we propose **UNIP**, a **UN**ified **I**nfrared **P**re-training framework, to improve the performance of pre-trained models. The framework uses hybrid-attention distillation (NMI-HAD) as the pre-training target, a large-scale mixed dataset (InfMix) for pre-training, and a last-layer feature pyramid network (LL-FPN) for fine-tuning.
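
This README does not say how the local, hybrid, and global patterns are measured. The name NMI-HAD suggests normalized mutual information (NMI) between query and key positions, so the following is a minimal sketch of that idea and not the authors' implementation. The function name `attention_nmi`, the uniform prior over queries, and the `sqrt(H(Q) * H(K))` normalization are all assumptions made for illustration.

```python
import math


def attention_nmi(attn):
    """Normalized mutual information between query and key positions.

    attn is a square list of lists; row q is the attention distribution
    of query q over all keys, so each row is expected to sum to 1.
    """
    n = len(attn)
    p_q = 1.0 / n  # assumption: every query position is equally likely

    # Marginal over keys: p(k) = sum_q p(q) * attn[q][k]
    p_k = [p_q * sum(attn[q][k] for q in range(n)) for k in range(n)]

    # Mutual information I(Q; K) with joint p(q, k) = p(q) * attn[q][k]
    mi = 0.0
    for q in range(n):
        for k in range(n):
            joint = p_q * attn[q][k]
            if joint > 0.0:
                mi += joint * math.log(joint / (p_q * p_k[k]))

    # Entropies used for normalization (assumed geometric-mean form)
    h_q = math.log(n)
    h_k = -sum(p * math.log(p) for p in p_k if p > 0.0)
    denom = math.sqrt(h_q * h_k)
    return mi / denom if denom > 0.0 else 0.0
```

Under these assumptions, a value near 1 means each query attends to its own distinct keys, as a strongly local pattern would, while a value near 0 means all queries share one attention distribution, as a global pattern would. Intermediate values would then correspond to the hybrid pattern.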

Experimental results show that UNIP outperforms various pre-training methods by up to **13.5%** in average mIoU on three infrared segmentation tasks, under both fine-tuning and linear probing evaluation. UNIP-S achieves performance on par with MAE-L while requiring only **1/10** of the computational cost. Furthermore, UNIP significantly surpasses state-of-the-art (SOTA) infrared and RGB segmentation methods and shows broad potential for application to other modalities, such as RGB and depth.
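
The results above are reported in mIoU (mean intersection over union). As a reminder of what that metric computes, here is a minimal sketch in plain Python. It is not the evaluation code used in the paper. The function name `mean_iou`, the `ignore_index=255` default, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes for flat lists of per-pixel class labels.

    Labels in target must be in [0, num_classes) or equal ignore_index.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel, excluded from the metric
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[t] += 1  # ground-truth class that was missed
            if 0 <= p < num_classes:
                union[p] += 1  # predicted class that was wrong
    # Average only over classes that occur in prediction or ground truth
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.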

<img src="imgs/benchmark.png" alt="benchmark" style="zoom: 67%;" />

## 🛠️ Usage
For usage instructions, please refer to the [GitHub repository](https://github.com/casiatao/UNIP).

## Citation
If you find this repository helpful, please consider citing:

```bibtex
@inproceedings{
zhang2025unip,
title={{UNIP}: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation},
author={Tao Zhang and Jinyong Wen and Zhen Chen and Kun Ding and Shiming Xiang and Chunhong Pan},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=Xq7gwsnhPT}
}
```