---
license: cc-by-nc-sa-4.0
---
# LISAt_PRE

**LISAt_PRE** is a remote-sensing-focused multimodal large language model (MLLM) tailored to improve performance in scenarios that require detailed visual understanding and natural-language reasoning over satellite and aerial imagery.

---
## Overview

LISAt_PRE enhances the [LISAt](https://huggingface.co/jquenum/LISAt-7b) framework by adapting it to remote-sensing applications, which require better handling of diverse visual data and specialized query types. The architecture integrates:

- A **Remote-CLIP ViT-L/14** vision encoder
- A **Vicuna-7B** LLM for text understanding and reasoning
- A **linear projection module** to align vision and language representations
- A segmentation model trained on high-quality mask annotations

An architectural overview is shown in Figure 3 of the paper.
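
To make the role of the linear projection module concrete, here is a minimal, dependency-free sketch in the style of LLaVA-like models: each patch feature from the vision encoder is mapped by a single affine layer into the LLM's token-embedding space, and the resulting visual tokens are placed before the text embeddings. The names `LinearProjection` and `build_input_sequence`, the random initialization, and the image-before-text ordering are illustrative assumptions, not the released LISAt_PRE code. For reference, ViT-L/14 patch features are 1024-dimensional and Vicuna-7B uses 4096-dimensional embeddings; the demo uses tiny sizes so it runs instantly.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class LinearProjection:
    """Hypothetical vision-to-language projection: y = W x + b.

    Maps one visual patch feature of size `vision_dim` to one vector of
    size `llm_dim` in the language model's embedding space. Weights are
    random here; in a trained model they would be learned.
    """

    def __init__(self, vision_dim: int, llm_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # W has shape (llm_dim, vision_dim); b has shape (llm_dim,).
        self.weight: Matrix = [
            [rng.gauss(0.0, 0.02) for _ in range(vision_dim)] for _ in range(llm_dim)
        ]
        self.bias: Vector = [0.0] * llm_dim

    def __call__(self, feature: Vector) -> Vector:
        # Each output entry is one row of W dotted with the input, plus its bias.
        return [
            sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(self.weight, self.bias)
        ]


def build_input_sequence(
    patch_features: List[Vector],
    text_embeddings: List[Vector],
    projection: LinearProjection,
) -> List[Vector]:
    """Project every patch feature into the LLM embedding space and place the
    resulting visual tokens before the text token embeddings (ordering assumed)."""
    visual_tokens = [projection(f) for f in patch_features]
    return visual_tokens + text_embeddings


if __name__ == "__main__":
    # Tiny sizes for a fast demo (the real sizes would be about 1024 and 4096).
    proj = LinearProjection(vision_dim=8, llm_dim=16)
    patches = [[0.1 * i] * 8 for i in range(4)]  # 4 fake patch features
    text = [[0.0] * 16 for _ in range(3)]        # 3 fake text-token embeddings
    seq = build_input_sequence(patches, text, proj)
    print(len(seq), len(seq[0]))  # 7 16
```

Because the projection is a single affine map, it adds comparatively few parameters: under the assumed sizes, a 1024-to-4096 layer has 1024 × 4096 + 4096 = 4,198,400 weights and biases.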

---

## Key Features

- **Remote-Sensing Specialization**: Trained on domain-specific imagery to handle the unique challenges of satellite data.
- **Multimodal Alignment**: Combines textual and visual inputs through a unified architecture.
- **Training with [PreGRES](https://huggingface.co/datasets/jquenum/PreGRES/blob/main/README.md)**: LISAt_PRE is pre-trained on the [PreGRES](https://huggingface.co/datasets/jquenum/PreGRES/blob/main/README.md) dataset using LoRA (Hu et al., 2021) and then fine-tuned on GRES.
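
LoRA (Hu et al., 2021) keeps a pretrained weight matrix W frozen and learns a low-rank update, so an adapted layer computes h = Wx + (α/r)·B(Ax), with A of shape r × d_in and B of shape d_out × r. The pure-Python sketch below shows that forward pass for a single layer. The class name `LoRALinear`, the rank, the scaling α, and the toy weights are illustrative assumptions and do not reflect the hyperparameters actually used to pre-train LISAt_PRE on PreGRES.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """Illustrative LoRA-adapted linear layer (hypothetical helper).

    The pretrained weight W (d_out x d_in) stays frozen; only the low-rank
    factors A (r x d_in) and B (d_out x r) would receive gradient updates.
    Forward pass: h = W x + (alpha / r) * B (A x).
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0) -> None:
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight
        # A gets small random values and B starts at zero, so B(Ax) = 0 and
        # the adapted layer initially reproduces the frozen layer exactly.
        self.A: Matrix = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def __call__(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)              # W x (frozen path)
        delta = matvec(self.B, matvec(self.A, x))  # B (A x) (trainable path)
        return [h + self.scale * d for h, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # Only A and B are trained: r * d_in + d_out * r values.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
    layer = LoRALinear(W, rank=1, alpha=2.0)
    print(layer([1.0, 2.0, 3.0]))    # [7.0, -1.0]: B is zero, so output equals W x
    print(layer.trainable_params())  # 5 (A: 1 x 3, B: 2 x 1)
```

Initializing B to zero means training starts from exactly the frozen model's behavior. The savings grow with layer size: for an illustrative 4096 × 4096 matrix with r = 8, LoRA trains 2 × 8 × 4096 = 65,536 values instead of 16,777,216.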

---

## Architecture

- **Language Model**: [Vicuna-7B](https://lmsys.org/blog/2023-03-30-vicuna/) (Chiang et al., 2023)
- **Vision Encoder**: Remote-CLIP ViT-L/14 (Liu et al., 2024a)
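
As a worked example of how these two components meet, ViT-L/14 splits an image into 14 × 14-pixel patches and produces one 1024-dimensional feature per patch, while Vicuna-7B works with 4096-dimensional embeddings. Assuming a 224 × 224 input (an assumption; the resolution LISAt_PRE actually uses is not stated in this card), the number of patch tokens N and the shape of a projection between the two spaces would be:

$$
N = \left(\frac{224}{14}\right)^2 = 16^2 = 256, \qquad
W_{\text{proj}} \in \mathbb{R}^{4096 \times 1024}
$$

At 336 × 336, the same formula gives (336/14)² = 24² = 576 patch tokens.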

---

## Citation

If you use LISAt_PRE in your work, please cite:

```bibtex
@article{quenum2025lisat,
  title={LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery},
  author={Quenum, Jerome and Hsieh, Wen-Han and Wu, Tsung-Han and Gupta, Ritwik and Darrell, Trevor and Chan, David M},
  journal={arXiv preprint arXiv:2505.02829},
  year={2025},
  url={https://arxiv.org/pdf/2505.02829}
}
```