---
license: cc-by-nc-sa-4.0
---

# LISAt_PRE

**LISAt_PRE** is a multimodal large language model (MLLM) for remote sensing, tailored to improve performance on tasks that require detailed visual understanding and natural-language reasoning over satellite and aerial imagery.

---

## Overview

LISAt_PRE enhances the [LISAt](https://huggingface.co/jquenum/LISAt-7b) framework by adapting it to remote-sensing applications, which require better handling of diverse visual data and specialized query types. The architecture integrates:

- A **Remote-CLIP ViT-L/14** vision encoder
- A **Vicuna-7B** LLM for text understanding and reasoning
- A **linear projection module** to align vision and language representations
- A segmentation model trained on high-quality mask annotations

An architectural overview is shown in Figure 3 (refer to paper).

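
As a rough illustration of the linear projection module listed above, the following sketch maps each visual token into the LLM embedding space and joins the result with text embeddings. It assumes a LLaVA-style design; the toy dimensions, the helper name `linear_project`, and the ordering of visual and text tokens are hypothetical and are not taken from the LISAt code. In the actual model the two widths would be those of Remote-CLIP ViT-L/14 and Vicuna-7B.

```python
import random


def linear_project(tokens, weight, bias):
    """Apply y = W x + b to every visual token (one row per token)."""
    return [
        [sum(w * x for w, x in zip(row, tok)) + b for row, b in zip(weight, bias)]
        for tok in tokens
    ]


# Toy sizes chosen only to keep the example small (hypothetical values).
NUM_PATCHES, VISION_DIM, LLM_DIM, NUM_TEXT_TOKENS = 4, 3, 5, 2

rng = random.Random(0)

# Stand-ins for the vision encoder output and the text token embeddings.
visual_tokens = [
    [rng.uniform(-1, 1) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)
]
text_embeddings = [
    [rng.uniform(-1, 1) for _ in range(LLM_DIM)] for _ in range(NUM_TEXT_TOKENS)
]

# Projection parameters: LLM_DIM rows, each of length VISION_DIM.
weight = [[rng.uniform(-1, 1) for _ in range(VISION_DIM)] for _ in range(LLM_DIM)]
bias = [0.0] * LLM_DIM

# Project visual tokens to the LLM width, then build one input sequence.
projected = linear_project(visual_tokens, weight, bias)
llm_input = projected + text_embeddings  # assumed order: image tokens first

print(len(llm_input), len(llm_input[0]))  # 6 5
```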

---

## Key Features

- **Remote-Sensing Specialization**: Trained on domain-specific imagery to handle the unique challenges of satellite data.
- **Multimodal Alignment**: Combines textual and visual inputs through a unified architecture.
- **Training with [PreGRES](https://huggingface.co/datasets/jquenum/PreGRES/blob/main/README.md)**: LISAt_PRE is pre-trained on the [PreGRES](https://huggingface.co/datasets/jquenum/PreGRES/blob/main/README.md) dataset using LoRA (Hu et al., 2021), before being fine-tuned on GRES.

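
To give a sense of why LoRA keeps this kind of pre-training inexpensive, the sketch below compares the trainable parameters of a full update to one weight matrix with those of a rank-`r` LoRA update, which factors the update to a `d_out × d_in` matrix into a `d_out × r` matrix times an `r × d_in` matrix. The 4096 × 4096 shape corresponds to Vicuna-7B's hidden size; the rank of 8 and the helper name `lora_param_counts` are hypothetical, since the rank used for LISAt_PRE is not stated here.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # every entry of W is trainable
    lora = rank * (d_out + d_in)   # B is d_out x rank, A is rank x d_in
    return full, lora


# One square projection at Vicuna-7B's hidden size; rank 8 is an assumption.
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, f"{100 * lora / full:.2f}%")  # 16777216 65536 0.39%
```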

---

## Architecture

- **Language Model**: Vicuna-7B (Chiang et al., 2023)
- **Vision Encoder**: Remote-CLIP ViT-L/14 (Liu et al., 2024a)

---

## Citation

If you use LISAt_PRE in your work, please cite:

```bibtex
@article{quenum2025lisat,
  title={LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery},
  author={Quenum, Jerome and Hsieh, Wen-Han and Wu, Tsung-Han and Gupta, Ritwik and Darrell, Trevor and Chan, David M},
  journal={arXiv preprint arXiv:2505.02829},
  year={2025},
  url={https://arxiv.org/pdf/2505.02829}
}