jquenum/LISAt_PRE-7b


jquenum committed on May 16, 2025

Commit 6173a15 · verified · 1 parent: 2832e94

Update README.md

Files changed (1)

README.md +49 -3


@@ -1,3 +1,49 @@
----
-license: cc-by-nc-sa-4.0
----

+---
+license: cc-by-nc-sa-4.0
+---
+# LISAt_PRE
+**LISAt_PRE** is a multimodal large language model (MLLM) tailored to remote sensing. It targets tasks that require detailed visual understanding and natural-language reasoning over satellite and aerial imagery.
+---
+## Overview
+LISAt_PRE enhances the [LISAt](https://huggingface.co/jquenum/LISAt-7b) framework by adapting it to remote-sensing applications, which require better handling of diverse visual data and specialized query types. The architecture integrates:
+- A **Remote-CLIP ViT-L/14** vision encoder
+- A **Vicuna-7B** LLM for text understanding and reasoning
+- A **linear projection module** to align vision and language representations
+- A segmentation model trained on high-quality mask annotations
+An architectural overview is shown in Figure 3 of the paper (see Citation below).
+---
+## Key Features
+- **Remote-Sensing Specialization**: Trained on domain-specific imagery to handle the unique challenges of satellite data.
+- **Multimodal Alignment**: Combines textual and visual inputs through a unified architecture.
+- **Training with [PreGRES](https://huggingface.co/datasets/jquenum/PreGRES/blob/main/README.md)**: LISAt_PRE is pre-trained on the PreGRES dataset using LoRA (Hu et al., 2021) before being fine-tuned on GRES.
+---
+## Architecture
+- **Language Model**: [Vicuna-7B](https://lmsys.org/blog/2023-03-30-vicuna/) (Chiang et al., 2023)
+- **Vision Encoder**: Remote-CLIP ViT-L/14 (Liu et al., 2024a)
+---
+## Citation
+If you use LISAt_PRE in your work, please cite:
+```bibtex
+@article{quenum2025lisat,
+  title={LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery},
+  author={Quenum, Jerome and Hsieh, Wen-Han and Wu, Tsung-Han and Gupta, Ritwik and Darrell, Trevor and Chan, David M},
+  journal={arXiv preprint arXiv:2505.02829},
+  year={2025},
+  url={https://arxiv.org/pdf/2505.02829}
+}
+```
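
The Overview in the README lists a linear projection module that aligns vision and language representations. The sketch below illustrates that idea only and is not the LISAt_PRE implementation: each vision token is mapped by a single linear layer into the language model's embedding width, then joined with the text embeddings. The function names `make_linear` and `project`, the toy dimensions, the random weights, and the choice to place image tokens before text tokens are all assumptions made for illustration.

```python
import random


def make_linear(in_dim, out_dim, seed=0):
    """Return (weight, bias) for a linear layer with small random weights.

    Hypothetical helper: a trained model would load learned parameters.
    """
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(tokens, weight, bias):
    """Apply y = W x + b to every vision token independently."""
    return [
        [sum(w * x for w, x in zip(row, tok)) + b
         for row, b in zip(weight, bias)]
        for tok in tokens
    ]


# Toy sizes so the example runs instantly (assumption). For reference,
# a CLIP ViT-L/14 transformer is 1024 wide and Vicuna-7B uses 4096-wide
# token embeddings.
VISION_DIM, LLM_DIM = 8, 16
NUM_PATCHES, NUM_TEXT_TOKENS = 4, 3

rng = random.Random(1)
# Stand-ins for vision-encoder patch features and LLM text embeddings.
vision_tokens = [[rng.gauss(0, 1) for _ in range(VISION_DIM)]
                 for _ in range(NUM_PATCHES)]
text_embeddings = [[rng.gauss(0, 1) for _ in range(LLM_DIM)]
                   for _ in range(NUM_TEXT_TOKENS)]

weight, bias = make_linear(VISION_DIM, LLM_DIM)
projected = project(vision_tokens, weight, bias)

# After projection, image and text tokens share one width and can form a
# single input sequence (image-first ordering is an assumption).
llm_input = projected + text_embeddings
print(len(llm_input), len(llm_input[0]))
```

Because the projection is one matrix multiply per token, it adds few parameters relative to the encoder and the language model, which is one common reason to use a linear layer for this alignment step.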