---
license: cc-by-nc-sa-4.0
---
# LISAt_PRE

LISAt_PRE is a remote-sensing-focused multimodal large language model (MLLM) tailored to scenarios that require detailed visual understanding and natural-language reasoning over satellite and aerial imagery.

---

## Overview

LISAt_PRE enhances the [LISAt](https://huggingface.co/jquenum/LISAt-7b) framework by adapting it to remote-sensing applications, which demand robust handling of diverse visual data and specialized query types. The architecture integrates:

- A Remote-CLIP ViT-L/14 vision encoder
- A Vicuna-7B LLM for text understanding and reasoning
- A linear projection module that aligns vision and language representations
- A segmentation model trained on high-quality mask annotations
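
The linear projection module can be illustrated with a minimal, hypothetical sketch in the style of LLaVA-like MLLMs. It is not the LISAt_PRE implementation: the helper names `make_projection` and `project` are illustrative, and the toy sizes stand in for assumed real shapes of roughly 1024 (ViT-L/14 patch-feature width) and 4096 (Vicuna-7B hidden size). Each patch embedding x is mapped to W x + b, producing one LLM-sized "visual token" per patch that can presumably be placed in the LLM input alongside text token embeddings.

```python
# Hypothetical sketch of a vision-to-language linear projection (LLaVA-style).
# Toy sizes keep it fast; real shapes are assumed to be roughly
# d_vision = 1024 (ViT-L/14 hidden width) and d_llm = 4096 (Vicuna-7B hidden size).
import random
from typing import List, Tuple

Matrix = List[List[float]]


def make_projection(d_vision: int, d_llm: int, seed: int = 0) -> Tuple[Matrix, List[float]]:
    """Return a randomly initialized weight W of shape (d_llm, d_vision) and a zero bias b."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(d_vision)] for _ in range(d_llm)]
    bias = [0.0] * d_llm
    return weight, bias


def project(patch_features: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Map each d_vision-dim patch embedding x to the d_llm-dim vector W x + b."""
    return [
        [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weight, bias)]
        for x in patch_features
    ]


if __name__ == "__main__":
    d_vision, d_llm, num_patches = 8, 16, 4
    rng = random.Random(1)
    patches = [[rng.random() for _ in range(d_vision)] for _ in range(num_patches)]
    weight, bias = make_projection(d_vision, d_llm)
    visual_tokens = project(patches, weight, bias)
    # One d_llm-dim "visual token" per input patch.
    print(len(visual_tokens), len(visual_tokens[0]))  # 4 16
```

A single linear layer is the simplest possible alignment module: it only reshapes the vision features into the LLM's embedding space, leaving most of the representational work to the encoder and the language model.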

An architectural overview is shown in Figure 3 of the LISAt paper (see Citation).

---

## Key Features

- Remote-Sensing Specialization: Trained on domain-specific imagery to handle the particular challenges of satellite data.
- Multimodal Alignment: Combines textual and visual inputs in a unified architecture.
- Training with [PreGRES](https://huggingface.co/datasets/jquenum/PreGRES/blob/main/README.md): LISAt_PRE is pre-trained on the PreGRES dataset using LoRA (Hu et al., 2021) before being fine-tuned on GRES.
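
LoRA freezes a pretrained weight W0 and learns a low-rank update, so an adapted layer computes h = W0 x + (alpha / r) B A x. The following is a minimal pure-Python sketch of that update rule (Hu et al., 2021), not the training code used for LISAt_PRE; the class `LoRALinear`, the helper `lora_param_count`, and the rank and `alpha` values are hypothetical choices for illustration. B is zero-initialized, so the adapted layer reproduces the frozen layer exactly before training begins.

```python
# Minimal LoRA sketch (Hu et al., 2021): h = W0 x + (alpha / r) * B A x.
# Illustrative only; rank and alpha are hypothetical, not LISAt_PRE's settings.
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """A frozen linear map W0 plus a trainable low-rank update B @ A."""

    def __init__(self, w0: Matrix, rank: int, alpha: float, seed: int = 0) -> None:
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # frozen pretrained weight, shape (d_out, d_in)
        # A: (rank, d_in), small random init; B: (d_out, rank), zero init,
        # so the update B @ A is zero before any training step.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.w0, x)                   # W0 x
        update = matvec(self.b, matvec(self.a, x))  # B (A x)
        return [h + self.scale * u for h, u in zip(base, update)]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in A and B: rank * (d_in + d_out)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
    print(layer.forward([1.0, 1.0]))        # [3.0, 7.0]: equals W0 x at init
    print(lora_param_count(4096, 4096, 8))  # 65536
```

For example, with an assumed rank of r = 8 on a 4096 × 4096 weight (the hidden size of a 7B LLaMA-family model such as Vicuna-7B), LoRA trains 8 × (4096 + 4096) = 65,536 parameters instead of the 16,777,216 in the full matrix. The rank actually used for LISAt_PRE is not specified in this card.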

---

## Architecture

- Language Model: Vicuna-7B (Chiang et al., 2023)
- Vision Encoder: Remote-CLIP ViT-L/14 (Liu et al., 2024a)
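
The "/14" in ViT-L/14 is the patch size: the encoder splits each image into non-overlapping 14 × 14 pixel patches and emits one token per patch, which sets how many visual tokens the LLM receives. The helper `num_patch_tokens` below is a hypothetical sketch of that arithmetic. The example resolutions are assumptions (224 px is the standard CLIP ViT-L/14 input size; 336 px is used by some LLaVA variants), since this card does not state the resolution LISAt_PRE uses.

```python
def num_patch_tokens(image_size: int, patch_size: int = 14) -> int:
    """Number of patch tokens a ViT produces for a square image_size x image_size input.

    Excludes any class ([CLS]) token the encoder may prepend.
    """
    if image_size <= 0 or image_size % patch_size != 0:
        raise ValueError(f"image_size {image_size} must be a positive multiple of {patch_size}")
    side = image_size // patch_size  # patches along each edge
    return side * side


if __name__ == "__main__":
    for size in (224, 336):  # assumed example resolutions
        print(size, num_patch_tokens(size))
    # 224 256
    # 336 576
```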

---

## Citation

If you use LISAt_PRE in your work, please cite:

```bibtex
@article{quenum2025lisat,
  title={LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery},
  author={Quenum, Jerome and Hsieh, Wen-Han and Wu, Tsung-Han and Gupta, Ritwik and Darrell, Trevor and Chan, David M},
  journal={arXiv preprint arXiv:2505.02829},
  year={2025},
  url={https://arxiv.org/pdf/2505.02829}
}
```