jquenum committed
Commit 0671e58 · verified · 1 Parent(s): fb40849

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -16,9 +16,11 @@ With its advanced reasoning capabilities and superior performance on geospatial
 
 ## Model Details
 
-- **Model architecture**: Inspired by LISA ![Lai et al., 2024](https://arxiv.org/pdf/2308.00692), LISAT integrates a multimodal large language model (LLM) with a segmentation model. Its architechture is shown below.
-
-![LISAT Model Architecture](https://huggingface.co/jquenum/LISAt-7b/resolve/main/LISAt.png)
+- **Model architecture**: Inspired by LISA (Lai et al., 2024), LISAT integrates a multimodal large language model (LLM) with a segmentation model. Its architecture is shown below.
+
+<!-- ![LISAT Model Architecture](https://huggingface.co/jquenum/LISAt-7b/resolve/main/LISAt.png) -->
+
+<img src="https://huggingface.co/jquenum/LISAt-7b/resolve/main/LISAt.png" width="600"/>
 
 - **Training data**: we introduce the Geospatial Reasoning Segmentation Dataset (GRES), a collection of vision and language data designed around remote-sensing applications. GRES consists of two core components: PreGRES, a dataset of over 1M remote-sensing-specific visual instruction-tuning Q/A pairs for pre-training geospatial models, and GRES, a semi-synthetic dataset specialized for reasoning segmentation of remote-sensing data, consisting of 9,205 images and 27,615 natural language queries/answers within those images. From this LISAt dataset, we generate train, test, and validation splits of 7,205, 1,500, and 500 images respectively.
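The split sizes in the training-data paragraph can be sanity-checked with simple arithmetic. This is a minimal sketch: the counts are copied from the README text above, and the `splits` dict name is purely illustrative, not part of the dataset's API.

```python
# Split sizes as stated in the README's training-data paragraph.
# Plain numbers copied from the text, not loaded from the actual dataset.
splits = {"train": 7205, "test": 1500, "validation": 500}
total_gres_images = 9205  # GRES image count quoted in the README

# The three splits should partition the full GRES image set.
assert sum(splits.values()) == total_gres_images
print("split sizes:", splits, "| sum:", sum(splits.values()))
```

Running this confirms that the train/test/validation counts sum exactly to the 9,205 GRES images, i.e. the splits partition the dataset with nothing left over.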