---
license: mit
tags:
- remote-sensing
- computer-vision
- open-vocabulary
- benchmark
- image-dataset
---
<p align="center">
<img src="assets/lae-dino.png" alt="Image" width="70">
</p>
<div align="center">
<h1 align="center">Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community</h1>

<h4 align="center"><em>Jiancheng Pan*, &nbsp; &nbsp; Yanxing Liu*, &nbsp; &nbsp; Yuqian Fu✉, &nbsp; &nbsp; Muyuan Ma,</em></h4>

<h4 align="center"><em>Jiahao Li, &nbsp; &nbsp; Danda Pani Paudel, &nbsp; &nbsp; Luc Van Gool, &nbsp; &nbsp; Xiaomeng Huang✉</em></h4>
<p align="center">
<img src="assets/inst.png" alt="Image" width="500">
</p>

\* *Equal Contribution* &nbsp; &nbsp; Corresponding Author ✉

</div>

<p align="center">
<a href="http://arxiv.org/abs/2408.09110"><img src="https://img.shields.io/badge/Arxiv-2408.09110-b31b1b.svg?logo=arXiv"></a>
<a href="https://ojs.aaai.org/index.php/AAAI/article/view/32672"><img src="https://img.shields.io/badge/AAAI'25-Paper-blue"></a>
<a href="https://jianchengpan.space/LAE-website/index.html"><img src="https://img.shields.io/badge/LAE-Project_Page-blue"></a>
<a href="https://huggingface.co/datasets/jaychempan/LAE-1M"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-HuggingFace-yellow?style=flat"></a>
<a href="https://github.com/jaychempan/LAE-DINO/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow"></a>
</p>

<p align="center">
<a href="#news">News</a> |
<a href="#abstract">Abstract</a> |
<a href="#engine">Engine</a> |
<a href="#dataset">Dataset</a> |
<a href="#model">Model</a> |
<a href="#statement">Statement</a>
</p>

<!-- ## TODO

- [X] Release LAE-Label Engine
- [X] Release LAE-1M Dataset
- [ ] Release LAE-DINO Model -->

## News
- [2025/8/21] The full LAE-1M dataset is now available for download at 🤗 [HuggingFace](https://huggingface.co/datasets/jaychempan/LAE-1M).
- [2025/4/19] We added inference examples, along with the original and processed annotation files for DIOR and DOTAv2.
- [2025/3/19] We added LAE-DINO configs fine-tuned on DIOR and DOTAv2.
- [2025/2/28] We open-sourced the <a href="#model">LAE-DINO Model</a>.
- [2025/2/5] We open-sourced the <a href="#dataset">LAE-1M Dataset</a>.
- [2025/2/5] The LAE-80C dataset, containing 80 classes, has been released as a new remote sensing OVD benchmark and can be quickly [downloaded](https://drive.google.com/drive/folders/1HPu97-f1SNF2sWm3Cdb2FHLRybdRbCtS?usp=sharing) here.
- [2025/1/17] We open-sourced the code for the <a href="#engine">LAE-Label Engine</a>.
- [2024/12/10] Our paper "Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community" was accepted to [AAAI&#39;25](https://aaai.org/conference/aaai/aaai-25/); we will open-source everything as soon as possible!
- [2024/8/17] Our paper "Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community" is up on [arXiv](http://arxiv.org/abs/2408.09110).

## Abstract

Object detection, particularly open-vocabulary object detection, plays a crucial role in Earth sciences, such as environmental monitoring, natural disaster assessment, and land-use planning. However, existing open-vocabulary detectors, primarily trained on natural-world images, struggle to generalize to remote sensing images due to a significant data domain gap. This paper therefore aims to advance the development of open-vocabulary object detection in the remote sensing community. To achieve this, we first reformulate the task as Locate Anything on Earth (LAE), with the goal of detecting any novel concept on Earth. We then develop the LAE-Label Engine, which collects, auto-annotates, and unifies up to 10 remote sensing datasets, creating LAE-1M, the first large-scale remote sensing object detection dataset with broad category coverage. Using LAE-1M, we further propose and train the novel LAE-DINO model, the first open-vocabulary foundation object detector for the LAE task, featuring Dynamic Vocabulary Construction (DVC) and Visual-Guided Text Prompt Learning (VisGT) modules. DVC dynamically constructs the vocabulary for each training batch, while VisGT maps visual features into the semantic space to enhance text features. We conduct comprehensive experiments on the established remote sensing benchmarks DIOR and DOTAv2.0, as well as on our newly introduced 80-class LAE-80C benchmark. The results demonstrate the advantages of the LAE-1M dataset and the effectiveness of the LAE-DINO method.

<p align="center">
<img src="assets/lae.png" alt="Image" width="500">
</p>

## Engine

### LAE-Label Engine Pipeline

The pipeline of our LAE-Label Engine. For the LAE-FOD dataset, we use the coco slice utility of the open-source tool [SAHI](https://github.com/obss/sahi) to automatically slice COCO annotation and image files ([coco-slice-command-usage](https://github.com/obss/sahi/blob/main/docs/cli.md#coco-slice-command-usage)). For the LAE-COD dataset, we build it with the series of commands below (<a href="#how-to-use-lae-label">How to use LAE-Label</a>). All annotations are uniformly converted to COCO format. You can refer to our environment file [internvl_requirements.txt](https://github.com/jaychempan/LAE-DINO/blob/main/internvl_requirements.txt) if you encounter problems.
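
Conceptually, slicing a COCO dataset means intersecting each annotation box with a tile and shifting it into that tile's coordinate frame, dropping boxes that fall outside. A minimal illustrative sketch of that step (this is not the actual SAHI implementation, and the function name is ours):

```python
def remap_bbox_to_tile(bbox, tile):
    """Shift a COCO bbox [x, y, w, h] into tile-local coordinates.

    `tile` is (x0, y0, x1, y1) in the original image; returns None
    when the box does not intersect the tile at all.
    """
    x, y, w, h = bbox
    tx0, ty0, tx1, ty1 = tile
    # Intersect the box with the tile.
    ix0, iy0 = max(x, tx0), max(y, ty0)
    ix1, iy1 = min(x + w, tx1), min(y + h, ty1)
    if ix1 <= ix0 or iy1 <= iy0:
        return None  # no overlap with this tile
    # Shift into tile-local coordinates, still in [x, y, w, h] form.
    return [ix0 - tx0, iy0 - ty0, ix1 - ix0, iy1 - iy0]
```

SAHI additionally handles segmentation masks and minimum-visibility filtering; this sketch only shows the bbox arithmetic.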

<p align="center">
<img src="assets/LAE-Label.png" alt="Image" width="500">
</p>

### How to use LAE-Label
LAE-Label is built mainly on the [SAM](https://github.com/facebookresearch/segment-anything) and [InternVL](https://github.com/OpenGVLab/InternVL/tree/main) projects, and its setup mainly follows the InternVL environment installation.

**Note:** Use transformers==4.42.3 or 4.45.2 (InternVL may not install with a higher transformers version); you can refer to `./internvl_requirements.txt` when installing the InternVL environment.

(Optional) For high-resolution remote sensing images, we first crop them into `1024x1024` tiles:

```
python LAE-Label/crop_huge_images.py --input_folder ./LAE_data/DATASET --output_folder ./LAE_data/DATASET_sub
```
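
The core of such a cropper is computing a grid of `1024x1024` windows that covers the whole scene. A sketch of that computation (illustrative only; `tile_grid` is our name, not a function from `crop_huge_images.py`):

```python
def tile_grid(width, height, tile=1024, overlap=0):
    """Return (x0, y0, x1, y1) tiles covering a width x height image.

    The last tile in each dimension is shifted back so it ends exactly
    at the image border, a common practice when slicing huge remote
    sensing scenes so no pixels are lost.
    """
    step = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, step))
    ys = list(range(0, max(height - tile, 0) + 1, step))
    if xs[-1] + tile < width:
        xs.append(width - tile)  # final column flush with right edge
    if ys[-1] + tile < height:
        ys.append(height - tile)  # final row flush with bottom edge
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]
```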

SAM is then used to obtain regions of interest (RoIs) in each image:

```
python LAE-Label/det_with_sam.py --checkpoint ./models/sam_vit_h_4b8939.pth --model-type 'vit_h' --input path/to/images/ --output ./LAE_data/DATASET_sub/seg/ --points-per-side 32 --pred-iou-thresh 0.86 --stability-score-thresh 0.92 --crop-n-layers 1 --crop-n-points-downscale-factor 2 --min-mask-region-area 10000
```
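
The threshold flags above correspond to quality gates on the candidate masks SAM proposes: each mask record carries a predicted IoU, a stability score, and an area, and masks below the thresholds are discarded. A small sketch of that filtering logic (illustrative of what the flags mean, not SAM's internal code):

```python
def filter_masks(masks, pred_iou_thresh=0.86,
                 stability_score_thresh=0.92, min_area=10000):
    """Keep SAM-style mask records that pass all quality thresholds.

    Each record is a dict with 'predicted_iou', 'stability_score',
    and 'area' keys, as in the output of SAM's automatic mask generator.
    """
    return [m for m in masks
            if m["predicted_iou"] >= pred_iou_thresh
            and m["stability_score"] >= stability_score_thresh
            and m["area"] >= min_area]
```

Raising the thresholds yields fewer but cleaner RoIs, which matters when their crops are later sent to an LVLM for labeling.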

The RoIs are then cropped out of the images:

```
python LAE-Label/crop_rois_from_images.py --img_dir ./LAE_data/DATASET_sub/ --base_dir ./LAE_data/DATASET_sub/seg/ --out_dir ./LAE_data/DATASET_sub/crop/ --N 10 --end jpg
```

We use InternVL, currently the open-source model with the best multimodal results. It comes in two versions: the first is the base version, whose weights are very large; the second offers `16% of the model size, 90% of the performance`.

```
huggingface-cli download --resume-download OpenGVLab/InternVL-Chat-V1-5 --local-dir InternVL-Chat-V1-5
huggingface-cli download --resume-download OpenGVLab/Mini-InternVL-Chat-4B-V1-5 --local-dir Mini-InternVL-Chat-4B-V1-5
```

We also tested InternVL models of different sizes, including InternVL2-8B (16 GB), InternVL-Chat-V1-5 (48 GB), and InternVL2-26B (48 GB).
<p align="center">
<img src="assets/LAE-Engine-Test.png" alt="Image" width="800">
</p>

Next, use the LVLM to generate a category for each RoI according to the prompt template. It can be deployed locally or remotely (see the commands below); for remote deployment, refer to [lmdeploy](https://github.com/InternLM/lmdeploy).

```
# local inference
python LAE-Label/internvl-infer.py --model_path ./models/InternVL-Chat-V1-5 --root_directory ./LAE_data/DATASET_sub/crop --csv_save_path ./LAE_data/DATASET_sub/csv/

# remote inference
python LAE-Label/internvl-infer-openai.py --api_key OPENAIAPIKEY --base_url https://oneapi.XXXX.site:8888/v1 --model_name "internvl-internlm2" --input_dir ./LAE_data/DATASET_sub/crop --output_dir ./LAE_data/DATASET_sub/csv/
```
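
The actual prompt template lives in `LAE-Label/internvl-infer.py`; purely as an illustration, a hypothetical template plus a tolerant parser for the LVLM's free-form reply could look like this (both the prompt wording and `parse_category` are our assumptions, not the repo's code):

```python
# Hypothetical prompt sent with each RoI crop.
PROMPT = ("What is the main object in this remote sensing image patch? "
          "Answer with a single category name.")

def parse_category(reply):
    """Normalize a free-form LVLM reply into a clean category string."""
    # Keep only the first line, then strip trailing punctuation,
    # surrounding quotes, and casing noise.
    first_line = reply.strip().splitlines()[0]
    return first_line.strip(" .\"'").lower()
```

Normalizing replies this way makes the downstream CSV categories consistent enough to merge across images.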

(Optional) Then convert the results to the [odvg dataset format](https://github.com/longzw1997/Open-GroundingDino/blob/main/data_format.md) for easier post-processing and other operations:

```
python LAE-Label/dets2odvg.py
```
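
Per the linked `data_format.md`, the ODVG detection format stores one JSON record per line, each with `filename`, `height`, `width`, and a `detection.instances` list of `bbox`/`label`/`category` entries. A minimal converter sketch (the function name and argument layout are ours):

```python
import json

def to_odvg(filename, width, height, dets, label_map):
    """Build one ODVG detection record (one JSON line per image).

    `dets` is a list of (category, [x0, y0, x1, y1]) tuples and
    `label_map` maps each category name to its integer label id.
    """
    instances = [{"bbox": box, "label": label_map[cat], "category": cat}
                 for cat, box in dets]
    record = {"filename": filename, "height": height, "width": width,
              "detection": {"instances": instances}}
    return json.dumps(record)
```

Writing one such line per image yields the `.jsonl` file that Open-GroundingDino-style loaders consume.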

(Optional) To visualize the RoIs, query the images in ODVG format (conversion to ODVG makes visualization and processing easy):

```
python LAE-Label/plot_bboxs_odvg_dir.py
```

(Optional) To further improve label quality with hand-crafted rules, refer to the [post process method](./LAE-Label/post_process/README.md).

Below are some labelling examples produced by LAE-Label, without rule-based filtering:

<p align="center">
<img src="assets/LAE-Label-PIC.png" alt="Image" width="700">
</p>

## Dataset

The LAE-1M dataset contains abundant categories, composed of coarse-grained LAE-COD and fine-grained LAE-FOD. LAE-1M samples from these datasets by category and does not count overlapping duplicate instances produced by slicing.

<p align="center">
<img src="assets/LAE-1M.png" alt="Image" width="700">
</p>

### Download LAE-1M Dataset

The data can be downloaded through `Baidu disk` or `Onedrive`; download it from the links below into the project's `./LAE-DINO` directory.

Note: **The LAE-Label Engine is continuously optimized, and the quality of the data annotation improves with it.** As we explore higher-quality annotations, dataset versions are iteratively updated. The current dataset version is v1.1, the best-labelled version available, and we intend to build stable benchmarks on it.

> Baidu disk: [download link](https://pan.baidu.com/s/1_l2i0gUPcDbTUkNkUqEhjg?pwd=chrx)

> Onedrive: [download link](https://1drv.ms/f/c/72d4076f2aa319be/EhpYDEA71mFOorBWIoxglwMBNuy3i3bbf2W1qi8IHBjOAA?e=mGThPR)

Once you have downloaded the dataset, you can extract the image files in all subdirectories with a shell command:

```
bash tools/unzip_all_images_files.sh
```
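
A Python equivalent of what such a script does, walking the dataset tree and extracting every `images.zip` in place, might look like this (illustrative sketch; the actual logic is in `tools/unzip_all_images_files.sh`):

```python
import os
import zipfile

def unzip_all(root):
    """Extract every images.zip found under `root`, next to its archive."""
    extracted = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name == "images.zip":
                path = os.path.join(dirpath, name)
                with zipfile.ZipFile(path) as zf:
                    # Extract into the same dataset subdirectory,
                    # e.g. DIOR/images.zip -> DIOR/images/...
                    zf.extractall(dirpath)
                extracted.append(path)
    return extracted
```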

We have preserved the image directory names of the original datasets (e.g. DOTA, DIOR, etc.) as much as possible, so it is possible to incrementally download parts (SLM, EMS) of the image data, as well as separate annotation files.

e.g. Extract `images` from the original datasets:

```
# DOTAv2 dataset
cd DOTAv2/
unzip images.zip

# DIOR dataset
cd DIOR/
unzip images.zip

# FAIR1M dataset
cd FAIR1M/
unzip images.zip

# NWPU-VHR-10 dataset
cd NWPU-VHR-10/
unzip images.zip

# HRSC2016 dataset
cd HRSC2016/
unrar x HRSC2016.part01.rar
mv Train/AllImages ../Train/AllImages

# RSOD dataset
cd RSOD/
mkdir ../images
mv aircraft/JPEGImages ../images
mv oiltank/JPEGImages ../images
mv overpass/JPEGImages ../images
mv playground/JPEGImages ../images
```

### LAE-80C Benchmark

LAE-80C is sampled from the validation sets of multiple remote sensing object detection datasets, keeping categories that are as semantically non-overlapping as possible. We combined these categories to create a benchmark with 80 classes.

<p align="center">
<img src="assets/LAE-80C.png" alt="Image" width="700">
</p>

**The remote sensing community lacks detection benchmarks with larger category sets.** LAE-80C can be used on its own as a standard for evaluating 80-class object detection in remote sensing scenes. A quick [download](https://drive.google.com/drive/folders/1HPu97-f1SNF2sWm3Cdb2FHLRybdRbCtS?usp=sharing) is available via Google Drive.

### Dataset Catalogue

The directory structure of `./data` is shown below. To unify the various dataset structures, we can directly use COCO-format data. `Power-Plant` corresponds to `Condensing-Tower` in the paper.

```
./data
├── LAE-80C
│   ├── images
│   ├── LAE-80C-benchmark_categories.json
│   ├── LAE-80C-benchmark.json
│   └── LAE-80C-benchmark.txt
├── LAE-COD
│   ├── AID
│   ├── EMS
│   ├── NWPU-RESISC45
│   └── SLM
└── LAE-FOD
    ├── DIOR
    ├── DOTAv2
    ├── FAIR1M
    ├── HRSC2016
    ├── NWPU-VHR-10
    ├── Power-Plant
    ├── RSOD
    └── Xview
```

## Model

The pipeline for solving the LAE task: the LAE-Label Engine expands the vocabulary for open-vocabulary pre-training; LAE-DINO is a DINO-based open-vocabulary detector with Dynamic Vocabulary Construction (DVC) and Visual-Guided Text Prompt Learning (VisGT), following a pre-training and fine-tuning paradigm for open-set and closed-set detection.

<p align="center">
<img src="assets/LAE-DINO-Pipeline.png" alt="Image" width="700">
</p>
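
The idea behind DVC is that the text encoder never sees the full category set at once: each training batch gets a compact vocabulary built from the categories actually present in the batch, padded with sampled negatives. A simplified sketch of that construction (the sampling details here are our simplification, not the paper's exact procedure):

```python
import random

def build_batch_vocabulary(batch_positives, full_vocab, size=60, seed=None):
    """Dynamic Vocabulary Construction, simplified.

    Keep every category present in the batch (the positives) and pad
    with negative categories sampled from the rest of the vocabulary,
    up to `size` entries in total.
    """
    rng = random.Random(seed)
    positives = sorted(set(batch_positives))
    negatives = [c for c in full_vocab if c not in set(positives)]
    n_neg = max(0, size - len(positives))
    sampled = rng.sample(negatives, min(n_neg, len(negatives)))
    return positives + sampled
```

Keeping the per-batch vocabulary small both bounds the text-encoder cost and supplies hard negative categories for contrastive alignment.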

### Installation Environment

The experimental environment is based on [mmdetection](https://github.com/open-mmlab/mmdetection/blob/main/docs/zh_cn/get_started.md); for setup, see mmdetection's [installation guide](https://github.com/open-mmlab/mmdetection/blob/main/docs/zh_cn/get_started.md). You can refer to our environment file [lae_requirements.txt](https://github.com/jaychempan/LAE-DINO/blob/main/lae_requirements.txt) if you encounter problems.
```
conda create --name lae python=3.8 -y
conda activate lae
cd LAE-DINO/mmdetection_lae
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"

# install mmdet in editable mode and run it directly
pip install -v -e .
pip install -r requirements/multimodal.txt
pip install emoji ddd-dataset
pip install git+https://github.com/lvis-dataset/lvis-api.git
```
Then download the BERT weights `bert-base-uncased` into the weights directory:
```
cd LAE-DINO
huggingface-cli download --resume-download google-bert/bert-base-uncased --local-dir weights/bert-base-uncased
```

### Train LAE-DINO Model

#### Pre-training

```
./tools/dist_train.sh configs/lae_dino/lae_dino_swin-t_pretrain_LAE-1M.py 4
```

To resume training from the last checkpoint:

```
./tools/dist_train_lae.sh configs/lae_dino/lae_dino_swin-t_pretrain_LAE-1M.py 4
```

#### Fine-tuning

```
# DIOR
./tools/dist_train.sh configs/lae_dino/lae_dino_swin-t_finetune_DIOR.py 4
# DOTAv2
./tools/dist_train.sh configs/lae_dino/lae_dino_swin-t_finetune_DOTA.py 4
```

**Note: About the Data Annotation Files**

| File Name | Description |
|----------------------------------|-----------------------------------------------------------------------------|
| DOTAv2_train.json | Original annotation file |
| processed_DOTAv2_train.json | Splits images with too many annotations to prevent GPU memory overflow |
| processed_LAE-1M_DOTAv2_train.json | Sampled from `processed_DOTAv2_train.json` to construct the LAE-1M dataset |

### Test LAE-DINO Model

```
./tools/dist_test.sh configs/lae_dino/lae_dino_swin-t_pretrain_LAE-1M.py /path/to/model/ 4
```

### Inference LAE-DINO Model

#### Open Vocabulary Object Detection

For a single image:
```
python demo/image_demo.py images/airplane.jpg \
configs/lae_dino/lae_dino_swin-t_pretrain_LAE-1M.py \
--weights /path/to/model/ \
--texts 'playground . road . tank . airplane . vehicle' -c \
--palette random \
--pred-score-thr 0.4
```
For a directory of images:
```
python demo/image_demo.py images/ \
configs/lae_dino/lae_dino_swin-t_pretrain_LAE-1M.py \
--weights /path/to/model/ \
--texts 'playground . road . tank . airplane . vehicle' -c \
--palette random \
--pred-score-thr 0.4
```
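
Note the `--texts` argument separates category names with `" . "`, as in the commands above. When building the prompt from a category list programmatically, a one-line helper keeps the format consistent (the helper name is ours):

```python
def to_text_prompt(categories):
    """Join category names in the ' . '-separated format used by --texts."""
    return " . ".join(categories)
```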

### Analyze

```
python tools/analysis_tools/analyze_logs.py plot_curve
```

Based on the stable version of the LAE-1M dataset, we trained on 4 A100 GPUs for 32 epochs with a per-GPU batch size of 4. LAE-80C covers more categories and can serve as a zero-shot and few-shot benchmark for remote sensing.

| Method | DIOR AP50 | DOTAv2.0 mAP | LAE-80C mAP |
|------------|------|-------------|-------------|
| LAE-DINO-T | 87.3 | 51.5 | 24.1 [[weight]](https://drive.google.com/file/d/1EiR8KtNRYIeOfvtIe9C82cQk_uOMIQ8U/view?usp=sharing) |

## Discussion
- Our work is suited to zero-shot and few-shot benchmarking in remote sensing and can be used for pre-detection of both common and uncommon categories.

- Regarding "Locate" vs. "Detect": in most papers the two are interchangeable, because most tasks only care about the relative position of the RoI. In practical remote sensing detection, however, latitude and longitude can be computed from the image's world position together with the position within the image.

## Statement

### Acknowledgement

This project references and uses the following open-source models and datasets. Thanks also to `ETH Zürich` and `INSAIT` for partial computing support.

#### Related Open Source Models

- [MM-Grounding-DINO](https://github.com/open-mmlab/mmdetection/blob/main/configs/mm_grounding_dino/README.md)
- [segment-anything](https://github.com/facebookresearch/segment-anything?tab=readme-ov-file)
- [InternVL](https://github.com/OpenGVLab/InternVL/tree/main)
- [MTP](https://github.com/ViTAE-Transformer/MTP)

#### Related Open Source Datasets

- [DOTA Dataset](https://captain-whu.github.io/DOTA/dataset.html)
- [DIOR Dataset](http://www.escience.cn/people/gongcheng/DIOR.html)
- [FAIR1M Dataset](https://arxiv.org/abs/2103.05569)
- [AID Dataset](https://captain-whu.github.io/AID/)
- [RSICD Dataset](https://github.com/201528014227051/RSICD_optimal)
- [NWPU Dataset](https://gjy3035.github.io/NWPU-Crowd-Sample-Code/)
- [RSOD Dataset](https://github.com/RSIA-LIESMARS-WHU/RSOD-Dataset-)
- [HRSC2016 Dataset](http://www.escience.cn/people/liuzikun/DataSet.html)
- [Power-Plant Dataset](https://github.com/SPDQ/Power-Plant-Detection-in-RSI)
- [SLM Dataset](https://github.com/xiaoyuan1996/SemanticLocalizationMetrics)
377
+ ### Citation
378
+
379
+ If you are interested in the following work, please cite the following paper.
380
+
381
+ ```
382
+ @inproceedings{pan2025locate,
383
+ title={Locate anything on earth: Advancing open-vocabulary object detection for remote sensing community},
384
+ author={Pan, Jiancheng and Liu, Yanxing and Fu, Yuqian and Ma, Muyuan and Li, Jiahao and Paudel, Danda Pani and Van Gool, Luc and Huang, Xiaomeng},
385
+ booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
386
+ volume={39},
387
+ number={6},
388
+ pages={6281--6289},
389
+ year={2025}
390
+ }
391
+
392
+ or
393
+
394
+ @misc{pan2024locateearthadvancingopenvocabulary,
395
+ title={Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community},
396
+ author={Jiancheng Pan and Yanxing Liu and Yuqian Fu and Muyuan Ma and Jiaohao Li and Danda Pani Paudel and Luc Van Gool and Xiaomeng Huang},
397
+ year={2024},
398
+ eprint={2408.09110},
399
+ archivePrefix={arXiv},
400
+ primaryClass={cs.CV},
401
+ url={https://arxiv.org/abs/2408.09110},
402
+ }
403
+ ```