---
base_model:
- erenzhou/GeoGround
- liuhaotian/llava-v1.5-7b
datasets:
- erenzhou/AirSpatial
- erenzhou/refGeo
library_name: transformers
pipeline_tag: image-text-to-text
---
# AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognition and Retrieval
[**Paper**](https://huggingface.co/papers/2601.01416) | [**Code**](https://github.com/VisionXLab/AirSpatialBot) | [**Dataset**](https://huggingface.co/datasets/erenzhou/AirSpatial)
AirSpatialBot is a Vision-Language Model (VLM) specifically designed for remote sensing and aerial drone imagery. It addresses the limitations of existing VLMs in spatial understanding by introducing specialized tasks like Spatial Grounding (SG) and Spatial Question Answering (SQA).
## Key Features
- **Spatially-Aware Training:** Employs a two-stage training strategy (Image Understanding Pre-training and Spatial Understanding Fine-tuning) to bridge the gap between general vision tasks and aerial spatial awareness.
- **3D Grounding:** The first remote sensing grounding model to use 3D Bounding Boxes (3DBB), which enables more precise vehicle localization.
- **Fine-Grained Attribute Recognition:** Capable of identifying specific vehicle brands, models, and pricing information from high-altitude imagery.
- **Aerial Agent Capabilities:** Integrates task planning and spatial reasoning to act as an agent for complex retrieval queries in remote sensing scenarios.
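
To make the 3D bounding box idea concrete, here is a minimal sketch of how a box can be expanded into its eight corner points. The parameterization used here (center, length/width/height, and a yaw angle about the vertical axis) and the names `Box3D` and `box_corners` are illustrative assumptions; this card does not specify the 3DBB format that AirSpatialBot actually emits.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """Hypothetical 3D box layout, not taken from the AirSpatialBot release."""
    cx: float      # box center, x
    cy: float      # box center, y
    cz: float      # box center, z
    length: float  # extent along the box's local x axis
    width: float   # extent along the box's local y axis
    height: float  # extent along the vertical (z) axis
    yaw: float     # rotation about the z axis, in radians


def box_corners(box: Box3D) -> list[tuple[float, float, float]]:
    """Return the 8 corners: bottom face first, then top face."""
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    footprint = ((-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5))
    corners = []
    for dz in (-0.5, 0.5):
        for dx, dy in footprint:
            # Scale the unit footprint to the box size in local coordinates.
            lx, ly = dx * box.length, dy * box.width
            # Rotate by yaw around z, then translate to the box center.
            x = box.cx + c * lx - s * ly
            y = box.cy + s * lx + c * ly
            z = box.cz + dz * box.height
            corners.append((x, y, z))
    return corners


if __name__ == "__main__":
    # Example values are made up for illustration.
    car = Box3D(cx=10.0, cy=5.0, cz=0.75, length=4.0, width=2.0, height=1.5, yaw=math.pi / 6)
    for corner in box_corners(car):
        print(tuple(round(v, 3) for v in corner))
```

Unlike an axis-aligned 2D box, this representation carries heading and height, which is the kind of extra geometric information a 3D grounding output can provide for vehicle localization.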
## Model Training
The model is built upon the LLaVA-v1.5-7b architecture and was fine-tuned using the **AirSpatial** dataset, which comprises over 206K instructions tailored for spatial tasks in aerial imagery.
## Citation
```bibtex
@ARTICLE{zhou2025airspatialbot,
author={Zhou, Yue and Ding, Ran and Yang, Xue and Jiang, Xue and Liu, Xingzhao},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval},
year={2025},
volume={},
number={},
pages={1-1},
doi={10.1109/TGRS.2025.3570895}
}
```