AirSpatialBot / README.md
nielsr's picture
nielsr HF Staff
Improve model card with metadata, paper link, and description
974e7a4 verified
|
raw
history blame
2.09 kB
metadata
base_model:
  - erenzhou/GeoGround
  - liuhaotian/llava-v1.5-7b
datasets:
  - erenzhou/AirSpatial
  - erenzhou/refGeo
library_name: transformers
pipeline_tag: image-text-to-text

AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognition and Retrieval

Paper | Code | Dataset

AirSpatialBot is a Vision-Language Model (VLM) specifically designed for remote sensing and aerial drone imagery. It addresses the limitations of existing VLMs in spatial understanding by introducing specialized tasks like Spatial Grounding (SG) and Spatial Question Answering (SQA).

Key Features

  • Spatially-Aware Training: Employs a two-stage training strategy (Image Understanding Pre-training and Spatial Understanding Fine-tuning) to bridge the gap between general vision tasks and aerial spatial awareness.
  • 3D Grounding: It is the first remote sensing grounding model to utilize 3D Bounding Boxes (3DBB), enhancing its capability for precise vehicle localization.
  • Fine-Grained Attribute Recognition: Capable of identifying specific vehicle brands, models, and pricing information from high-altitude imagery.
  • Aerial Agent Capabilities: Integrates task planning and spatial reasoning to act as an agent for complex retrieval queries in remote sensing scenarios.

Model Training

The model is built upon the LLaVA-v1.5-7b architecture and was fine-tuned using the AirSpatial dataset, which comprises over 206K instructions tailored for spatial tasks in aerial imagery.

Citation

@ARTICLE{zhou2025airspatialbot,
  author={Zhou, Yue and Ding, Ran and Yang, Xue and Jiang, Xue and Liu, Xingzhao},
  journal={IEEE Transactions on Geoscience and Remote Sensing}, 
  title={AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval}, 
  year={2025},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TGRS.2025.3570895}
}