GeoRSCLIP-ViT-L-14

This model is a mirror/redistribution of the original GeoRSCLIP model.

Original Repository and Links

Description

GeoRSCLIP is a vision-language foundation model for remote sensing, trained on a large-scale dataset of remote sensing image-text pairs (RS5M). It is based on the CLIP architecture and is designed to handle the unique characteristics of remote sensing imagery.
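Below is a minimal, hedged usage sketch showing zero-shot classification with a CLIP-style ViT-L-14 backbone. It assumes the mirrored weights are an open_clip-compatible state dict; the checkpoint filename `RS5M_ViT-L-14.pt` and the example image/prompt names are placeholders, not confirmed by this card.

```python
# Hedged sketch: zero-shot classification with an open_clip ViT-L-14 checkpoint.
# Assumptions: the mirrored weights load via open_clip's `pretrained=` local-path
# mechanism, and "RS5M_ViT-L-14.pt" / "example_scene.png" are hypothetical names.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="RS5M_ViT-L-14.pt"  # local path to the mirrored checkpoint
)
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model.eval()

image = preprocess(Image.open("example_scene.png")).unsqueeze(0)
texts = tokenizer([
    "a satellite image of an airport",
    "a satellite image of a forest",
])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # Normalize embeddings, then use cosine similarity as zero-shot logits.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # probability over the text prompts for the input image
```

The same encoders can be reused for cross-modal retrieval: embed a bank of images once, then rank them by cosine similarity against an embedded text query.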

Citation

If you use this model in your research, please cite the original work:

@article{zhangRS5MGeoRSCLIPLargeScale2024,
  title   = {{RS5M} and {GeoRSCLIP}: A Large-Scale Vision-Language Dataset and a Large Vision-Language Model for Remote Sensing},
  author  = {Zhang, Zilun and Zhao, Tiancheng and Guo, Yulong and Yin, Jianwei},
  year    = {2024},
  journal = {IEEE Transactions on Geoscience and Remote Sensing},
  volume  = {62},
  pages   = {1--23},
  issn    = {1558-0644},
  doi     = {10.1109/TGRS.2024.3449154}
}