---
license: mit
tags:
- clip
- feature-extraction
- remote-sensing
---

# Remote-CLIP-ViT-L-14

This model is a mirror/redistribution of the original [RemoteCLIP](https://huggingface.co/chendelong/RemoteCLIP) model.

## Original Repository and Links

- **Original Hugging Face Model**: [chendelong/RemoteCLIP](https://huggingface.co/chendelong/RemoteCLIP)
- **Official GitHub Repository**: [ChenDelong1999/RemoteCLIP](https://github.com/ChenDelong1999/RemoteCLIP)

## Description

RemoteCLIP is a vision-language foundation model for remote sensing, trained on a large-scale dataset of remote sensing image-text pairs. It is based on the CLIP architecture and is designed to handle the unique characteristics of remote sensing imagery.

## Citation

If you use this model in your research, please cite the original work:

```bibtex
@article{remoteclip,
  author  = {Fan Liu and Delong Chen and Zhangqingyun Guan and Xiaocong Zhou and Jiale Zhu and Qiaolin Ye and Liyong Fu and Jun Zhou},
  title   = {RemoteCLIP: {A} Vision Language Foundation Model for Remote Sensing},
  journal = {{IEEE} Transactions on Geoscience and Remote Sensing},
  volume  = {62},
  pages   = {1--16},
  year    = {2024},
  url     = {https://doi.org/10.1109/TGRS.2024.3390838},
  doi     = {10.1109/TGRS.2024.3390838},
}
```