---
license: mit
tags:
- clip
- feature-extraction
- remote-sensing
---
# Remote-CLIP-ViT-L-14

This model is a mirror/redistribution of the original [RemoteCLIP](https://huggingface.co/chendelong/RemoteCLIP) model.
## Original Repository and Links

- **Original Hugging Face Model**: [chendelong/RemoteCLIP](https://huggingface.co/chendelong/RemoteCLIP)
- **Official GitHub Repository**: [ChenDelong1999/RemoteCLIP](https://github.com/ChenDelong1999/RemoteCLIP)
## Description

RemoteCLIP is a vision-language foundation model for remote sensing, trained on a large-scale dataset of remote sensing image-text pairs. It is based on the CLIP architecture and is designed to handle the unique characteristics of remote sensing imagery.
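## Usage

As in the original repository, the checkpoint can be loaded into the ViT-L-14 CLIP architecture with the [open_clip](https://github.com/mlfoundations/open_clip) library. The following is a minimal sketch: the checkpoint filename `RemoteCLIP-ViT-L-14.pt` is assumed to match the original repository's naming, and `example.jpg` and the candidate captions are placeholders to adapt to your data.

```python
import torch
import open_clip
from PIL import Image
from huggingface_hub import hf_hub_download

# Build the ViT-L-14 CLIP architecture and its matching image preprocessing.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14")
tokenizer = open_clip.get_tokenizer("ViT-L-14")

# Download the RemoteCLIP weights (filename assumed to follow the original
# repository's naming) and load them into the model.
ckpt_path = hf_hub_download("chendelong/RemoteCLIP", "RemoteCLIP-ViT-L-14.pt")
model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
model.eval()

# Encode an example remote sensing image ("example.jpg" is a placeholder)
# and a few candidate captions, then compare them in the shared embedding space.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer([
    "an aerial image of an airport",
    "a satellite image of a forest",
])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # probability of each caption matching the image
```

The normalized `image_features` can also be used on their own as embeddings for retrieval or other downstream remote sensing tasks, in line with the `feature-extraction` tag above.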
## Citation

If you use this model in your research, please cite the original work:

```bibtex
@article{remoteclip,
  author  = {Fan Liu and
             Delong Chen and
             Zhangqingyun Guan and
             Xiaocong Zhou and
             Jiale Zhu and
             Qiaolin Ye and
             Liyong Fu and
             Jun Zhou},
  title   = {RemoteCLIP: {A} Vision Language Foundation Model for Remote Sensing},
  journal = {{IEEE} Transactions on Geoscience and Remote Sensing},
  volume  = {62},
  pages   = {1--16},
  year    = {2024},
  url     = {https://doi.org/10.1109/TGRS.2024.3390838},
  doi     = {10.1109/TGRS.2024.3390838},
}
```