How to use rishii100/clip-rsicd-finetuned with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="rishii100/clip-rsicd-finetuned")
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)

# Load model directly
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

processor = AutoProcessor.from_pretrained("rishii100/clip-rsicd-finetuned")
model = AutoModelForZeroShotImageClassification.from_pretrained("rishii100/clip-rsicd-finetuned")

CLIP-SatIR is a fine-tuned CLIP (ViT-B/32) model trained on the RSICD remote sensing dataset for cross-modal satellite image retrieval. The model aligns satellite imagery and natural language captions in a shared embedding space using contrastive learning.
Supports:
Contrastive CLIP loss: L = CrossEntropy(sim(image_i, text_j))
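The loss line above is compact, so here is a minimal sketch of how a CLIP-style contrastive loss is commonly computed: a symmetric cross-entropy over the image-text similarity matrix, where the matching caption for image i sits on the diagonal. This is an illustration and not necessarily the exact training code behind this checkpoint. The function name `clip_contrastive_loss`, the fixed `logit_scale` value, and the random toy embeddings are all assumptions (in CLIP itself the scale is a learned parameter).

```python
import torch
import torch.nn.functional as F


def clip_contrastive_loss(image_embeds, text_embeds, logit_scale=1 / 0.07):
    """Symmetric cross-entropy over an image-text similarity matrix.

    Hypothetical helper for illustration; logit_scale is fixed here,
    whereas CLIP learns it during training.
    """
    # L2-normalize so the dot product below is cosine similarity.
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # sim[i, j] = scaled cosine similarity between image i and caption j.
    sim = logit_scale * (image_embeds @ text_embeds.t())

    # The correct caption for image i is at column i (the diagonal).
    targets = torch.arange(sim.size(0), device=sim.device)

    loss_i2t = F.cross_entropy(sim, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(sim.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2


# Toy batch: random stand-ins for 4 image and 4 caption embeddings.
torch.manual_seed(0)
images = torch.randn(4, 512)
captions = torch.randn(4, 512)

random_loss = clip_contrastive_loss(images, captions)
aligned_loss = clip_contrastive_loss(images, images)  # perfectly matched pairs
print(random_loss.item(), aligned_loss.item())
```

Perfectly matched pairs drive the loss toward zero, while unrelated embeddings leave it near the chance level for the batch size.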
The model learns:
| Parameter | Value |
|---|---|
| Batch Size | 32 |
| Learning Rate | 1e-5 |
| Optimizer | AdamW |
| Epochs | 5 |
| Hardware | CUDA GPU |
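As a rough sketch of how the hyperparameters in the table could be wired into a PyTorch loop: the values for batch size, learning rate, optimizer, and epochs come from the table, while everything else is an assumption. The `Linear` layer is a hypothetical stand-in for the CLIP model, the random tensors stand in for RSICD batches, and the squared-output objective is a dummy placeholder, not the contrastive loss.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters taken from the table above.
BATCH_SIZE = 32
LEARNING_RATE = 1e-5
EPOCHS = 5

# Fall back to CPU so the sketch also runs without a CUDA GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for the CLIP model so the sketch runs offline.
model = torch.nn.Linear(512, 512).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

# Placeholder data: 64 random vectors instead of real image-caption pairs.
dataset = TensorDataset(torch.randn(64, 512))
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

steps = 0
for epoch in range(EPOCHS):
    for (batch,) in loader:
        batch = batch.to(device)
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean()  # dummy objective, NOT the CLIP loss
        loss.backward()
        optimizer.step()
        steps += 1
```

In a real fine-tuning run the stand-in model and dummy objective would be replaced by the CLIP model and its contrastive loss over image-caption batches.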
This model is designed for:
from transformers import CLIPModel, CLIPProcessor
from PIL import Image
import torch

# Load the fine-tuned model and its matching processor
model = CLIPModel.from_pretrained("rishii100/clip-rsicd-finetuned")
processor = CLIPProcessor.from_pretrained("rishii100/clip-rsicd-finetuned")

# Tokenize the text query
text = ["an airport with multiple airplanes and runway"]
inputs = processor(text=text, return_tensors="pt", padding=True)

# Embed the query without tracking gradients
with torch.no_grad():
    text_features = model.get_text_features(**inputs)

print(text_features.shape)
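Once text embeddings are available, text-to-image retrieval reduces to ranking image embeddings by cosine similarity. The sketch below shows that ranking step only. The helper `rank_images` is a hypothetical name, and the random tensors are placeholders: in practice `image_features` would come from `model.get_image_features` on preprocessed satellite images, and `text_features` from the snippet above. The width of 512 assumes the ViT-B/32 projection size.

```python
import torch
import torch.nn.functional as F


def rank_images(text_features, image_features, top_k=3):
    """Return indices and cosine scores of the top_k images per text query.

    Hypothetical helper for illustration.
    """
    # Normalize both sides so the matrix product is cosine similarity.
    text_features = F.normalize(text_features, dim=-1)
    image_features = F.normalize(image_features, dim=-1)

    scores = text_features @ image_features.t()  # (num_queries, num_images)
    k = min(top_k, image_features.size(0))
    top = scores.topk(k=k, dim=-1)
    return top.indices, top.values


# Placeholder embeddings standing in for real model outputs.
torch.manual_seed(0)
image_features = torch.randn(10, 512)

# Build a query that is a noisy copy of image 7, so image 7 should rank first.
text_features = image_features[7:8] + 0.1 * torch.randn(1, 512)

indices, values = rank_images(text_features, image_features)
print(indices[0].tolist(), values[0].tolist())
```

For a larger image collection, the normalized image embeddings can be computed once and cached, so each new query costs a single matrix product.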