# CLIPSeg Fine-tuned for Cloud Segmentation (LoRA, 100% Data)
LoRA-adapted version of CIDAS/clipseg-rd64-refined for cloud segmentation on Sentinel-2 satellite imagery using the CloudSEN12+ dataset. This model is part of the research presented in:
> **Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift**
> Harshith Kethavath, Weiming Hu
> EarthVision Workshop @ CVPR 2026
All models from this paper: https://huggingface.co/collections/uga-gaim/2026-cloudprompts
## Model Description
CLIPSeg is a vision-language segmentation model trained on natural images. This variant uses Low-Rank Adaptation (LoRA) to adapt CLIPSeg to Sentinel-2 satellite imagery for four-class cloud segmentation: clear, thick cloud, thin cloud, and cloud shadow. Compared to full fine-tuning, LoRA trains only a small fraction of parameters (~16MB adapter weights vs. ~603MB full model), making it a lightweight alternative.
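The size gap follows from how LoRA factorizes weight updates: for a d×d projection, full fine-tuning trains d² parameters, while a rank-r adapter trains only r·(d_in + d_out). A rough, illustrative calculation (the 768-dim projection size is an assumption for the sketch, not a value read from this checkpoint):

```python
# Illustrative LoRA parameter count for one square projection layer.
# d = 768 is an assumed hidden size, not read from this checkpoint.
d, r = 768, 32

full_params = d * d        # dense weight trained in full fine-tuning
lora_params = r * (d + d)  # low-rank factors A (r x d) and B (d x r)

print(f"full: {full_params:,}  lora: {lora_params:,}")
print(f"trainable fraction: {lora_params / full_params:.1%}")
```

At rank 32 the adapter trains roughly 8% of the parameters of each adapted layer, which is why the adapter checkpoint stays in the tens of megabytes.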
- Developed by: Harshith Kethavath, Weiming Hu
- Lab: Lab for Geoinformatics and AI Modeling (GAIM), University of Georgia
- License: CC BY 4.0
- Base model: CIDAS/clipseg-rd64-refined
- PEFT method: LoRA (rank 32, α = 64)
## How to Get Started
```python
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
from peft import PeftModel
import torch
from PIL import Image

# Load the base model, then attach the LoRA adapter.
base_model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")
model = PeftModel.from_pretrained(base_model, "uga-gaim/CLIPSeg-CloudSEN12Plus-LoRA")
processor = CLIPSegProcessor.from_pretrained("uga-gaim/CLIPSeg-CloudSEN12Plus-LoRA")

image = Image.open("your_sentinel2_image.png").convert("RGB")
prompts = ["clear", "thick cloud", "thin cloud", "cloud shadow"]

# One image copy per text prompt, so each prompt yields its own logit map.
inputs = processor(
    text=prompts,
    images=[image] * len(prompts),
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits                 # shape: (4, H, W), one map per prompt
predicted_class = logits.argmax(dim=0)  # per-pixel class index in {0, 1, 2, 3}
```
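CLIPSeg predicts at its own fixed internal resolution, so the class map usually needs to be resized back to the source image before use. A minimal post-processing sketch (the 352×352 logit size and 1024×1024 tile size here are stand-in example values; use the actual shape of `logits` from the snippet above):

```python
import torch
import torch.nn.functional as F

# Stand-in for the (num_prompts, H, W) logits returned by the model;
# the real spatial size depends on the model's output resolution.
logits = torch.randn(4, 352, 352)
orig_h, orig_w = 1024, 1024  # example original tile size

# Upsample the logits (not the argmax) so class boundaries interpolate
# smoothly, then take the per-pixel argmax over the four prompt channels.
upsampled = F.interpolate(
    logits.unsqueeze(0),  # (1, 4, 352, 352)
    size=(orig_h, orig_w),
    mode="bilinear",
    align_corners=False,
)
mask = upsampled.squeeze(0).argmax(dim=0)  # (orig_h, orig_w), values in {0..3}
```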
## Training Details
### Training Data
Trained on the CloudSEN12+ dataset, the largest expert-labeled cloud segmentation benchmark for Sentinel-2 imagery. 100% of the training split was used (full data setting).
### Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 2e-4 |
| Weight decay | 0.01 |
| Warmup ratio | 0.03 |
| Epochs | 15 |
| Batch size | 16 |
| LoRA rank (r) | 32 |
| LoRA alpha (α) | 64 |
| LoRA dropout | 0.05 |
| Precision | fp16 |
### Loss Function
The training objective is a weighted sum of Focal loss, Tversky loss, and Boundary loss.
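The exact loss weights and the Boundary term's implementation are not given here, so the sketch below covers only the Focal and Tversky components with illustrative settings (the γ, Tversky α/β, and combination weights are assumptions, and the Boundary loss is omitted for brevity):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    """Multiclass focal loss from raw logits; gamma=2.0 is an assumed value."""
    ce = F.cross_entropy(logits, target, reduction="none")  # (N, H, W)
    pt = torch.exp(-ce)  # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def tversky_loss(logits, target, alpha=0.5, beta=0.5, eps=1e-6):
    """Soft Tversky loss; alpha = beta = 0.5 reduces it to soft Dice."""
    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    tp = (probs * onehot).sum(dims)
    fp = (probs * (1.0 - onehot)).sum(dims)
    fn = ((1.0 - probs) * onehot).sum(dims)
    tversky = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky.mean()

def combined_loss(logits, target, w_focal=1.0, w_tversky=1.0):
    # Boundary term omitted; weights are illustrative, not the paper's values.
    return w_focal * focal_loss(logits, target) + w_tversky * tversky_loss(logits, target)

logits = torch.randn(2, 4, 64, 64)         # (batch, classes, H, W)
target = torch.randint(0, 4, (2, 64, 64))  # per-pixel class labels
loss = combined_loss(logits, target)
```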
## Evaluation Results
Evaluated on the CloudSEN12+ test split. Per-class IoU:
| Class | Zero-Shot (baseline) | This model (LoRA 100%) |
|---|---|---|
| Clear | 0.5205 | 0.8269 |
| Thick Cloud | 0.2773 | 0.7488 |
| Thin Cloud | 0.0898 | 0.3820 |
| Cloud Shadow | 0.1325 | 0.4389 |
| mIoU | 0.2550 | 0.5991 |
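For reference, per-class IoU is the intersection over union of the predicted and ground-truth masks for that class, and mIoU is their unweighted mean. A minimal NumPy sketch of that computation on a toy example (not the paper's evaluation code):

```python
import numpy as np

def per_class_iou(pred, target, num_classes=4):
    """IoU per class; NaN for classes absent from both prediction and target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# Toy 2x4 masks with the four classes 0..3.
pred   = np.array([[0, 0, 1, 1],
                   [2, 2, 3, 3]])
target = np.array([[0, 0, 1, 3],
                   [2, 2, 3, 3]])

ious = per_class_iou(pred, target)
miou = float(np.nanmean(ious))
```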
## Citation
```bibtex
@misc{kethavath2026lowdatasupervisedadaptationoutperforms,
  title={Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift},
  author={Harshith Kethavath and Weiming Hu},
  year={2026},
  eprint={2604.08956},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.08956},
}
```