# CLIPSeg Fine-tuned for Cloud Segmentation (LoRA, 100% Data)
LoRA-adapted version of CIDAS/clipseg-rd64-refined for cloud segmentation on Sentinel-2 satellite imagery using the CloudSEN12+ dataset. This model is part of the research presented in:
> **Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift**
> Harshith Kethavath, Weiming Hu
> EarthVision Workshop @ CVPR 2026
All models from this paper: https://huggingface.co/collections/uga-gaim/2026-cloudprompts
## Model Description
CLIPSeg is a vision-language segmentation model trained on natural images. This variant uses Low-Rank Adaptation (LoRA) to adapt CLIPSeg to Sentinel-2 satellite imagery for four-class cloud segmentation: clear, thick cloud, thin cloud, and cloud shadow. Compared to full fine-tuning, LoRA trains only a small fraction of parameters (~16MB adapter weights vs. ~603MB full model), making it a lightweight alternative.
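The size gap follows from how LoRA factorizes weight updates: for a d×d projection, full fine-tuning trains d² parameters, while a rank-r adapter trains only r·(d_in + d_out). A rough, illustrative calculation (the 768-dim projection size is an assumption for the sketch, not a value read from this checkpoint):

```python
# Illustrative LoRA parameter count for one square projection layer.
# d = 768 is an assumed hidden size, not read from this checkpoint.
d, r = 768, 32

full_params = d * d        # dense weight trained in full fine-tuning
lora_params = r * (d + d)  # low-rank factors A (r x d) and B (d x r)

print(f"full: {full_params:,}  lora: {lora_params:,}")
print(f"trainable fraction: {lora_params / full_params:.1%}")
```

At rank 32 the adapter trains roughly 8% of the parameters of each adapted layer, which is why the adapter checkpoint stays in the tens of megabytes.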
- Developed by: Harshith Kethavath, Weiming Hu
- Lab: Lab for Geoinformatics and AI Modeling (GAIM), University of Georgia
- License: CC BY 4.0
- Base model: CIDAS/clipseg-rd64-refined
- PEFT method: LoRA (rank 32, α = 64)
## How to Get Started
```python
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
from peft import PeftModel
import torch
from PIL import Image

# Load the base model, then attach the LoRA adapter.
base_model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")
model = PeftModel.from_pretrained(base_model, "uga-gaim/CLIPSeg-CloudSEN12Plus-LoRA")
processor = CLIPSegProcessor.from_pretrained("uga-gaim/CLIPSeg-CloudSEN12Plus-LoRA")

image = Image.open("your_sentinel2_image.png").convert("RGB")
prompts = ["clear", "thick cloud", "thin cloud", "cloud shadow"]

# One image copy per text prompt, so each prompt yields its own logit map.
inputs = processor(
    text=prompts,
    images=[image] * len(prompts),
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits                 # shape: (4, H, W), one map per prompt
predicted_class = logits.argmax(dim=0)  # per-pixel class index in {0, 1, 2, 3}
```
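CLIPSeg predicts at its own fixed internal resolution, so the class map usually needs to be resized back to the source image before use. A minimal post-processing sketch (the 352×352 logit size and 1024×1024 tile size here are stand-in example values; use the actual shape of `logits` from the snippet above):

```python
import torch
import torch.nn.functional as F

# Stand-in for the (num_prompts, H, W) logits returned by the model;
# the real spatial size depends on the model's output resolution.
logits = torch.randn(4, 352, 352)
orig_h, orig_w = 1024, 1024  # example original tile size

# Upsample the logits (not the argmax) so class boundaries interpolate
# smoothly, then take the per-pixel argmax over the four prompt channels.
upsampled = F.interpolate(
    logits.unsqueeze(0),  # (1, 4, 352, 352)
    size=(orig_h, orig_w),
    mode="bilinear",
    align_corners=False,
)
mask = upsampled.squeeze(0).argmax(dim=0)  # (orig_h, orig_w), values in {0..3}
```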
## Training Details
### Training Data
Trained on the CloudSEN12+ dataset, the largest expert-labeled cloud segmentation benchmark for Sentinel-2 imagery. 100% of the training split was used (full data setting).
### Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 2e-4 |
| Weight decay | 0.01 |
| Warmup ratio | 0.03 |
| Epochs | 15 |
| Batch size | 16 |
| LoRA rank (r) | 32 |
| LoRA alpha (α) | 64 |
| LoRA dropout | 0.05 |
| Precision | fp16 |
### Loss Function
The training objective is a weighted sum of Focal loss, Tversky loss, and Boundary loss.
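The exact loss weights and the Boundary term's implementation are not given here, so the sketch below covers only the Focal and Tversky components with illustrative settings (the γ, Tversky α/β, and combination weights are assumptions, and the Boundary loss is omitted for brevity):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    """Multiclass focal loss from raw logits; gamma=2.0 is an assumed value."""
    ce = F.cross_entropy(logits, target, reduction="none")  # (N, H, W)
    pt = torch.exp(-ce)  # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def tversky_loss(logits, target, alpha=0.5, beta=0.5, eps=1e-6):
    """Soft Tversky loss; alpha = beta = 0.5 reduces it to soft Dice."""
    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    tp = (probs * onehot).sum(dims)
    fp = (probs * (1.0 - onehot)).sum(dims)
    fn = ((1.0 - probs) * onehot).sum(dims)
    tversky = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky.mean()

def combined_loss(logits, target, w_focal=1.0, w_tversky=1.0):
    # Boundary term omitted; weights are illustrative, not the paper's values.
    return w_focal * focal_loss(logits, target) + w_tversky * tversky_loss(logits, target)

logits = torch.randn(2, 4, 64, 64)         # (batch, classes, H, W)
target = torch.randint(0, 4, (2, 64, 64))  # per-pixel class labels
loss = combined_loss(logits, target)
```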
## Evaluation Results
Evaluated on the CloudSEN12+ test split. Per-class IoU:
| Class | Zero-Shot (baseline) | This model (LoRA 100%) |
|---|---|---|
| Clear | 0.5205 | 0.8269 |
| Thick Cloud | 0.2773 | 0.7488 |
| Thin Cloud | 0.0898 | 0.3820 |
| Cloud Shadow | 0.1325 | 0.4389 |
| mIoU | 0.2550 | 0.5991 |
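For reference, per-class IoU is the intersection over union of the predicted and ground-truth masks for that class, and mIoU is their unweighted mean. A minimal NumPy sketch of that computation on a toy example (not the paper's evaluation code):

```python
import numpy as np

def per_class_iou(pred, target, num_classes=4):
    """IoU per class; NaN for classes absent from both prediction and target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# Toy 2x4 masks with the four classes 0..3.
pred   = np.array([[0, 0, 1, 1],
                   [2, 2, 3, 3]])
target = np.array([[0, 0, 1, 3],
                   [2, 2, 3, 3]])

ious = per_class_iou(pred, target)
miou = float(np.nanmean(ious))
```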
## Citation
```bibtex
@misc{kethavath2026lowdatasupervisedadaptationoutperforms,
  title={Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift},
  author={Harshith Kethavath and Weiming Hu},
  year={2026},
  eprint={2604.08956},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.08956},
}
```