PANEL: A Domain-Specific Vision-Language Model for Photovoltaic Tasks in Remote Sensing

Model Description

PANEL (PV-specific vision-lANguage modEL) is a domain-specific Vision-Language Model (VLM) tailored for large-scale photovoltaic (PV) mapping and interpretation in remote sensing (RS). Built upon the CLIP (ViT-B/16) architecture, PANEL is pre-trained on a curated worldwide PV vision-language dataset comprising over one million image-text pairs. It effectively aligns visual features of PV panels with diverse text prompts, enabling robust performance across varying spatial resolutions (0.1m to 20m) and complex urban/rural contexts.

PANEL is designed for:

  • Zero-shot interpretation: Identifying and localizing PV panels without task-specific fine-tuning.
  • Few-shot adaptation: Rapidly adapting to downstream tasks (e.g., segmentation) with minimal labels via the Knowledge Assistance Module (KAM).

How to Use

1. Zero-Shot Inference (Semantic Localization Example)

For zero-shot tasks, PANEL leverages PANEL Surgery to reinforce vision-language alignment. Below is a simplified example of performing semantic localization to generate a similarity map for PV panels. For classification, similarity can be calculated using the cls_token.

import torch
import panel
from PIL import Image
from torchvision.transforms import Compose, ToTensor, Normalize

# 1. Load the pre-trained PANEL model
# The 'panel' library should be installed/available in your environment
model_path = "PANEL-ViT-B-16_ImgSize256.pth"
model, _ = panel.load4panel(model_path, custom_resolution=256, device='cuda')
model.eval().to('cuda')

# 2. Prepare text prompts (Ensemble of PV-related terms)
target_texts = ["PV panels", "solar panels", "photovoltaic modules", "solar arrays"]
prompt_templates = ['a remote sensing image of {}', 'a satellite imagery of {}', 'an aerial image of {}']

# Encode text features with prompt ensemble and remove redundant features (Surgery)
text_features = panel.encode_text_with_prompt_ensemble(model, target_texts, 'cuda', prompt_templates)
redundant_features = panel.encode_text_with_prompt_ensemble(model, [""], 'cuda', prompt_templates)
valuable_text_features = text_features - redundant_features

# 3. Prepare image
preprocess = Compose([ToTensor(), Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711))])
# Resize to match the model's input resolution (custom_resolution=256 above)
image = Image.open("your_pv_image.tif").convert("RGB").resize((256, 256))
image_tensor = preprocess(image).unsqueeze(0).to('cuda')

# 4. Inference
with torch.no_grad():
    # Extract patch-level image features
    image_features, _ = model.encode_image(image_tensor) # [B, L, D]
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    
    # Compute similarity map
    similarity = (image_features @ valuable_text_features.t())[:, 1:, :]  # drop the cls token, keep patch tokens
    similarity_map = panel.get_similarity_map(similarity, (image_tensor.shape[2], image_tensor.shape[3]))
    
    # similarity_map now contains the localization priors for PV panels
    print("Similarity map generated:", similarity_map.shape)
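To turn the localization prior into a per-pixel mask, the patch-level similarity map can be upsampled to image resolution and thresholded. The helper below is a minimal sketch using plain PyTorch; `similarity_to_mask`, its argument layout, and the 0.5 threshold are illustrative assumptions, not part of the `panel` API.

```python
import torch
import torch.nn.functional as F

def similarity_to_mask(similarity_map: torch.Tensor,
                       out_size: tuple,
                       threshold: float = 0.5) -> torch.Tensor:
    """Upsample a patch-level similarity map to image size and binarize it.

    similarity_map: [B, H_p, W_p] (or [B, H_p, W_p, 1]) cosine similarities.
    Returns a [B, H, W] boolean mask. Layout and threshold are illustrative.
    """
    if similarity_map.dim() == 4:          # drop a trailing channel dim if present
        similarity_map = similarity_map.squeeze(-1)
    sim = similarity_map.unsqueeze(1)      # [B, 1, H_p, W_p] for interpolate
    sim = F.interpolate(sim, size=out_size, mode='bilinear', align_corners=False)
    sim = (sim - sim.amin()) / (sim.amax() - sim.amin() + 1e-8)  # min-max normalize to [0, 1]
    return sim.squeeze(1) > threshold      # [B, H, W] boolean mask

# Toy example: a 16x16 patch grid upsampled to the 256x256 input size
dummy = torch.rand(1, 16, 16)
mask = similarity_to_mask(dummy, (256, 256))
```

In practice the threshold would be tuned per resolution, or the continuous map passed directly to a downstream head as a prior.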

2. Few-Shot Adaptation (with KAM)

For few-shot segmentation, we provide the Knowledge Assistance Module (KAM) to inject PANEL's vision-language priors into baseline models.

Integration with mmsegmentation: To use KAM with a backbone (e.g., MiT), place the provided mit_kam.py file into the mmseg/models/backbones/ directory. KAM interacts with the baseline via gated convolution and cross-attention fusion.

# Example configuration snippet for mmsegmentation
model = dict(
    type='EncoderDecoder',
    backbone=dict(
        type='MixVisionTransformerKAM', # Integrated with KAM
        panel_priors=True,              # Enable PANEL prior injection
        pretrained='PANEL-ViT-B-16_ImgSize256.pth',
        ...),
    decode_head=dict(...),
)
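The "gated convolution and cross-attention fusion" mentioned above can be pictured with a small PyTorch module. This is an illustrative sketch, not KAM's actual implementation: the class name `GatedFusion`, the dimensions, and the scalar gate are assumptions. The key idea shown is that the gate is initialized to zero, so prior injection starts as an identity mapping and the baseline backbone is undisturbed at the start of few-shot training.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative gated cross-attention fusion of backbone features with
    vision-language priors (a sketch of the KAM idea, not its actual code)."""

    def __init__(self, dim: int, prior_dim: int, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(prior_dim, dim)                 # project prior to backbone dim
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))              # gate starts closed (identity)

    def forward(self, feats: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # feats: [B, N, dim] backbone tokens; prior: [B, M, prior_dim] PANEL tokens
        kv = self.proj(prior)
        fused, _ = self.attn(feats, kv, kv)                   # backbone tokens attend to the prior
        return feats + torch.tanh(self.gate) * fused          # gated residual injection

# Toy shapes: 196 backbone tokens (dim 64) fused with 49 prior tokens (dim 512)
fusion = GatedFusion(dim=64, prior_dim=512)
feats = torch.randn(2, 196, 64)
out = fusion(feats, torch.randn(2, 49, 512))
```

Because `tanh(0) = 0`, the module returns `feats` unchanged until the gate parameter moves away from zero during training, which is a common way to inject external priors without destabilizing a pre-trained baseline.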

Citation

If you use PANEL in your research, please cite the following paper:

@article{deng2026panel,
  title={PANEL: A Domain-Specific Vision-Language Model for Zero-Shot and Few-Shot Photovoltaic Tasks in Remote Sensing (Under Review)},
  author={Deng, Ruizhe and Guo, Zhiling and Zhang, Penglei and Li, Jiaze and Xu, Xin and Chen, Qi and Chen, Yuntian and Yan, Jinyue},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing},
  year={2026},
  publisher={Elsevier}
}

Acknowledgements

This work was supported by the International Centre of Urban Energy Nexus (UEX) at The Hong Kong Polytechnic University and Eastern Institute of Technology (Ningbo). We thank the authors of the original CLIP model and CLIP Surgery for their foundational work.

Collection

This model is part of the UEX-RenewableEnergy Collection.
