SolarVQA Zero-Shot Evaluation Pipeline

This repository contains the zero-shot evaluation pipeline for SolarVQA, a benchmark for Visual Question Answering (VQA) in photovoltaic defect inspection.

The evaluation script (eval.py) benchmarks more than a dozen state-of-the-art Vision-Language Models (VLMs) on the SolarVQA dataset, automatically extracting answers, computing chance-corrected metrics, and generating comprehensive leaderboards.
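
The exact metric implementation lives in eval.py; purely as an illustration of what "chance-corrected" means here, the sketch below scores extracted answers with Cohen's kappa via scikit-learn (one common chance-corrected agreement measure). The repository's own correction may differ, and the answer strings are placeholders.

    # Illustration only: Cohen's kappa as an example of a chance-corrected score.
    # eval.py may apply a different correction; the answers below are placeholders.
    from sklearn.metrics import cohen_kappa_score

    gold = ["yes", "no", "yes", "yes", "no"]   # reference answers
    pred = ["yes", "no", "no",  "yes", "no"]   # model answers after extraction

    kappa = cohen_kappa_score(gold, pred)      # 1.0 = perfect agreement, 0.0 = chance level
    print(f"Chance-corrected agreement (Cohen's kappa): {kappa:.3f}")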

πŸ“Š About SolarVQA

SolarVQA contains 130,712 QA pairs across 16,339 EL images of silicon solar cells. It tests models across eight complementary question types:

  • Existence
  • Counting
  • Type Identification
  • Severity
  • Localisation
  • Co-occurrence
  • Comparative (Horizontal)
  • Comparative (Vertical)

Note: The original electroluminescence (EL) images are not included in this repository; they must be downloaded separately from the UCF-EL-Defect dataset.

πŸ€– Supported Models

The pipeline natively supports the following model families via Hugging Face transformers and specific backend wrappers:

  • LLaVA Family: LLaVA-1.5 (7B/13B), LLaVA-NeXT (Mistral-7B, Yi-34B)
  • Qwen Family: Qwen2-VL (2B/7B), Qwen2.5-VL (3B/7B)
  • SmolVLM Family: SmolVLM (256M/500M), SmolVLM2 (2.2B)
  • Idefics3: 8B
  • LLaMA-3.2 Vision: 11B
  • Phi-3.5 Vision: 4B
  • PaliGemma: 3B
  • MiniCPM-V: 2.6
  • InternVL2: 8B, 26B
  • InstructBLIP: Vicuna-7B
  • Moondream2
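
How eval.py wires up each backend is defined in the script itself; as a minimal illustration of zero-shot inference with one of the supported checkpoints, the sketch below loads LLaVA-1.5 (7B) through Hugging Face transformers. The image path and question are placeholders, not part of the pipeline.

    # Minimal zero-shot sketch for one supported model (LLaVA-1.5 7B).
    # eval.py handles prompting, batching, and answer extraction on top of this;
    # only the basic transformers loading/generation pattern is shown here.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    image = Image.open("el_cell.png")  # placeholder path to an EL cell image
    prompt = "USER: <image>\nIs there a crack in this solar cell? Answer yes or no.\nASSISTANT:"

    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
    output = model.generate(**inputs, max_new_tokens=20)
    print(processor.decode(output[0], skip_special_tokens=True))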

βš™οΈ Dependencies & Installation

  1. Base Python Packages:
    pip install torch torchvision transformers accelerate
    pip install Pillow numpy pandas scikit-learn tqdm
    
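A quick way to confirm the environment is usable before running the pipeline (this only checks the base packages above, nothing SolarVQA-specific):

    # Sanity check for the base packages listed above.
    import torch, transformers, sklearn, PIL
    print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    print("transformers:", transformers.__version__, "| scikit-learn:", sklearn.__version__)
    print("Pillow:", PIL.__version__)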