SolarVQA Zero-Shot Evaluation Pipeline
This repository contains the zero-shot evaluation pipeline for SolarVQA, a benchmark for Visual Question Answering (VQA) in photovoltaic defect inspection.
The evaluation script (`eval.py`) is designed to benchmark over a dozen state-of-the-art Vision-Language Models (VLMs) against the SolarVQA dataset, automatically extracting answers, calculating chance-corrected metrics, and generating comprehensive leaderboards.
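Answer extraction from free-form VLM output can be sketched as below. This is an illustrative example only: the `extract_answer` helper and its regex patterns are assumptions, not the repository's actual implementation.

```python
import re

def extract_answer(response: str, question_type: str) -> str:
    """Illustrative extraction of a canonical answer from free-form VLM text."""
    text = response.strip().lower()
    if question_type == "existence":
        # Binary questions: look for an explicit yes/no token.
        match = re.search(r"\b(yes|no)\b", text)
        return match.group(1) if match else "unknown"
    if question_type == "counting":
        # Counting questions: take the first integer in the response.
        match = re.search(r"\b(\d+)\b", text)
        return match.group(1) if match else "unknown"
    # Other question types: fall back to the first line of the response.
    return text.splitlines()[0] if text else "unknown"

print(extract_answer("Yes, there is a crack in the cell.", "existence"))  # -> yes
print(extract_answer("I can see 3 cracks in this image.", "counting"))   # -> 3
```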
About SolarVQA
SolarVQA contains 130,712 QA pairs across 16,339 EL images of silicon solar cells. It tests models across eight complementary question types:
- Existence
- Counting
- Type Identification
- Severity
- Localisation
- Co-occurrence
- Comparative (Horizontal & Vertical)
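Chance correction matters here because the question types have very different guessing baselines (e.g. binary existence questions can be answered correctly 50% of the time at random). A common form of chance correction is the kappa-style rescaling sketched below; whether `eval.py` uses exactly this formula is an assumption.

```python
def chance_corrected_accuracy(accuracy: float, chance: float) -> float:
    """Rescale raw accuracy so 0.0 = random guessing and 1.0 = perfect.

    accuracy: raw accuracy of the model on one question type.
    chance:   expected accuracy of random guessing on that question type
              (e.g. 0.5 for binary existence questions).
    """
    return (accuracy - chance) / (1.0 - chance)

# A model at 75% raw accuracy on binary questions is only halfway
# between chance (50%) and perfect (100%).
print(chance_corrected_accuracy(0.75, 0.5))  # -> 0.5
```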
Note: The original Electroluminescence (EL) images are not included here and must be downloaded from the UCF-EL-Defect dataset.
Supported Models
The pipeline natively supports the following model families via Hugging Face transformers and specific backend wrappers:
- LLaVA Family: LLaVA-1.5 (7B/13B), LLaVA-NeXT (Mistral-7B, Yi-34B)
- Qwen Family: Qwen2-VL (2B/7B), Qwen2.5-VL (3B/7B)
- SmolVLM Family: SmolVLM (256M/500M), SmolVLM2 (2.2B)
- Idefics3: 8B
- LLaMA-3.2 Vision: 11B
- Phi-3.5 Vision: 4B
- PaliGemma: 3B
- MiniCPM-V: 2.6
- InternVL2: 8B, 26B
- InstructBLIP: Vicuna-7B
- Moondream2
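A pipeline supporting this many model families typically maps short model names to Hugging Face repository ids before dispatching to the right backend wrapper. The registry below is a hypothetical sketch: the short names and the exact set of Hugging Face ids used by `eval.py` are assumptions.

```python
# Hypothetical registry mapping short CLI names to Hugging Face model ids;
# the actual names and ids used by eval.py may differ.
MODEL_REGISTRY = {
    "llava-1.5-7b": "llava-hf/llava-1.5-7b-hf",
    "qwen2-vl-7b": "Qwen/Qwen2-VL-7B-Instruct",
    "smolvlm-500m": "HuggingFaceTB/SmolVLM-500M-Instruct",
}

def resolve_model_id(name: str) -> str:
    """Look up the Hugging Face id for a short model name."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"Unsupported model: {name!r}. Choose one of {sorted(MODEL_REGISTRY)}"
        )

print(resolve_model_id("qwen2-vl-7b"))  # -> Qwen/Qwen2-VL-7B-Instruct
```

Failing fast on an unknown name keeps typos from silently falling through to a download attempt against a nonexistent repository.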
Dependencies & Installation
- Base Python Packages:

```bash
pip install torch torchvision transformers accelerate
pip install Pillow numpy pandas scikit-learn tqdm
```