SolarVQA Zero-Shot Evaluation Pipeline
This repository contains the zero-shot evaluation pipeline for SolarVQA, a benchmark for Visual Question Answering (VQA) in photovoltaic defect inspection.
The evaluation script (`eval.py`) is designed to benchmark over a dozen state-of-the-art Vision-Language Models (VLMs) against the SolarVQA dataset, automatically extracting answers, calculating chance-corrected metrics, and generating comprehensive leaderboards.
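Answer extraction from free-form VLM output can be sketched as below. This is an illustrative example only: the `extract_answer` helper and its regex patterns are assumptions, not the repository's actual implementation.

```python
import re

def extract_answer(response: str, question_type: str) -> str:
    """Illustrative extraction of a canonical answer from free-form VLM text."""
    text = response.strip().lower()
    if question_type == "existence":
        # Binary questions: look for an explicit yes/no token.
        match = re.search(r"\b(yes|no)\b", text)
        return match.group(1) if match else "unknown"
    if question_type == "counting":
        # Counting questions: take the first integer in the response.
        match = re.search(r"\b(\d+)\b", text)
        return match.group(1) if match else "unknown"
    # Other question types: fall back to the first line of the response.
    return text.splitlines()[0] if text else "unknown"

print(extract_answer("Yes, there is a crack in the cell.", "existence"))  # -> yes
print(extract_answer("I can see 3 cracks in this image.", "counting"))   # -> 3
```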
About SolarVQA
SolarVQA contains 130,712 QA pairs across 16,339 EL images of silicon solar cells. It tests models across eight complementary question types:
- Existence
- Counting
- Type Identification
- Severity
- Localisation
- Co-occurrence
- Comparative (Horizontal & Vertical)
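Chance correction matters here because the question types have very different guessing baselines (e.g. binary existence questions can be answered correctly 50% of the time at random). A common form of chance correction is the kappa-style rescaling sketched below; whether `eval.py` uses exactly this formula is an assumption.

```python
def chance_corrected_accuracy(accuracy: float, chance: float) -> float:
    """Rescale raw accuracy so 0.0 = random guessing and 1.0 = perfect.

    accuracy: raw accuracy of the model on one question type.
    chance:   expected accuracy of random guessing on that question type
              (e.g. 0.5 for binary existence questions).
    """
    return (accuracy - chance) / (1.0 - chance)

# A model at 75% raw accuracy on binary questions is only halfway
# between chance (50%) and perfect (100%).
print(chance_corrected_accuracy(0.75, 0.5))  # -> 0.5
```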
Note: The original Electroluminescence (EL) images are not included here and must be downloaded from the UCF-EL-Defect dataset.
Supported Models
The pipeline natively supports the following model families via Hugging Face transformers and specific backend wrappers:
- LLaVA Family: LLaVA-1.5 (7B/13B), LLaVA-NeXT (Mistral-7B, Yi-34B)
- Qwen Family: Qwen2-VL (2B/7B), Qwen2.5-VL (3B/7B)
- SmolVLM Family: SmolVLM (256M/500M), SmolVLM2 (2.2B)
- Idefics3: 8B
- LLaMA-3.2 Vision: 11B
- Phi-3.5 Vision: 4B
- PaliGemma: 3B
- MiniCPM-V: 2.6
- InternVL2: 8B, 26B
- InstructBLIP: Vicuna-7B
- Moondream2
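A pipeline supporting this many model families typically maps short model names to Hugging Face repository ids before dispatching to the right backend wrapper. The registry below is a hypothetical sketch: the short names and the exact set of Hugging Face ids used by `eval.py` are assumptions.

```python
# Hypothetical registry mapping short CLI names to Hugging Face model ids;
# the actual names and ids used by eval.py may differ.
MODEL_REGISTRY = {
    "llava-1.5-7b": "llava-hf/llava-1.5-7b-hf",
    "qwen2-vl-7b": "Qwen/Qwen2-VL-7B-Instruct",
    "smolvlm-500m": "HuggingFaceTB/SmolVLM-500M-Instruct",
}

def resolve_model_id(name: str) -> str:
    """Look up the Hugging Face id for a short model name."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"Unsupported model: {name!r}. Choose one of {sorted(MODEL_REGISTRY)}"
        )

print(resolve_model_id("qwen2-vl-7b"))  # -> Qwen/Qwen2-VL-7B-Instruct
```

Failing fast on an unknown name keeps typos from silently falling through to a download attempt against a nonexistent repository.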
Dependencies & Installation
- Base Python Packages:

```bash
pip install torch torchvision transformers accelerate
pip install Pillow numpy pandas scikit-learn tqdm
```