---
license: apache-2.0
datasets:
- ILSVRC/imagenet-1k
- bentrevett/caltech-ucsd-birds-200-2011
- vaishaal/ImageNetV2
- clip-benchmark/wds_imagenet_sketch
- clip-benchmark/wds_imagenet-r
- enterprise-explorers/oxford-pets
- ethz/food101
- clip-benchmark/wds_imagenet-a
language:
- en
metrics:
- accuracy
base_model:
- openai/clip-vit-large-patch14
- openai/clip-vit-base-patch32
pipeline_tag: zero-shot-image-classification
tags:
- code
---
# LaZSL
This repository contains the code for the ICCV'25 paper "***Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model***".

Pre-print version available at [[arXiv]](https://arxiv.org/pdf/2506.23822).

## Requirements
First, install the dependencies via conda:

```
conda install pytorch torchvision -c pytorch
conda install matplotlib torchmetrics -c conda-forge
```
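
After installing, a quick sanity check (a minimal sketch, not part of the repository) confirms the dependencies are importable and whether a GPU is visible:

```python
# Sanity check: verify the dependencies installed above are importable.
import matplotlib
import torch
import torchmetrics
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```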

## Preparing Dataset
Please follow the instructions in [DATASETS.md](https://github.com/KaiyangZhou/CoOp/blob/main/DATASETS.md) to construct the datasets.
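
For reference, the linked instructions suggest keeping all datasets under a single root folder. An illustrative layout is sketched below (the folder names are examples only; DATASETS.md is authoritative):

```
$DATA/
|–– imagenet/
|–– oxford_pets/
|–– food-101/
|–– ...
```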
## Running

To reproduce the accuracy results from the paper, edit the directories in `load_OP.py` to match your local machine and set `hparams['dataset']` accordingly, then run `python main_OP.py`.
All dataset-specific hyperparameters are defined in `load_OP.py` and can be modified as needed.
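
As a rough sketch (the key names below, other than `hparams['dataset']`, are illustrative assumptions, not the repository's exact API), selecting a dataset before running might look like this inside `load_OP.py`:

```python
# Hypothetical sketch of the hyperparameters in load_OP.py;
# the actual keys and accepted values are defined in that file.
hparams = {}
hparams['dataset'] = 'cub'                 # e.g. 'imagenet', 'cub', 'oxford_pets', 'food101'
hparams['model_size'] = 'ViT-B/32'         # CLIP backbone to evaluate
hparams['data_dir'] = '/path/to/datasets'  # edit to match your local machine
```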

## Results
Results of our released models using various evaluation protocols on five datasets.

| Dataset | Acc (ViT-B/32) | Acc (ViT-B/16) | Acc (ViT-L/14) |
| :-----: | :-----: | :-----: | :-----: |
| ImageNet | 65.3 | 69.2 | 75.7 |
| CUB | 56.5 | 60.3 | 66.1 |
| OxfordPets | 84.7 | 87.4 | 92.7 |
| Food101 | 85.9 | 89.7 | 93.5 |
| Places365 | 41.5 | 42.0 | 41.8 |

## Citation
If you find LaZSL useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

```bibtex
@inproceedings{chen2025interpretable,
  title={Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model},
  author={Chen, Shiming and Duan, Bowen and Khan, Salman and Khan, Fahad Shahbaz},
  booktitle={ICCV},
  year={2025}
}
```