---
license: apache-2.0
datasets:
- ILSVRC/imagenet-1k
- bentrevett/caltech-ucsd-birds-200-2011
- vaishaal/ImageNetV2
- clip-benchmark/wds_imagenet_sketch
- clip-benchmark/wds_imagenet-r
- enterprise-explorers/oxford-pets
- ethz/food101
- clip-benchmark/wds_imagenet-a
language:
- en
metrics:
- accuracy
base_model:
- openai/clip-vit-large-patch14
- openai/clip-vit-base-patch32
pipeline_tag: zero-shot-image-classification
tags:
- code
---
# LaZSL
This repository contains the code for the ICCV'25 paper "***Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model***".

Pre-print version available at [[arXiv]](https://arxiv.org/pdf/2506.23822).

## Requirements
First, install the dependencies via conda:

```
conda install pytorch torchvision -c pytorch
conda install matplotlib torchmetrics -c conda-forge
```
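
After installing, a quick sanity check (a minimal sketch, not part of the repository) confirms the dependencies are importable and whether a GPU is visible:

```python
# Sanity check: verify the dependencies installed above are importable.
import matplotlib
import torch
import torchmetrics
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```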

## Preparing Dataset
Please follow the instructions in [DATASETS.md](https://github.com/KaiyangZhou/CoOp/blob/main/DATASETS.md) to construct the datasets.
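
For reference, the linked instructions suggest keeping all datasets under a single root folder. An illustrative layout is sketched below (the folder names are examples only; DATASETS.md is authoritative):

```
$DATA/
|–– imagenet/
|–– oxford_pets/
|–– food-101/
|–– ...
```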
## Running

To reproduce the accuracy results from the paper, edit the directories in `load_OP.py` to match your local machine and set `hparams['dataset']` accordingly, then run `python main_OP.py`.
All dataset-specific hyperparameters are defined in `load_OP.py` and can be modified as needed.
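
As a rough sketch (the key names below, other than `hparams['dataset']`, are illustrative assumptions, not the repository's exact API), selecting a dataset before running might look like this inside `load_OP.py`:

```python
# Hypothetical sketch of the hyperparameters in load_OP.py;
# the actual keys and accepted values are defined in that file.
hparams = {}
hparams['dataset'] = 'cub'                 # e.g. 'imagenet', 'cub', 'oxford_pets', 'food101'
hparams['model_size'] = 'ViT-B/32'         # CLIP backbone to evaluate
hparams['data_dir'] = '/path/to/datasets'  # edit to match your local machine
```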

## Results
Results of our released models using various evaluation protocols on five datasets.

| Dataset | Acc (ViT-B/32) | Acc (ViT-B/16) | Acc (ViT-L/14) |
| :-----: | :-----: | :-----: | :-----: |
| ImageNet | 65.3 | 69.2 | 75.7 |
| CUB | 56.5 | 60.3 | 66.1 |
| OxfordPets | 84.7 | 87.4 | 92.7 |
| Food101 | 85.9 | 89.7 | 93.5 |
| Places365 | 41.5 | 42.0 | 41.8 |

## Citation
If you find LaZSL useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

```bibtex
@inproceedings{chen2025interpretable,
  title={Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model},
  author={Chen, Shiming and Duan, Bowen and Khan, Salman and Khan, Fahad Shahbaz},
  booktitle={ICCV},
  year={2025}
}
```