---
license: mit
library_name: pytorch
tags:
- clip
- siglip
- zero-shot-image-classification
- interpretability
- concept-bottleneck
- vision-language
- cvpr2026
datasets:
- oonat/ezpc-embeddings
base_model:
- openai/clip-rn50
- openai/clip-vit-base-patch32
- openai/clip-vit-large-patch14
- google/siglip-so400m-patch14-384
pipeline_tag: zero-shot-image-classification
---

# EZPC - Pre-trained Concept Projection Matrices

This repository hosts the trained projection matrices **A** for **Explaining CLIP Zero-shot Predictions Through Concepts** (CVPR 2026).

- 📄 **Paper:** [arXiv:2603.28211](https://arxiv.org/abs/2603.28211)
- 💻 **Code:** [github.com/oonat/ezpc](https://github.com/oonat/ezpc)
- 🌐 **Project page:** [oonat.github.io/ezpc](https://oonat.github.io/ezpc)
- 🤗 **Embeddings:** [oonat/ezpc-embeddings](https://huggingface.co/datasets/oonat/ezpc-embeddings)

## Repository Layout

Each checkpoint is a single PyTorch tensor file (`best_A.pth`) inside a folder whose name encodes the training configuration:

```
checkpoints/
└── <dataset>_backbone_<backbone>_weight_<λ>_epoch_<epochs>_lr_<lr>_bs_<bs>/
    └── best_A.pth
```

For example:

```
checkpoints/CIFAR-100_backbone_RN50_weight_1.0_epoch_10000_lr_0.01_bs_1000000/best_A.pth
```

The tensor `best_A.pth` has shape **(d, m)** where *d* is the backbone embedding dimension and *m* is the number of concepts in the dataset's concept vocabulary.

## Quickstart

### 1. Download a checkpoint

```bash
pip install huggingface-hub

# Download all checkpoints
hf download oonat/ezpc-checkpoints \
  --local-dir . \
  --include "checkpoints/*"

# Or just one
hf download oonat/ezpc-checkpoints \
  --local-dir . \
  --include "checkpoints/CIFAR-100_backbone_RN50_weight_1.0_epoch_10000_lr_0.01_bs_1000000/*"
```

### 2. Load and use it

The checkpoint can be loaded directly with PyTorch:

```python
import torch

A = torch.load(
    "checkpoints/CIFAR-100_backbone_RN50_weight_1.0_epoch_10000_lr_0.01_bs_1000000/best_A.pth",
    weights_only=True,
).float()

print(A.shape)  # (d, m)
```

To run zero-shot evaluation, qualitative concept visualizations, faithfulness analyses, or any of the experiments from the paper, clone the [EZPC GitHub repo](https://github.com/oonat/ezpc).

Evaluation also requires the pre-computed image **and** cached text embeddings. Download them from the [embeddings dataset](https://huggingface.co/datasets/oonat/ezpc-embeddings) into `./data`:

```bash
hf download oonat/ezpc-embeddings --repo-type dataset --local-dir data
```

Then point `--checkpoint_path` at the downloaded `best_A.pth`:

```bash
python test.py \
  --dataset CIFAR-100 \
  --dataset_root ./data \
  --backbone RN50 \
  --checkpoint_path ./checkpoints/CIFAR-100_backbone_RN50_weight_1.0_epoch_10000_lr_0.01_bs_1000000/best_A.pth
```

## Citation

```bibtex
@InProceedings{Ozdemir_2026_CVPR,
    author    = {Ozdemir, Onat and Christensen, Anders and Alaniz, Stephan and Akata, Zeynep and Akbas, Emre},
    title     = {Explaining CLIP Zero-shot Predictions Through Concepts},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {31336-31345}
}
```

## Acknowledgements

The concept vocabularies and class label mapping files were originally curated by the [Label-free Concept Bottleneck Models](https://github.com/Trustworthy-ML-Lab/Label-free-CBM) authors. We thank them for open-sourcing these resources.

## License

Released under the MIT License. Note that these checkpoints were trained on embeddings derived from CIFAR-100, CUB-200-2011, Places365, ImageNet, and ImageNet-100.
Users are responsible for complying with the original license and terms of use of those datasets, which may restrict commercial use — notably ImageNet and CUB-200-2011, which are released for non-commercial research only.
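
## Appendix: Parsing Checkpoint Folder Names

As noted under Repository Layout, each checkpoint folder name encodes its training configuration. When iterating over many checkpoints, it can help to recover that configuration programmatically. The following is a minimal sketch, not part of the EZPC codebase: the `parse_checkpoint_dir` helper, the `CheckpointConfig` type, and the regular expression are hypothetical. They are inferred only from the example folder name above, and they assume `epoch`, `lr`, and `bs` stand for the number of epochs, the learning rate, and the batch size.

```python
import re
from typing import NamedTuple


class CheckpointConfig(NamedTuple):
    """Training configuration recovered from a checkpoint folder name."""
    dataset: str
    backbone: str
    weight: float      # the λ value in the folder name
    epochs: int        # assumed meaning of "epoch"
    lr: float          # assumed meaning of "lr" (learning rate)
    batch_size: int    # assumed meaning of "bs"


# Pattern inferred from the single documented example; other folders
# in the repository are assumed to follow the same key order.
_PATTERN = re.compile(
    r"^(?P<dataset>.+)_backbone_(?P<backbone>.+)_weight_(?P<weight>[^_]+)"
    r"_epoch_(?P<epochs>\d+)_lr_(?P<lr>[^_]+)_bs_(?P<batch_size>\d+)$"
)


def parse_checkpoint_dir(name: str) -> CheckpointConfig:
    """Split a checkpoint folder name into its configuration fields."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint folder name: {name!r}")
    return CheckpointConfig(
        dataset=match["dataset"],
        backbone=match["backbone"],
        weight=float(match["weight"]),
        epochs=int(match["epochs"]),
        lr=float(match["lr"]),
        batch_size=int(match["batch_size"]),
    )


cfg = parse_checkpoint_dir(
    "CIFAR-100_backbone_RN50_weight_1.0_epoch_10000_lr_0.01_bs_1000000"
)
print(cfg.dataset, cfg.backbone)  # CIFAR-100 RN50
```

The helper raises `ValueError` on names that do not match, so a folder with a different naming scheme fails loudly rather than yielding a partially filled configuration.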