---
license: apache-2.0
pipeline_tag: image-classification
---

# LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation

[Paper](https://huggingface.co/papers/2606.19483) | [GitHub](https://github.com/KevinZ0217/LEAP) | [Project Page](https://kevinz0217.github.io/LEAP_page/)

This repository contains the ViT-Tiny and ViT-S checkpoints (no registers) distilled from a DINOv2 ViT-G teacher on ImageNet-100 and ImageNet-1K. Distillation follows the procedure proposed in the paper **"LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation"**.

### Introduction

Vision Foundation Models (VFMs) with ViT backbones, such as DINOv2, are computationally demanding. LEAP (Layer-skipping Efficiency via Adaptive Progression) is a training curriculum for feature-based knowledge distillation of ViTs. Instead of supervising the student against a fixed teacher block, LEAP advances the supervisory target through the teacher's feature maps, from shallow to deep, based on online centered kernel alignment (CKA) between student and teacher features. This lets the student build a foundational representation before it is asked to match higher-level abstractions.
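To make the progression idea concrete, here is a minimal sketch, not the paper's implementation. It computes linear CKA between a batch of student features and a batch of teacher features, then moves the target one block deeper once alignment passes a threshold. The threshold value, the one-block step, and the names `linear_cka` and `next_target_block` are assumptions made for illustration; the actual schedule is defined in the paper and the GitHub repository.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every feature is zero-mean over the batch."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major a (n x p) and b (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two feature matrices with the same number of rows."""
    x, y = center_columns(x), center_columns(y)
    den = math.sqrt(cross_sq_norm(x, x) * cross_sq_norm(y, y))
    return cross_sq_norm(x, y) / den if den > 0 else 0.0


def next_target_block(current, num_teacher_blocks, cka, threshold=0.9):
    """Hypothetical rule: step one block deeper once alignment is high enough."""
    if cka >= threshold and current < num_teacher_blocks - 1:
        return current + 1
    return current


# Toy batch: 4 samples, 2 student features vs. 3 teacher features (made-up numbers).
student = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
teacher = [[0.5, 1.0, 2.0], [1.5, 0.2, 1.0], [0.1, 2.2, 4.1], [1.0, 1.0, 2.0]]
score = linear_cka(student, teacher)
target = next_target_block(current=0, num_teacher_blocks=4, cka=score)
```

Linear CKA only requires the two matrices to share the number of samples, so the student and teacher embedding widths do not need to match.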

### Use cases
The ViT models output feature maps that can be used for a variety of downstream tasks, including:
- Image Classification
- Instance Retrieval
- Semantic Segmentation
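
As one illustration of the tasks listed above, instance retrieval can be done by ranking gallery images by cosine similarity to a query in feature space. The sketch below assumes each image has already been reduced to a single feature vector (for example a pooled embedding); the helper names and the toy vectors are hypothetical and do not reflect how these checkpoints are loaded.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def retrieve(query, gallery, top_k=3):
    """Return (gallery index, cosine similarity) pairs, most similar first."""
    q = l2_normalize(query)
    scored = []
    for idx, feat in enumerate(gallery):
        g = l2_normalize(feat)
        scored.append((idx, sum(a * b for a, b in zip(q, g))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional features standing in for real image embeddings.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
ranked = retrieve([1.0, 0.0, 0.0], gallery, top_k=2)
```

Normalizing both sides first means the dot product is exactly the cosine similarity, so feature magnitude does not affect the ranking.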
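
### Performance

#### ImageNet-100:



![image](https://cdn-uploads.huggingface.co/production/uploads/645f39ab9d77b4d88f161e67/V57mPbFnVAMChzGK68lOe.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/645f39ab9d77b4d88f161e67/HNENY-rdQrlJz4U1AAegy.png)

#### ImageNet-1K:

![image](https://cdn-uploads.huggingface.co/production/uploads/645f39ab9d77b4d88f161e67/NwaWHbo5j2kbHoy7WJ4Pm.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/645f39ab9d77b4d88f161e67/LdsYmFzpbLiyUuvbWbD9I.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/645f39ab9d77b4d88f161e67/_X3WeEn5uZHHKjRUd4fT7.png)

### Citation

```bibtex
@article{leap2026,
  title={LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation},
  author={Zhang, Jiaqi and Lee, Ashton and Wong, Anthony and Zou, John and BuGhanem, Sami and Balestriero, Randall},
  journal={arXiv preprint arXiv:2606.19483},
  year={2026}
}
```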