---
license: cc
license_name: creative-commons-attribution-4.0-international
license_link: https://creativecommons.org/licenses/by/4.0/
pipeline_tag: image-classification
tags:
- knowledge-distillation
- modular-neural-architecture
---
# m2mKD

This repository contains the checkpoints for [m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers](https://arxiv.org/abs/2402.16918).
## Released checkpoints

For instructions on using the checkpoints listed below, please refer to our [GitHub repo](https://github.com/kamanphoebe/m2mKD). A minimal loading sketch follows the list.
- `nac_scale_tinyimnet.pth`/`nac_scale_imnet.pth`: NAC models with a scale-free prior trained using m2mKD, on Tiny-ImageNet and ImageNet respectively.
- `vmoe_base.pth`: V-MoE-Base model trained using m2mKD.
- `FT_huge`: a directory containing DeiT-Huge teacher modules for NAC model training.
- `nac_tinyimnet_students`: a directory containing NAC student modules for Tiny-ImageNet.
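
As a quick sanity check before wiring a checkpoint into the training or evaluation code from the GitHub repo, you can inspect it directly with PyTorch. The sketch below is illustrative only: it assumes the file is a standard `torch.save` artifact and has already been downloaded from this repository; the authoritative loading code lives in the GitHub repo.

```python
import torch

# Minimal inspection sketch (not the repo's loading code). Assumes the
# checkpoint is a standard torch.save artifact downloaded from this repo.
ckpt = torch.load("vmoe_base.pth", map_location="cpu")

# A checkpoint may be a raw state dict, or a dict wrapping one under a key
# such as "model"; print the top-level keys to see which case applies.
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])
else:
    print(type(ckpt))
```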
## Acknowledgement

Our implementation is mainly based on [Deep-Incubation](https://github.com/LeapLabTHU/Deep-Incubation).
## Citation

If you use the checkpoints, please cite our paper:

```bibtex
@misc{lo2024m2mkd,
      title={m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers},
      author={Ka Man Lo and Yiming Liang and Wenyu Du and Yuantao Fan and Zili Wang and Wenhao Huang and Lei Ma and Jie Fu},
      year={2024},
      eprint={2402.16918},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```