---
license: mit
pipeline_tag: image-classification
---
# CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
This repository contains the model weights for the paper *CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction*.
CARE (deCoupled duAl-interactive lineaR attEntion) is a novel linear attention mechanism designed to unleash the power of linear attention for resource-constrained mobile devices. It utilizes an asymmetrical feature decoupling strategy to manage local inductive bias and long-range dependencies, alongside a dual interaction module to facilitate communication across features and layers.
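At the core of CARE is linear attention, which avoids the quadratic cost of softmax attention by applying a kernel feature map to queries and keys and exploiting associativity of matrix products. The sketch below illustrates the generic linear-attention computation only, not CARE's decoupled dual-interaction design; the `elu(x) + 1` feature map is a common illustrative choice, not necessarily the one used in the paper.

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Generic linear attention: O(N * d^2) instead of O(N^2 * d).

    q, k: (N, d) queries/keys; v: (N, d_v) values.
    The feature map phi(x) = elu(x) + 1 keeps entries positive; this is an
    illustrative choice, not necessarily the map used by CARE.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    q, k = phi(q), phi(k)
    # Associativity: compute K^T V once (d x d_v), then multiply by each query,
    # so cost is linear in sequence length N.
    kv = k.T @ v                                  # (d, d_v)
    z = q @ k.sum(axis=0, keepdims=True).T        # (N, 1) normalizer
    return (q @ kv) / (z + eps)

rng = np.random.default_rng(0)
N, d = 8, 4
q, k, v = (rng.normal(size=(N, d)) for _ in range(3))
out = linear_attention(q, k, v)
print(out.shape)  # (8, 4)
```

Because `K^T V` is a fixed-size `d x d_v` matrix regardless of sequence length, memory and compute grow linearly with the number of tokens, which is what makes this family of attention attractive for mobile deployment.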
## Performance
CARE Transformer variants achieve strong accuracy–efficiency trade-offs on the ImageNet-1K dataset:
| Method | Type | GMACs | Params (M) | Top-1 Acc (%) |
|---|---|---|---|---|
| CARE-S0 | LA+CONV | 0.7 | 7.3 | 78.4 |
| CARE-S1 | LA+CONV | 1.0 | 9.6 | 80.1 |
| CARE-S2 | LA+CONV | 1.9 | 19.5 | 82.1 |
## Resources
- Paper: https://arxiv.org/abs/2411.16170
- Official GitHub Repository: https://github.com/zhouyuan888888/CARE-Transformer
## Citation
If you find this work useful, please cite:
```bibtex
@inproceedings{zhou2025care,
  title={CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction},
  author={Zhou, Yuan and Xu, Qingshan and Cui, Jiequan and Zhou, Junbao and Zhang, Jing and Hong, Richang and Zhang, Hanwang},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={20135--20145},
  year={2025}
}
```