Add model card and metadata
This PR adds a model card and metadata for the RAVLT models, including the `image-classification` pipeline tag and an MIT license (assumed, please verify). The model card includes a description of the model, its key features, and performance metrics on ImageNet-1k. Placeholders for usage examples and the code repository link are included and should be updated once the code becomes publicly available. I've included the performance metrics and checkpoints for image classification, as that's the primary focus. The original README also reports results for Object Detection, Instance Segmentation, and Semantic Segmentation, but I've omitted those to keep the card concise and focused.
README.md
ADDED
@@ -0,0 +1,49 @@
---
pipeline_tag: image-classification
license: mit
---

# Breaking the Low-Rank Dilemma of Linear Attention: RAVLT Model Card

This model card describes the Rank-Augmented Vision Linear Transformer (RAVLT), introduced in the paper "[Breaking the Low-Rank Dilemma of Linear Attention](https://arxiv.org/abs/2411.07635)". RAVLT achieves state-of-the-art performance on ImageNet-1k classification while maintaining linear complexity.

**Key Features:**

* High accuracy: achieves 84.4% Top-1 accuracy on ImageNet-1k (RAVLT-S).
* Parameter efficiency: uses only 26M parameters (RAVLT-S).
* Computational efficiency: requires only 4.6G FLOPs (RAVLT-S).
* Linear complexity in the number of tokens.

RAVLT is based on Rank-Augmented Linear Attention (RALA), a novel attention mechanism that addresses the low-rank limitations of standard linear attention.
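The linear-complexity claim can be illustrated with generic linear attention, the family of mechanisms RALA builds on: reassociating the attention product avoids materializing the N x N score matrix. The sketch below shows standard linear attention with a simple ReLU feature map, not RALA itself (whose design is not reproduced here); all names are illustrative:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention materializes an N x N score matrix: O(N^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Linear attention reassociates to phi(Q) @ (phi(K)^T V): O(N * d^2),
    # i.e. linear in the token count N. phi is a positive feature map
    # (a simple ReLU + epsilon here; RALA's actual design differs).
    phi = lambda x: np.maximum(x, 0.0) + 1e-6
    Qp, Kp = phi(Q), phi(K)
    context = Kp.T @ V                  # d x d, independent of N
    normalizer = Qp @ Kp.sum(axis=0)    # one scalar per query
    return (Qp @ context) / normalizer[:, None]

N, d = 196, 64                          # e.g. 14x14 image tokens, head dim 64
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (196, 64)
```

The key design point is the order of multiplication: `Kp.T @ V` is a fixed-size d x d summary of all tokens, so adding tokens grows the cost linearly rather than quadratically.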

## Model Variants

Several RAVLT variants were trained, offering different tradeoffs between accuracy, parameters, and FLOPs:

| Model   | Params (M) | FLOPs (G) | Checkpoint |
| ------- | ---------- | --------- | ---------- |
| RAVLT-T | 15         | 2.4       | [RAVLT-T](https://huggingface.co/aldjalkdf/RAVLT/blob/main/RAVLT_T.pth) |
| RAVLT-S | 26         | 4.6       | [RAVLT-S](https://huggingface.co/aldjalkdf/RAVLT/blob/main/RAVLT_S.pth) |
| RAVLT-B | 48         | 9.9       | [RAVLT-B](https://huggingface.co/aldjalkdf/RAVLT/blob/main/RAVLT_B.pth) |
| RAVLT-L | 95         | 16.0      | [RAVLT-L](https://huggingface.co/aldjalkdf/RAVLT/blob/main/RAVLT_L.pth) |

**Note:** Accuracy values from the paper have not been transcribed here; they should be added once the code is available and the accuracy can be independently verified.
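As a quick way to navigate these tradeoffs, here is a hypothetical helper (not part of the released code) that encodes the table above and picks the largest variant fitting a parameter or FLOPs budget:

```python
# Hypothetical helper: selects the largest RAVLT variant that fits a
# parameter and/or FLOPs budget, using the numbers from the table above.
VARIANTS = [
    # (name, params in millions, FLOPs in GFLOPs), smallest to largest
    ("RAVLT-T", 15, 2.4),
    ("RAVLT-S", 26, 4.6),
    ("RAVLT-B", 48, 9.9),
    ("RAVLT-L", 95, 16.0),
]

def pick_variant(max_params_m=None, max_gflops=None):
    """Return the largest variant satisfying both budgets, or None."""
    best = None
    for name, params, gflops in VARIANTS:
        if max_params_m is not None and params > max_params_m:
            continue
        if max_gflops is not None and gflops > max_gflops:
            continue
        best = name  # list is ordered smallest -> largest
    return best

print(pick_variant(max_params_m=30))  # RAVLT-S
print(pick_variant(max_gflops=10))    # RAVLT-B
```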

## How to use (Placeholder - Awaiting Code Release)

Usage instructions will be added once the code repository becomes publicly available at https://github.com/qhfan/RALA.
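Until then, the checkpoints can likely be inspected as ordinary PyTorch files. The sketch below assumes (unverified) that each `.pth` file holds a standard `state_dict`, possibly nested under a `"model"` key; it demonstrates the pattern on a locally created dummy file rather than an actual RAVLT checkpoint:

```python
import torch

# Assumption: each released .pth checkpoint is a plain PyTorch state_dict
# (possibly nested under a "model" key). This is unverified until the code
# is released; the file created below is a dummy stand-in.
def inspect_checkpoint(path):
    state = torch.load(path, map_location="cpu")
    if isinstance(state, dict) and "model" in state:
        state = state["model"]  # unwrap a nested state_dict
    n_tensors = len(state)
    n_params = sum(t.numel() for t in state.values() if torch.is_tensor(t))
    return n_tensors, n_params

# Demo on a locally saved dummy state_dict so the sketch is self-contained:
torch.save({"linear.weight": torch.zeros(4, 4)}, "demo.pth")
print(inspect_checkpoint("demo.pth"))  # (1, 16)
```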

## Citation

```bibtex
@misc{fan2024breakinglowrank,
      title={Breaking the Low-Rank Dilemma of Linear Attention},
      author={Qihang Fan and Huaibo Huang and Ran He},
      year={2024},
      eprint={2411.07635},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.07635},
}
```