Update README.md
README.md
CHANGED
@@ -8,5 +8,28 @@ datasets:
 library_name: timm
 ---

-
-
+# LocAtViT: Locality-Attending Vision Transformer
+
+[](https://arxiv.org/abs/2603.04892)
+[](https://github.com/sinahmr/LocAtViT)
+
+> Pretrain vision transformers so that their patch representations transfer better to dense prediction (e.g., segmentation), without changing the pretraining objective.
+
+
+## Usage
+
+```python
+import timm
+model = timm.create_model("hf_hub:sinahmr/locatvit_base", pretrained=True)
+```
+
+## Citation
+
+```bibtex
+@inproceedings{hajimiri2026locatvit,
+author = {Hajimiri, Sina and Beizaee, Farzad and Shakeri, Fereshteh and Desrosiers, Christian and Ben Ayed, Ismail and Dolz, Jose},
+title = {Locality-Attending Vision Transformer},
+booktitle = {International Conference on Learning Representations},
+year = {2026}
+}
+```