---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: mit
datasets:
- ILSVRC/imagenet-1k
library_name: timm
---

# LocAtViT: Locality-Attending Vision Transformer

[arXiv](https://arxiv.org/abs/2603.04892)
[GitHub](https://github.com/sinahmr/LocAtViT)

> Pretrain vision transformers so that their patch representations transfer better to dense prediction (e.g., segmentation), without changing the pretraining objective.

## Usage

```python
import timm

# Load the pretrained LocAtViT backbone from the Hugging Face Hub via timm
model = timm.create_model("hf_hub:sinahmr/locatvit_base", pretrained=True)
```

## Citation

```bibtex
@inproceedings{hajimiri2026locatvit,
    author    = {Hajimiri, Sina and Beizaee, Farzad and Shakeri, Fereshteh and Desrosiers, Christian and Ben Ayed, Ismail and Dolz, Jose},
    title     = {Locality-Attending Vision Transformer},
    booktitle = {International Conference on Learning Representations},
    year      = {2026}
}
```