---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: mit
datasets:
- ILSVRC/imagenet-1k
library_name: timm
---

# LocAtViT: Locality-Attending Vision Transformer

[Paper](https://arxiv.org/abs/2603.04892)
[Code](https://github.com/sinahmr/LocAtViT)

> Pretrain vision transformers so that their patch representations transfer better to dense prediction (e.g., segmentation), without changing the pretraining objective.

## Usage

```python
import timm

model = timm.create_model("hf_hub:sinahmr/locatvit_tiny", pretrained=True)
```
## Citation

```bibtex
@inproceedings{hajimiri2026locatvit,
    author = {Hajimiri, Sina and Beizaee, Farzad and Shakeri, Fereshteh and Desrosiers, Christian and Ben Ayed, Ismail and Dolz, Jose},
    title = {Locality-Attending Vision Transformer},
    booktitle = {International Conference on Learning Representations},
    year = {2026}
}
```