---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: mit
datasets:
- ILSVRC/imagenet-1k
library_name: timm
---

# LocAtViT: Locality-Attending Vision Transformer

[![arXiv](https://img.shields.io/badge/arXiv-2603.04892-b31b1b.svg)](https://arxiv.org/abs/2603.04892)
[![GitHub](https://img.shields.io/badge/GitHub-LocAtViT-black)](https://github.com/sinahmr/LocAtViT)

> Pretrain vision transformers so that their patch representations transfer better to dense prediction (e.g., segmentation), without changing the pretraining objective.

## Usage

```python
import timm

model = timm.create_model("hf_hub:sinahmr/locatvit_tiny", pretrained=True)
```

## Citation

```bibtex
@inproceedings{hajimiri2026locatvit,
  author    = {Hajimiri, Sina and Beizaee, Farzad and Shakeri, Fereshteh and Desrosiers, Christian and Ben Ayed, Ismail and Dolz, Jose},
  title     = {Locality-Attending Vision Transformer},
  booktitle = {International Conference on Learning Representations},
  year      = {2026}
}
```
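
## Feature extraction (sketch)

The usage snippet above only instantiates the model. Since the goal of LocAtViT is better patch representations for dense prediction, a minimal feature-extraction sketch may be helpful. It assumes the checkpoint loads as a standard `timm` `VisionTransformer` (so `forward_features` and `num_prefix_tokens` behave as in stock timm ViTs); this is an assumption, not something stated on this card.

```python
import timm
import torch

# Assumption: the hub checkpoint loads as a standard timm VisionTransformer.
model = timm.create_model("hf_hub:sinahmr/locatvit_tiny", pretrained=True)
model.eval()

# Preprocessing that matches the model's pretrained data config.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

# Dummy tensor standing in for a real PIL image passed through `transform`.
x = torch.randn(1, *config["input_size"])

with torch.no_grad():
    # (1, num_prefix_tokens + num_patches, dim) for ViT-style timm models
    tokens = model.forward_features(x)

# Drop prefix tokens (e.g., the class token) to keep only patch tokens,
# which are the representations intended for dense prediction.
patch_tokens = tokens[:, model.num_prefix_tokens:, :]
print(patch_tokens.shape)
```

The patch tokens can then be reshaped into a 2D feature map and fed to a segmentation head; see the [GitHub repository](https://github.com/sinahmr/LocAtViT) for the authors' own evaluation code.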