---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: mit
datasets:
- ILSVRC/imagenet-1k
library_name: timm
---

# LocAtViT: Locality-Attending Vision Transformer

[arXiv](https://arxiv.org/abs/2603.04892)
[GitHub](https://github.com/sinahmr/LocAtViT)

> Pretrain vision transformers so that their patch representations transfer better to dense prediction (e.g., segmentation), without changing the pretraining objective.

## Usage

```python
import timm

# Load the pretrained LocAtViT backbone from the Hugging Face Hub via timm
model = timm.create_model("hf_hub:sinahmr/locatvit_base", pretrained=True)
```

## Citation

```bibtex
@inproceedings{hajimiri2026locatvit,
    author    = {Hajimiri, Sina and Beizaee, Farzad and Shakeri, Fereshteh and Desrosiers, Christian and Ben Ayed, Ismail and Dolz, Jose},
    title     = {Locality-Attending Vision Transformer},
    booktitle = {International Conference on Learning Representations},
    year      = {2026}
}
```