metadata
license: cc-by-nc-sa-4.0
tags:
- arxiv:2606.14024
ViT-Up
ViT-Up is an implicit feature upsampler for Vision Transformers that predicts backbone-aligned features at arbitrary continuous image coordinates.
This repository provides pretrained ViT-Up weights for the DINOv3-S+ and DINOv3-B backbones.
- Paper: https://arxiv.org/abs/2606.14024
- Hugging Face paper page: https://huggingface.co/papers/2606.14024
- Project page: https://vitup.papers.discuna.com/
- Code: https://github.com/krispinwandel/vit-up
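To make "features at arbitrary continuous image coordinates" concrete, the sketch below shows the naive baseline for the same query interface: bilinear interpolation of a ViT patch-feature grid at a continuous pixel location. This is not ViT-Up's method and not this repository's API; `bilinear_sample`, the nested-list `feats` layout, and the `patch_size` parameter are hypothetical names chosen for illustration, and the sketch assumes patch centers lie at half-patch offsets.

```python
from math import floor
from typing import List

# Hypothetical layout: feats[row][col] is the feature vector of one ViT patch.
Grid = List[List[List[float]]]


def bilinear_sample(feats: Grid, x: float, y: float, patch_size: int) -> List[float]:
    """Bilinearly interpolate patch features at continuous pixel coords (x, y).

    Baseline only: an implicit upsampler such as ViT-Up would replace this
    interpolation with a learned prediction at the same query point.
    """
    rows, cols = len(feats), len(feats[0])

    # Assumption: patch (i, j) is centred at pixel ((j + 0.5) * P, (i + 0.5) * P),
    # so pixel coordinates map to grid coordinates via x / P - 0.5.
    gx = min(max(x / patch_size - 0.5, 0.0), cols - 1.0)
    gy = min(max(y / patch_size - 0.5, 0.0), rows - 1.0)

    # Indices of the four surrounding patches, clamped at the border.
    x0, y0 = int(floor(gx)), int(floor(gy))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    tx, ty = gx - x0, gy - y0

    out = []
    for c in range(len(feats[0][0])):
        top = (1 - tx) * feats[y0][x0][c] + tx * feats[y0][x1][c]
        bottom = (1 - tx) * feats[y1][x0][c] + tx * feats[y1][x1][c]
        out.append((1 - ty) * top + ty * bottom)
    return out


# Toy 2x2 grid of 1-channel features with a hypothetical patch size of 16.
toy = [[[0.0], [1.0]], [[2.0], [3.0]]]
print(bilinear_sample(toy, x=16.0, y=16.0, patch_size=16))  # [1.5]
```

Queries that fall outside the outermost patch centers are clamped to the border features, which is one of several possible boundary conventions.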
Citation
@misc{wandel2026vitupfaithfulfeatureupsampling,
  title={ViT-Up: Faithful Feature Upsampling for Vision Transformers},
  author={Krispin Wandel and Jingchuan Wang and Hesheng Wang},
  year={2026},
  eprint={2606.14024},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.14024},
}