Krispin/vit-up

Add pipeline tag and sample usage

by nielsr HF Staff - opened about 16 hours ago

Files changed (1)

README.md CHANGED

@@ -1,7 +1,6 @@
 ---
 license: cc-by-nc-sa-4.0
-tags:
-- arxiv:2606.14024
 ---
 # ViT-Up
@@ -10,18 +9,45 @@ tags:
 This repository provides pretrained ViT-Up weights for DINOv3-S+ and DINOv3-B.
-- Paper: https://arxiv.org/abs/2606.14024
-- HF Paper page: https://huggingface.co/papers/2606.14024
-- Project page: https://vitup.papers.discuna.com/
-- Code: https://github.com/krispinwandel/vit-up
 ## Citation
 ```bibtex
 @misc{wandel2026vitupfaithfulfeatureupsampling,
       title={ViT-Up: Faithful Feature Upsampling for Vision Transformers},
-      author={Krispin Wandel and Jingchuan Wang and Hesheng Wang},
       year={2026},
       eprint={2606.14024},
       archivePrefix={arXiv},

 ---
 license: cc-by-nc-sa-4.0
+pipeline_tag: image-feature-extraction
 ---
 # ViT-Up
 This repository provides pretrained ViT-Up weights for DINOv3-S+ and DINOv3-B.
+- **Paper**: [ViT-Up: Faithful Feature Upsampling for Vision Transformers](https://huggingface.co/papers/2606.14024)
+- **Project page**: https://vitup.papers.discuna.com/
+- **Code**: https://github.com/krispinwandel/vit-up
+## Sample Usage
+ViT-Up models can be loaded directly with `torch.hub.load`. The Hub entry points download ViT-Up weights from Hugging Face and load the matching DINOv3 backbone.
+```python
+import torch
+device = "cuda" if torch.cuda.is_available() else "cpu"
+# Available entry points:
+# - vit_up_dinov3_splus
+# - vit_up_dinov3_base
+model = torch.hub.load(
+    "krispinwandel/vit-up",
+    "vit_up_dinov3_splus",
+    pretrained=True,
+    trust_repo=True,
+    device=device,
+).eval()
+images = torch.randn(1, 3, 448, 448, device=device)
+query_coords = torch.rand(1, 100, 2, device=device)  # normalized (x, y) in [0, 1]
+with torch.no_grad():
+    features = model(images, query_coords)
+print(features.shape)  # (B, N_queries, D)
+```
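The sample above queries 100 random points. For a dense, upsampled feature map one would instead query a regular grid. The sketch below builds such a grid in plain Python. The helper name `make_query_grid` is hypothetical and not part of the ViT-Up repository, and the pixel-center convention `(i + 0.5) / size` is an assumption: the README only states that coordinates are normalized `(x, y)` in `[0, 1]`.

```python
def make_query_grid(height, width):
    """Return normalized (x, y) coordinates for an H x W grid, row-major.

    Assumption: points are sampled at pixel centers, so every value lies
    strictly inside (0, 1). The model's actual convention may differ.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    return [
        ((col + 0.5) / width, (row + 0.5) / height)  # (x, y) order
        for row in range(height)
        for col in range(width)
    ]


coords = make_query_grid(2, 4)
print(len(coords))  # 8
print(coords[0])    # (0.125, 0.25)
```

If this convention matches the model, `torch.tensor(coords, device=device).unsqueeze(0)` yields a `(1, H*W, 2)` tensor usable as `query_coords`, and the returned `(B, N_queries, D)` features can be reshaped to `(B, H, W, D)` because the grid is row-major.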
 ## Citation
 ```bibtex
 @misc{wandel2026vitupfaithfulfeatureupsampling,
       title={ViT-Up: Faithful Feature Upsampling for Vision Transformers},
+      author={Krispin Wandel and Jingchuan Wang and Hesheng Wang},
       year={2026},
       eprint={2606.14024},
       archivePrefix={arXiv},