Add pipeline tag and sample usage

#1, opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +33 -7
README.md CHANGED
@@ -1,7 +1,6 @@
1
  ---
2
  license: cc-by-nc-sa-4.0
3
- tags:
4
- - arxiv:2606.14024
5
  ---
6
 
7
  # ViT-Up
@@ -10,18 +9,45 @@ tags:
10
 
11
  This repository provides pretrained ViT-Up weights for DINOv3-S+ and DINOv3-B.
12
 
13
- - Paper: https://arxiv.org/abs/2606.14024
14
- - HF Paper page: https://huggingface.co/papers/2606.14024
15
- - Project page: https://vitup.papers.discuna.com/
16
- - Code: https://github.com/krispinwandel/vit-up
17
 
18
 
19
  ## Citation
20
 
21
  ```bibtex
22
  @misc{wandel2026vitupfaithfulfeatureupsampling,
23
  title={ViT-Up: Faithful Feature Upsampling for Vision Transformers},
24
- author={Krispin Wandel and Jingchuan Wang and Hesheng Wang},
25
  year={2026},
26
  eprint={2606.14024},
27
  archivePrefix={arXiv},
 
1
  ---
2
  license: cc-by-nc-sa-4.0
3
+ pipeline_tag: image-feature-extraction
 
4
  ---
5
 
6
  # ViT-Up
 
9
 
10
  This repository provides pretrained ViT-Up weights for DINOv3-S+ and DINOv3-B.
11
 
12
+ - **Paper**: [ViT-Up: Faithful Feature Upsampling for Vision Transformers](https://huggingface.co/papers/2606.14024)
13
+ - **Project page**: https://vitup.papers.discuna.com/
14
+ - **Code**: https://github.com/krispinwandel/vit-up
 
15
 
16
+ ## Sample Usage
17
+
18
+ ViT-Up models can be loaded directly with `torch.hub.load`. The Hub entry points download ViT-Up weights from Hugging Face and load the matching DINOv3 backbone.
19
+
20
+ ```python
21
+ import torch
22
+
23
+ device = "cuda" if torch.cuda.is_available() else "cpu"
24
+
25
+ # Available entry points:
26
+ # - vit_up_dinov3_splus
27
+ # - vit_up_dinov3_base
28
+ model = torch.hub.load(
29
+     "krispinwandel/vit-up",
30
+     "vit_up_dinov3_splus",
31
+     pretrained=True,
32
+     trust_repo=True,
33
+     device=device,
34
+ ).eval()
35
+
36
+ images = torch.randn(1, 3, 448, 448, device=device)
37
+ query_coords = torch.rand(1, 100, 2, device=device) # normalized (x, y) in [0, 1]
38
+
39
+ with torch.no_grad():
40
+     features = model(images, query_coords)
41
+
42
+ print(features.shape) # (B, N_queries, D)
43
+ ```
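The sample above queries 100 random points. For a dense feature map, the query coordinates could instead be laid out on a regular grid. The sketch below is illustrative only: `make_query_grid` is a hypothetical helper, not part of the ViT-Up repository, and the pixel-center convention `(col + 0.5) / width` is an assumption. The README only states that coordinates are normalized `(x, y)` in `[0, 1]`.

```python
def make_query_grid(height, width):
    """Hypothetical helper: row-major normalized (x, y) coordinates in [0, 1].

    Assumes a pixel-center convention; check the ViT-Up code for the
    convention the model actually expects.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    return [
        ((col + 0.5) / width, (row + 0.5) / height)
        for row in range(height)
        for col in range(width)
    ]


grid = make_query_grid(2, 4)
print(len(grid))  # 8
print(grid[0])    # (0.125, 0.25)
```

Under these assumptions, wrapping the list with `torch.tensor(grid).unsqueeze(0)` would give a `(1, H*W, 2)` tensor in the same layout as `query_coords`, and the returned features could be reshaped to `(B, H, W, D)`.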
44
 
45
  ## Citation
46
 
47
  ```bibtex
48
  @misc{wandel2026vitupfaithfulfeatureupsampling,
49
  title={ViT-Up: Faithful Feature Upsampling for Vision Transformers},
50
+ author={Krispin Wandel and Jingchuan Wang and Hesheng Wang},
51
  year={2026},
52
  eprint={2606.14024},
53
  archivePrefix={arXiv},