---
title: DINOv3 Web/Sat Interactive Similarity
emoji: 🦖
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
license: mit
short_description: Visualize image patch similarity like in DINOv3 presentation
---

# DINOv3 Patch Similarity Viewer [GitHub Repo](https://github.com/devMuniz02/DINOv3-Interactive-Patch-Cosine-Similarity)

![Gradio Test app](assets/GradioAppTest.gif)

> **Note:** This README and repository are for educational purposes. The project was inspired by the DINOv3 paper and is meant to help visualize and understand the model's output.

## Purpose

This repository provides interactive tools to visualize and explore patch-wise similarity in images using the DINOv3 vision transformer model. It is designed for researchers, students, and practitioners interested in understanding how self-supervised vision transformers perceive and relate different regions of an image.

## About DINOv3

- **Paper:** [DINOv3 (arXiv:2508.10104)](https://arxiv.org/abs/2508.10104)
- **Meta Research Page:** [Meta DINOv3 Publication](https://ai.meta.com/dinov3/)
- **Official GitHub:** [facebookresearch/dinov3](https://github.com/facebookresearch/dinov3)

**Note:**  
The DINOv3 model weights require access approval.  
You can request access via the [Meta Research page](https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/) or by selecting the desired model in the [Hugging Face model collection](https://huggingface.co/collections/facebook/dinov3-68924841bd6b561778e31009).

## Features

- **Interactive Visualization:** Click on image patches or use arrow keys to explore patch similarity heatmaps.
- **Single or Two-Image Mode:** With one image, the viewer shows self-similarity; with two images, it interactively overlays both self-similarity and cross-image similarity.
- **Image Preprocessing:** Loads and pads images without resizing, preserving the original aspect ratio.
- **Cosine Similarity Calculation:** Computes and visualizes cosine similarity between image patches.
- **Robust Fallback:** If an image URL fails to load, a default image is used.
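
The cosine-similarity heatmap at the core of these tools can be sketched with plain NumPy. The feature array below is a random stand-in for the patch embeddings the model would produce; the shapes and function names are illustrative, not the repository's actual API:

```python
import numpy as np

def patch_similarity_heatmap(features, query_idx):
    """Cosine similarity of one query patch against all patches.

    features : (num_patches, dim) array of patch embeddings
    query_idx: flat index of the selected patch
    Returns a (num_patches,) array of similarities in [-1, 1].
    """
    # L2-normalize each embedding so dot products become cosine similarities
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed[query_idx]

# Toy example: a 14x14 grid of 384-dim embeddings
# (what a ViT-S/16 would produce for a 224x224 image)
rng = np.random.default_rng(0)
feats = rng.standard_normal((14 * 14, 384))
heat = patch_similarity_heatmap(feats, query_idx=0)
grid = heat.reshape(14, 14)  # reshape to the patch grid for overlaying
```

The query patch's similarity with itself is always 1.0; the `(14, 14)` grid can then be upsampled by the patch size and alpha-blended over the original image.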

## Installation

Install dependencies with:

```bash
pip install -r requirements.txt
```

## Model Selection

You can choose from several DINOv3 models available on Hugging Face (click to view each model card):

**LVD-1689M Dataset (Web data)**
- ViT
    - [facebook/dinov3-vit7b16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vit7b16-pretrain-lvd1689m)
    - [facebook/dinov3-vits16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m)
    - [facebook/dinov3-vits16plus-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vits16plus-pretrain-lvd1689m)
    - [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m)
    - [facebook/dinov3-vitl16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitl16-pretrain-lvd1689m)
    - [facebook/dinov3-vith16plus-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vith16plus-pretrain-lvd1689m)

- ConvNeXt
    - [facebook/dinov3-convnext-tiny-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-convnext-tiny-pretrain-lvd1689m)
    - [facebook/dinov3-convnext-small-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-convnext-small-pretrain-lvd1689m)
    - [facebook/dinov3-convnext-base-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-convnext-base-pretrain-lvd1689m)
    - [facebook/dinov3-convnext-large-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-convnext-large-pretrain-lvd1689m)

**SAT-493M Dataset (Satellite data)**
- ViT
    - [facebook/dinov3-vitl16-pretrain-sat493m](https://huggingface.co/facebook/dinov3-vitl16-pretrain-sat493m)
    - [facebook/dinov3-vit7b16-pretrain-sat493m](https://huggingface.co/facebook/dinov3-vit7b16-pretrain-sat493m)

## Usage

### Gradio app

Run the Gradio app:

```bash
python app.py
```

After running the app, open [http://localhost:7860/](http://localhost:7860/) in your browser.

Then:
- Choose a dataset and model name
- For single-image similarity:
    - Provide only one file or URL
- For two-image similarity:
    - Provide images from files and/or URLs
- Click the "Initialize / Update" button
- Select the desired patch in the image
- View the results

**Note:**
*Overlay alpha* controls the opacity of the patch heatmap overlaid on the image.

### Python Script

Run the interactive viewer with the default COCO image:

```bash
python DINOv3CosSimilarity.py
```

#### Single Image Mode

Specify your own image (local path or URL):

```bash
python DINOv3CosSimilarity.py --image path/to/your/image.jpg
python DINOv3CosSimilarity.py --image https://yourdomain.com/image.png
```

#### Two Image Mode

Specify two images (local paths or URLs):

```bash
python DINOv3CosSimilarity.py --image1 path/to/image1.jpg --image2 path/to/image2.jpg
python DINOv3CosSimilarity.py --image1 https://yourdomain.com/image1.png --image2 https://yourdomain.com/image2.png
```
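
Cross-image mode compares the selected patch from one image against every patch of the other. A minimal NumPy sketch, with illustrative shapes (because images are padded rather than resized, the two images may have different patch grids):

```python
import numpy as np

def cross_similarity(feats_a, feats_b, query_idx):
    """Cosine similarity of patch `query_idx` in image A vs. all patches in image B."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return b @ a[query_idx]  # shape: (num_patches_b,)

rng = np.random.default_rng(1)
feats_a = rng.standard_normal((14 * 14, 384))  # patch embeddings of image 1
feats_b = rng.standard_normal((10 * 12, 384))  # image 2 with a different grid
cross = cross_similarity(feats_a, feats_b, query_idx=5)
```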

#### Model Selection

Specify the model with `--model` (default is vits16):

```bash
python DINOv3CosSimilarity.py --model facebook/dinov3-vitb16-pretrain-lvd1689m
```

#### Other Options

- `--show_grid` : Draw patch grid
- `--annotate_indices` : Write patch indices on cells
- `--overlay_alpha <float>` : Set heatmap alpha (default 0.55)
- `--patch_size <int>` : Override patch size (default: model's patch size)

#### Controls

- Mouse click to select a patch
- Arrow keys to move selection
- '1', '2', or 't' to switch active image (in two-image mode)
- 'q' to quit
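
The pixel-to-patch mapping behind click selection is simple integer arithmetic. This is a minimal sketch assuming square patches; the function name and details are illustrative, not the script's actual handler:

```python
def click_to_patch(x, y, patch_size, grid_w):
    """Map pixel coordinates to (row, col) and the flat patch index."""
    col = x // patch_size
    row = y // patch_size
    return row, col, row * grid_w + col

# A 224x224 image with 16-pixel patches gives a 14x14 grid of 196 patches
grid_w = 224 // 16
row, col, idx = click_to_patch(100, 40, patch_size=16, grid_w=grid_w)
```

Arrow-key movement then just increments or decrements `row`/`col` and recomputes the flat index.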

## Demo Single Image

![Interactive Patch Similarity Demo](assets/Test_Interactive_video.gif)

## Demo 2 Images

![Multiple Interactive Patch Similarity Demo](assets/Multiple_Interactive_test_video.gif)

### Jupyter Notebook

1. Open `PatchCosSimilarity.ipynb` in Jupyter Notebook.
2. Run the cells to load an image and visualize patch similarities.
3. Set `url1` for single-image mode, or both `url1` and `url2` for two-image mode.
4. If an image fails to load, a default image will be used automatically.
5. Set the `model_id` variable to any of the models listed above (see commented lines at the top of the notebook).

**Notebook Controls:**  
- Mouse click to select a patch  
- Arrow keys to move selection  
- '1', '2', or 't' to switch active image (in two-image mode)

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.

## Acknowledgments

This project uses the DINOv3 model from Hugging Face's Transformers library, along with PyTorch, Matplotlib, and Pillow.