---
title: DINOv3 Web/Sat Interactive Similarity
emoji: 🦖
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
license: mit
short_description: Visualize image patch similarity as in the DINOv3 presentation
---
# DINOv3 Patch Similarity Viewer ([GitHub Repo](https://github.com/devMuniz02/DINOv3-Interactive-Patch-Cosine-Similarity))

> **Note:** This README and repository are for educational purposes. The creation of this repo was inspired by the DINOv3 paper to help visualize and understand the output of the model.
## Purpose
This repository provides interactive tools to visualize and explore patch-wise similarity in images using the DINOv3 vision transformer model. It is designed for researchers, students, and practitioners interested in understanding how self-supervised vision transformers perceive and relate different regions of an image.
## About DINOv3
- **Paper:** [DINOv3 (arXiv:2508.10104)](https://arxiv.org/abs/2508.10104)
- **Meta Research Page:** [Meta DINOv3 Publication](https://ai.meta.com/dinov3/)
- **Official GitHub:** [facebookresearch/dinov3](https://github.com/facebookresearch/dinov3)
> **Note:** The DINOv3 model weights require access approval. You can request access via the [Meta Research page](https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/) or by selecting the desired model in the [Hugging Face model collection](https://huggingface.co/collections/facebook/dinov3-68924841bd6b561778e31009).
## Features
- **Interactive Visualization:** Click on image patches or use arrow keys to explore patch similarity heatmaps.
- **Single or Two-Image Mode:** If one image is specified, shows self-similarity. If two images are specified, shows both self-similarity and cross-image similarity overlays interactively.
- **Image Preprocessing:** Loads and pads images without resizing, preserving the original aspect ratio.
- **Cosine Similarity Calculation:** Computes and visualizes cosine similarity between image patches.
- **Robust Fallback:** If an image URL fails to load, a default image is used.
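The preprocessing and similarity features above can be sketched as follows. This is a hypothetical, self-contained example (shapes, names, and the random features are illustrative, not taken from the repository): it pads an image to a multiple of the patch size with Pillow, then computes a pairwise patch cosine-similarity matrix with PyTorch.

```python
import torch
import torch.nn.functional as F
from PIL import Image

PATCH = 16  # the DINOv3 ViT models listed below use 16-pixel patches

def pad_to_multiple(img: Image.Image, patch: int = PATCH) -> Image.Image:
    """Pad on the right/bottom so both sides are multiples of the patch
    size; no resizing, so the original aspect ratio is preserved."""
    w, h = img.size
    new_w = -(-w // patch) * patch  # ceiling division
    new_h = -(-h // patch) * patch
    padded = Image.new(img.mode, (new_w, new_h))
    padded.paste(img, (0, 0))
    return padded

img = pad_to_multiple(Image.new("RGB", (500, 375)))  # padded to 512 x 384

# Hypothetical patch features, standing in for the patch tokens a DINOv3
# backbone would return for the padded image: a (384//16) x (512//16) grid.
rows, cols, dim = img.height // PATCH, img.width // PATCH, 384
features = torch.randn(rows * cols, dim)

# After normalizing each patch feature to unit length, a single matrix
# product yields the full pairwise cosine-similarity matrix.
normed = F.normalize(features, dim=-1)
similarity = normed @ normed.T  # (rows*cols, rows*cols), values in [-1, 1]

# Similarity of one selected patch against every patch, reshaped to the
# patch grid for display as a heatmap.
heatmap = similarity[0].reshape(rows, cols)
```

In two-image mode, the same normalize-then-matmul step applies between the two images' feature matrices, giving a cross-image similarity matrix instead of a square self-similarity one.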
## Installation
Install dependencies with:
```bash
pip install -r requirements.txt
```
## Model Selection
You can choose from several DINOv3 models available on Hugging Face (click to view each model card):
**LVD-1689M Dataset (Web data)**
- ViT
  - [facebook/dinov3-vit7b16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vit7b16-pretrain-lvd1689m)
  - [facebook/dinov3-vits16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m)
  - [facebook/dinov3-vits16plus-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vits16plus-pretrain-lvd1689m)
  - [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m)
  - [facebook/dinov3-vitl16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitl16-pretrain-lvd1689m)
  - [facebook/dinov3-vith16plus-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vith16plus-pretrain-lvd1689m)
- ConvNeXt
  - [facebook/dinov3-convnext-tiny-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-convnext-tiny-pretrain-lvd1689m)
  - [facebook/dinov3-convnext-small-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-convnext-small-pretrain-lvd1689m)
  - [facebook/dinov3-convnext-base-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-convnext-base-pretrain-lvd1689m)
  - [facebook/dinov3-convnext-large-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-convnext-large-pretrain-lvd1689m)

**SAT-493M Dataset (Satellite data)**
- ViT
  - [facebook/dinov3-vitl16-pretrain-sat493m](https://huggingface.co/facebook/dinov3-vitl16-pretrain-sat493m)
  - [facebook/dinov3-vit7b16-pretrain-sat493m](https://huggingface.co/facebook/dinov3-vit7b16-pretrain-sat493m)
## Usage
### Gradio app
Run the Gradio app:
```bash
python app.py
```
After running the app, open [http://localhost:7860/](http://localhost:7860/) in your browser.
Then:
- Choose a dataset and model name.
- For single-image similarity, provide only one file or URL.
- For two-image similarity, provide two images from files and/or URLs.
- Click the "Initialize / Update" button.
- Select the desired patch in the image.
- View the results.
> **Note:** *Overlay alpha* controls the intensity of the patch overlay drawn on top of the image.
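As an illustration (not the repository's actual code), the overlay can be thought of as simple alpha blending of a heatmap color image over the original image:

```python
import numpy as np

def blend_overlay(image: np.ndarray, heatmap_rgb: np.ndarray,
                  alpha: float = 0.55) -> np.ndarray:
    """Alpha-blend a heatmap over an image; both are float arrays in [0, 1].
    alpha = 0 shows only the image, alpha = 1 shows only the heatmap."""
    return (1.0 - alpha) * image + alpha * heatmap_rgb

# A black image under an all-red heatmap at the default alpha.
image = np.zeros((4, 4, 3))
heatmap = np.zeros((4, 4, 3))
heatmap[..., 0] = 1.0  # red channel at full intensity
blended = blend_overlay(image, heatmap)
```

Raising `alpha` makes the similarity colors dominate; lowering it lets more of the underlying image show through.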
### Python Script
Run the interactive viewer with the default COCO image:
```bash
python DINOv3CosSimilarity.py
```
#### Single Image Mode
Specify your own image (local path or URL):
```bash
python DINOv3CosSimilarity.py --image path/to/your/image.jpg
python DINOv3CosSimilarity.py --image https://yourdomain.com/image.png
```
#### Two Image Mode
Specify two images (local paths or URLs):
```bash
python DINOv3CosSimilarity.py --image1 path/to/image1.jpg --image2 path/to/image2.jpg
python DINOv3CosSimilarity.py --image1 https://yourdomain.com/image1.png --image2 https://yourdomain.com/image2.png
```
#### Model Selection
Specify the model with `--model` (default is vits16):
```bash
python DINOv3CosSimilarity.py --model facebook/dinov3-vitb16-pretrain-lvd1689m
```
#### Other Options
- `--show_grid` : Draw patch grid
- `--annotate_indices` : Write patch indices on cells
- `--overlay_alpha <float>` : Set heatmap alpha (default 0.55)
- `--patch_size <int>` : Override patch size (default: model's patch size)
#### Controls
- Mouse click to select a patch
- Arrow keys to move selection
- '1', '2', or 't' to switch active image (in two-image mode)
- 'q' to quit
## Demo Single Image

## Demo 2 Images

### Jupyter Notebook
1. Open `PatchCosSimilarity.ipynb` in Jupyter Notebook.
2. Set `url1` for single-image mode, or both `url1` and `url2` for two-image mode.
3. Set the `model_id` variable to any of the models listed above (see the commented lines at the top of the notebook).
4. Run the cells to load the image(s) and visualize patch similarities.
5. If an image fails to load, a default image is used automatically.
**Notebook Controls:**
- Mouse click to select a patch
- Arrow keys to move selection
- '1', '2', or 't' to switch active image (in two-image mode)
## License
This project is licensed under the MIT License. See the `LICENSE` file for details.
## Acknowledgments
This project uses the DINOv3 model via Hugging Face's Transformers library, along with PyTorch, Matplotlib, and Pillow.