---
license: cc-by-nc-4.0
library_name: timm
tags:
- vision
- self-supervised-learning
- image-classification
- feature-extraction
- vit
datasets:
- ILSVRC/imagenet-1k
pipeline_tag: image-feature-extraction
---
<h1 style="font-size: 2.5em; text-align: center;">VISReg: Variance-Invariance-Sketching Regularization for JEPA training</h1>
<p align="center">
<a href="https://arxiv.org/abs/2606.02572v1"><img src="https://img.shields.io/badge/arXiv-2606.02572-b31b1b.svg" alt="arXiv"></a>
<a href="https://haiyuwu.github.io/visreg/"><img src="https://img.shields.io/badge/Project-Page-blue" alt="Project Page"></a>
<a href="https://github.com/HaiyuWu/visreg"><img src="https://img.shields.io/badge/GitHub-Code-black?logo=github" alt="GitHub"></a>
</p>
**Key results:**
- 💪 **Strong collapse prevention**: Produces large gradients when the embeddings collapse
- ⚡ **Scalable training**: Linear complexity with respect to the scaling factors
- 🧩 **Easy to train**: Like LeJEPA, it is a heuristic-free method
- 🏆 **Best OOD performance**: Achieves the best accuracy on 6 OOD datasets
- 📉 **Data efficiency**: Achieves average accuracy similar to DINOv2 with 90% less data
- 🧬 **Robust to low-quality datasets**: Remains robust on long-tailed and sparse datasets
<h2 style="font-size: 1.8em;">Available Checkpoints</h2>
| File | Architecture | Patch Size | Embed Dim | Params | Pre-training Data |
|------|-------------|------------|-----------|--------|-------------------|
| `visreg-vit-b-inet1k.pth` | ViT-Base | 16 | 768 | 86M | ImageNet-1K |
| `visreg-vit-l-inet1k.pth` | ViT-Large | 14 | 1024 | 304M | ImageNet-1K |
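The patch size in the table determines how many tokens the encoder processes at a given resolution. The sketch below does that arithmetic for a 224×224 input. It assumes timm's default ViT layout of one class token and no register tokens, and `num_tokens` / `CHECKPOINT_SPECS` are hypothetical helpers, not part of timm or the VISReg code.

```python
# Hypothetical helper: sequence length for the two VISReg checkpoints.
# Patch sizes and embedding widths are copied from the checkpoint table; one
# class token and no register tokens is an assumption about timm's default ViTs.
CHECKPOINT_SPECS = {
    "visreg-vit-b-inet1k.pth": {"patch": 16, "embed_dim": 768},
    "visreg-vit-l-inet1k.pth": {"patch": 14, "embed_dim": 1024},
}


def num_tokens(height: int, width: int, patch: int, prefix_tokens: int = 1) -> int:
    """Return the encoder sequence length for a height x width input image."""
    # Assumption: both sides must be multiples of the patch size; we raise rather
    # than guess how the patch embedding would treat leftover pixels.
    if height % patch or width % patch:
        raise ValueError(f"{height}x{width} is not divisible by patch size {patch}")
    return (height // patch) * (width // patch) + prefix_tokens


for name, spec in CHECKPOINT_SPECS.items():
    n = num_tokens(224, 224, spec["patch"])
    print(f"{name}: {n} tokens x {spec['embed_dim']} dims")
```

For 224×224 crops this gives 197 tokens for ViT-B/16 (14×14 patches plus the class token) and 257 for ViT-L/14 (16×16 patches plus the class token).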
<h2 style="font-size: 1.8em;">Usage</h2>
<h3 style="font-size: 1.4em;">Load with timm</h3>
```python
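import timm
import torch

# Assumes the checkpoint files are already downloaded locally (e.g. with hf_hub_download).
# num_classes=0 drops the classifier head so forward() returns pooled embeddings;
# dynamic_img_size=True lets the model accept resolutions other than 224x224.

# ViT-Base/16
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-b-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# ViT-Large/14
model = timm.create_model("vit_large_patch14_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-l-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()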
```
<h3 style="font-size: 1.4em;">Download with huggingface_hub</h3>
```python
from huggingface_hub import hf_hub_download
# ViT-Base/16
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-b-inet1k.pth")
# ViT-Large/14
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-l-inet1k.pth")
```
<h3 style="font-size: 1.4em;">Feature extraction</h3>
```python
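import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet preprocessing: resize the shorter side to 256, center-crop
# to 224x224, then normalize with the ImageNet mean/std.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# `model` is one of the models loaded above; convert("RGB") handles grayscale/RGBA files.
model.eval()
img = transform(Image.open("image.jpg").convert("RGB")).unsqueeze(0)  # [1, 3, 224, 224]
with torch.no_grad():
    features = model(img)  # [1, embed_dim]: 768 for ViT-B/16, 1024 for ViT-L/14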
```
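A common use of these embeddings is nearest-neighbour retrieval by cosine similarity. The following is a minimal NumPy sketch, not part of the VISReg code: `cosine_knn` is a hypothetical helper, and it assumes the features have been moved to NumPy (for example with `features.cpu().numpy()`) and stacked into `[N, embed_dim]` arrays. The demo uses random vectors in place of real features.

```python
import numpy as np


def cosine_knn(queries: np.ndarray, gallery: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k most cosine-similar gallery rows per query.

    queries: [Q, D] array, gallery: [N, D] array of embeddings.
    """
    # L2-normalize rows so that a dot product equals cosine similarity;
    # the 1e-12 floor avoids dividing by zero for all-zero vectors.
    q = queries / np.maximum(np.linalg.norm(queries, axis=1, keepdims=True), 1e-12)
    g = gallery / np.maximum(np.linalg.norm(gallery, axis=1, keepdims=True), 1e-12)
    sims = q @ g.T  # [Q, N] cosine similarities
    k = min(k, g.shape[0])
    # Sort each row by descending similarity and keep the first k columns.
    idx = np.argsort(-sims, axis=1)[:, :k]
    scores = np.take_along_axis(sims, idx, axis=1)
    return idx, scores


# Demo with random stand-ins for real [N, 768] ViT-B/16 features.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(10, 768))
queries = gallery[[3, 7]] + 0.01 * rng.normal(size=(2, 768))  # slightly perturbed copies
idx, scores = cosine_knn(queries, gallery, k=3)
print(idx[:, 0])  # -> [3 7]
```

Normalizing once and using a matrix product keeps the whole query batch in a single operation; for large galleries a dedicated similarity-search library would be the usual next step.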
<h2 style="font-size: 1.8em;">Evaluation</h2>
The full evaluation suite (linear probing, segmentation, and fine-tuning) is available in the [GitHub repo](https://github.com/HaiyuWu/visreg):
```bash
# Linear probe on 10+ datasets
python downstream/linear_prob/run_evaluation.py \
--checkpoint visreg-vit-b-inet1k.pth \
--model vit_b \
--datasets all
```
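For intuition about what a linear probe measures: the encoder stays frozen and only a linear classifier is trained on its embeddings. The sketch below fits a softmax-regression probe with full-batch gradient descent in NumPy. It is an illustration with assumed hyperparameters (`lr`, `epochs`, `weight_decay`), not the protocol implemented by `run_evaluation.py`, and the synthetic clustered data stands in for real features.

```python
import numpy as np


def train_linear_probe(feats, labels, num_classes, lr=0.1, epochs=200, weight_decay=1e-4):
    """Fit a softmax-regression probe (W, b) on frozen features by gradient descent."""
    n, d = feats.shape
    # Standardize with training statistics so one learning rate suits all dimensions.
    mean, std = feats.mean(axis=0), feats.std(axis=0) + 1e-6
    x = (feats - mean) / std
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    y = np.eye(num_classes)[labels]  # one-hot targets, [N, C]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - y) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad + weight_decay * w)
        b -= lr * grad.sum(axis=0)
    return w, b, mean, std


def predict(feats, w, b, mean, std):
    """Predicted class index for each row of feats."""
    return np.argmax(((feats - mean) / std) @ w + b, axis=1)


# Demo: synthetic clustered "features" stand in for frozen encoder outputs.
rng = np.random.default_rng(0)
centers = 2.0 * rng.normal(size=(3, 32))
labels = rng.integers(0, 3, size=300)
feats = centers[labels] + rng.normal(size=(300, 32))
w, b, mean, std = train_linear_probe(feats, labels, num_classes=3)
acc = (predict(feats, w, b, mean, std) == labels).mean()
print(f"train accuracy: {acc:.2f}")
```

On real data, fit the probe on features from a training split and measure accuracy on a held-out split; the demo's training accuracy only checks that the code runs.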
<h2 style="font-size: 1.8em;">Citation</h2>
```bibtex
@article{wu2026visreg,
  title   = {VISReg: Variance-Invariance-Sketching Regularization for JEPA training},
  author  = {Wu, Haiyu and Balestriero, Randall and Levine, Morgan},
  journal = {arXiv preprint arXiv:2606.02572},
  year    = {2026}
}
```
<h2 style="font-size: 1.8em;">License</h2>
This project (code and pretrained weights) is released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) for non-commercial use only.