---
license: cc-by-nc-4.0
library_name: timm
tags:
- vision
- self-supervised-learning
- image-classification
- feature-extraction
- vit
datasets:
- ILSVRC/imagenet-1k
pipeline_tag: image-feature-extraction
---

	<h1 style="font-size: 2.5em; text-align: center;">VISReg: Variance-Invariance-Sketching Regularization for JEPA training</h1>

	<p align="center">
	<a href="https://arxiv.org/abs/2606.02572v1"><img src="https://img.shields.io/badge/arXiv-2606.02572-b31b1b.svg" alt="arXiv"></a>
	<a href="https://haiyuwu.github.io/visreg/"><img src="https://img.shields.io/badge/Project-Page-blue" alt="Project Page"></a>
	<a href="https://github.com/HaiyuWu/visreg"><img src="https://img.shields.io/badge/GitHub-Code-black?logo=github" alt="GitHub"></a>
	</p>

Key results:
- 💪 Strong collapse prevention: the regularizer produces large gradients when embeddings collapse
- ⚡ Friendly to large-scale training: complexity is linear in the scaling factors
- 🧩 Easy to train: like LeJEPA, it is a heuristic-free method
- 🏆 Best OOD performance: achieves the best accuracy among the compared methods on 6 out-of-distribution (OOD) datasets
- 📉 Data efficiency: reaches an average accuracy similar to DINOv2's with 90% less data
- 🧬 Robust to low-quality datasets: remains robust on long-tailed and sparse datasets

	<h2 style="font-size: 1.8em;">Available Checkpoints</h2>

| File | Architecture | Patch Size | Embed Dim | Params | Pre-training Data |
|------|--------------|------------|-----------|--------|-------------------|
| `visreg-vit-b-inet1k.pth` | ViT-Base | 16 | 768 | 86M | ImageNet-1K |
| `visreg-vit-l-inet1k.pth` | ViT-Large | 14 | 1024 | 304M | ImageNet-1K |
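
The patch size in the table sets how many tokens each backbone processes. As a quick sanity check (plain arithmetic, not part of the release), a 224×224 input gives 224/16 = 14 patches per side for ViT-B/16, so 14 × 14 = 196 patch tokens, and 224/14 = 16 per side for ViT-L/14, so 256 tokens. Class tokens are not counted. The helper `num_patch_tokens` and the `CHECKPOINTS` dict below are hypothetical names used only for illustration:

```python
def num_patch_tokens(img_size: int, patch_size: int) -> int:
    """Number of non-overlapping patches for a square image (hypothetical helper).

    Assumes img_size is divisible by patch_size; class tokens are not counted.
    """
    if img_size % patch_size != 0:
        raise ValueError(f"{img_size} is not divisible by patch size {patch_size}")
    per_side = img_size // patch_size
    return per_side * per_side


# Values copied from the checkpoint table: file -> (patch size, embedding dim)
CHECKPOINTS = {
    "visreg-vit-b-inet1k.pth": (16, 768),
    "visreg-vit-l-inet1k.pth": (14, 1024),
}

for name, (patch, dim) in CHECKPOINTS.items():
    tokens = num_patch_tokens(224, patch)
    print(f"{name}: {tokens} patch tokens of dim {dim} at 224x224")
```

At 224×224 this prints 196 tokens of dim 768 for the ViT-B file and 256 tokens of dim 1024 for the ViT-L file, so the ViT-L/14 model sees a finer grid as well as a wider embedding.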

	<h2 style="font-size: 1.8em;">Usage</h2>

	<h3 style="font-size: 1.4em;">Load with timm</h3>

```python
import timm
import torch

# ViT-Base/16. num_classes=0 removes the classification head,
# so the model returns pooled embeddings of size embed_dim.
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-b-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # inference mode for feature extraction

# ViT-Large/14 (alternative to the ViT-Base model above)
model = timm.create_model("vit_large_patch14_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-l-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```
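
If `load_state_dict` reports missing or unexpected keys, a common cause with self-supervised checkpoints is that the weights are nested under a wrapper key or carry a prefix such as `module.` (left by `DistributedDataParallel`) or `backbone.`. We have not inspected these `.pth` files, so treat this as an assumption. The hypothetical helper below works on a plain dict and unwraps and strips such prefixes before the weights are passed to `model.load_state_dict(...)`:

```python
from typing import Any, Dict, Tuple


def clean_state_dict(
    ckpt: Dict[str, Any],
    wrapper_keys: Tuple[str, ...] = ("state_dict", "model"),
    prefixes: Tuple[str, ...] = ("module.", "backbone."),
) -> Dict[str, Any]:
    """Unwrap a nested checkpoint and strip key prefixes (hypothetical helper).

    The wrapper keys and prefixes are assumptions, not a documented
    format of the VISReg checkpoints.
    """
    # Unwrap e.g. {"state_dict": {...}} if such a wrapper is present
    for key in wrapper_keys:
        if key in ckpt and isinstance(ckpt[key], dict):
            ckpt = ckpt[key]
            break
    cleaned = {}
    for name, value in ckpt.items():
        # Strip each known prefix in order, e.g. "module.backbone.x" -> "x"
        for prefix in prefixes:
            if name.startswith(prefix):
                name = name[len(prefix):]
        cleaned[name] = value
    return cleaned
```

For example: `model.load_state_dict(clean_state_dict(torch.load(path, map_location="cpu")))`. Calling `load_state_dict(..., strict=False)` instead returns the `missing_keys` and `unexpected_keys`, which shows whether any cleanup is needed at all.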

	<h3 style="font-size: 1.4em;">Download with huggingface_hub</h3>

```python
from huggingface_hub import hf_hub_download

# hf_hub_download returns the local path of the cached file;
# pass it to torch.load(...) in the timm example above.

# ViT-Base/16
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-b-inet1k.pth")

# ViT-Large/14
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-l-inet1k.pth")
```
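
Without the `huggingface_hub` package, the same files can be fetched over plain HTTPS. Hub files are served from `https://huggingface.co/{repo_id}/resolve/{revision}/{filename}`, which is the pattern `huggingface_hub.hf_hub_url` produces. The sketch below builds that URL with the standard library. It assumes the default `main` revision, and the function name `hf_resolve_url` is hypothetical:

```python
from urllib.parse import quote


def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct download URL for a file in a Hugging Face model repo."""
    # Quote the revision fully; keep "/" in filename so subfolders still work
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


for fname in ("visreg-vit-b-inet1k.pth", "visreg-vit-l-inet1k.pth"):
    print(hf_resolve_url("BooBooWu/visreg", fname))
```

The printed URLs can be passed to `wget` or `curl -L -O`. The `-L` flag matters because the `resolve` endpoint redirects to a storage backend.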

	<h3 style="font-size: 1.4em;">Feature extraction</h3>

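```python
import torch
from PIL import Image
from torchvision import transforms

# ImageNet-style preprocessing: resize the shorter side to 256, center-crop 224, normalize
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# convert("RGB") handles grayscale and RGBA inputs
img = transform(Image.open("image.jpg").convert("RGB")).unsqueeze(0)  # [1, 3, 224, 224]

# `model` is the timm model loaded above (already in eval mode)
with torch.no_grad():
    features = model(img)  # [1, embed_dim]: 768 for ViT-B/16, 1024 for ViT-L/14
```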
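
To see what the preprocessing does to a non-square image, the pure-Python sketch below reproduces the geometry of `Resize(256)` (shorter side scaled to 256, aspect ratio kept) followed by `CenterCrop(224)`. It is modeled on torchvision's rounding (truncation for the long side, `round` for the crop offset) but is an illustration for inspection only, not a replacement for the transforms; the function name is hypothetical:

```python
def resize_then_center_crop(width: int, height: int, resize: int = 256, crop: int = 224):
    """Return (resized_w, resized_h, crop_box) for Resize(resize) + CenterCrop(crop).

    crop_box is (left, top, right, bottom) in resized-image pixels.
    Assumes the resized image is at least `crop` pixels on each side.
    """
    # Resize(int): scale the shorter side to `resize`, keep the aspect ratio
    if width <= height:
        new_w, new_h = resize, int(resize * height / width)
    else:
        new_w, new_h = int(resize * width / height), resize
    # CenterCrop: take a crop x crop window from the middle
    left = int(round((new_w - crop) / 2.0))
    top = int(round((new_h - crop) / 2.0))
    return new_w, new_h, (left, top, left + crop, top + crop)


# A 640x480 landscape photo
print(resize_then_center_crop(640, 480))  # (341, 256, (58, 16, 282, 240))
```

For a landscape photo, roughly 58 pixels are discarded on each side horizontally and 16 vertically. Content near the left and right edges therefore never reaches the model.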

	<h2 style="font-size: 1.8em;">Evaluation</h2>

The full evaluation suite (linear probe, segmentation, fine-tuning) is available in the [GitHub repo](https://github.com/HaiyuWu/visreg):

```bash
# Linear probe on 10+ datasets
python downstream/linear_prob/run_evaluation.py \
    --checkpoint visreg-vit-b-inet1k.pth \
    --model vit_b \
    --datasets all
```
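
A linear probe trains only a linear classifier on top of frozen features, so it measures how linearly separable the embeddings already are. The sketch below illustrates the idea with plain-Python binary logistic regression on toy 2-D "features". It is a generic illustration, not the implementation in `run_evaluation.py`. In practice the inputs would be the `[N, embed_dim]` outputs of the frozen backbone, and the function names here are hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def train_linear_probe(
    feats: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.1,
    epochs: int = 100,
) -> Tuple[List[float], float]:
    """Binary logistic regression by full-batch gradient descent.

    Only the weights w and bias b are learned; the features stay frozen.
    """
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y                     # d(log-loss)/dz
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Toy frozen "features" for two classes
feats = [(2.0, 2.0), (3.0, 1.0), (1.0, 3.0), (-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_linear_probe(feats, labels)
acc = sum(predict(w, b, x) == y for x, y in zip(feats, labels)) / len(labels)
print(f"train accuracy: {acc:.2f}")  # train accuracy: 1.00
```

Because the backbone is never updated, probe accuracy reflects the quality of the pre-trained representation rather than extra task-specific training. Fine-tuning, by contrast, updates the backbone as well.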

	<h2 style="font-size: 1.8em;">Citation</h2>

```bibtex
@article{wu2026visreg,
  title   = {VISReg: Variance-Invariance-Sketching Regularization for JEPA training},
  author  = {Wu, Haiyu and Balestriero, Randall and Levine, Morgan},
  journal = {arXiv preprint arXiv:2606.02572},
  year    = {2026}
}
```

	<h2 style="font-size: 1.8em;">License</h2>

	This project (code and pretrained weights) is released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) for non-commercial use only.