---
license: cc-by-nc-4.0
library_name: timm
tags:
- vision
- self-supervised-learning
- image-classification
- feature-extraction
- vit
datasets:
- ILSVRC/imagenet-1k
pipeline_tag: image-feature-extraction
---

VISReg: Variance-Invariance-Sketching Regularization for JEPA training

**Key results:**

- 💪 **Strong collapse prevention**: yields a high gradient when embeddings collapse
- ⚡ **Scales well in training**: linear complexity in the scaling factors
- 🧩 **Easy to train**: like LeJEPA, it is a heuristic-free method
- 🏆 **Best OOD performance**: achieves the best accuracy on 6 out-of-distribution (OOD) datasets
- 📉 **Data efficiency**: reaches an average accuracy similar to DINOv2 with 90% less data
- 🧬 **Robust to low-quality datasets**: holds up on long-tailed and sparse datasets

Available Checkpoints

| File | Architecture | Patch Size | Embed Dim | Params | Pre-training Data |
|------|--------------|------------|-----------|--------|-------------------|
| `visreg-vit-b-inet1k.pth` | ViT-Base | 16 | 768 | 86M | ImageNet-1K |
| `visreg-vit-l-inet1k.pth` | ViT-Large | 14 | 1024 | 304M | ImageNet-1K |

Usage

Load with timm

```python
import timm
import torch

# ViT-Base/16
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-b-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)

# ViT-Large/14
model = timm.create_model("vit_large_patch14_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-l-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)
```

Download with huggingface_hub

```python
from huggingface_hub import hf_hub_download

# ViT-Base/16
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-b-inet1k.pth")

# ViT-Large/14
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-l-inet1k.pth")
```
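The download and loading steps can be combined into a single helper. The following is a minimal sketch, not part of the released code: `CHECKPOINTS` and `load_visreg` are hypothetical names, while the file names, repo ID, and timm model names are the ones used in this README. It assumes each `.pth` file holds a plain state dict, as the loading snippet suggests.

```python
# Hypothetical helper; the names below are illustrative, not part of the VISReg repo.
CHECKPOINTS = {
    "vit_b": ("visreg-vit-b-inet1k.pth", "vit_base_patch16_224"),
    "vit_l": ("visreg-vit-l-inet1k.pth", "vit_large_patch14_224"),
}


def load_visreg(variant="vit_b", repo_id="BooBooWu/visreg"):
    """Download a checkpoint and load it into the matching timm ViT."""
    if variant not in CHECKPOINTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(CHECKPOINTS)}")
    filename, timm_name = CHECKPOINTS[variant]

    # Heavy imports are deferred so the lookup above works without them.
    import timm
    import torch
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=filename)
    model = timm.create_model(timm_name, pretrained=False, num_classes=0, dynamic_img_size=True)
    # Assumption: the file holds a plain state dict.
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model.eval()
```

Under these assumptions, `model = load_visreg("vit_l")` would return the ViT-Large/14 encoder in eval mode.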

Feature extraction

```python
import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# convert("RGB") guards against grayscale or RGBA inputs
img = transform(Image.open("image.jpg").convert("RGB")).unsqueeze(0)

model.eval()  # `model` is the timm model loaded above
with torch.no_grad():
    features = model(img)  # [1, embed_dim]
```
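`model(img)` returns one pooled vector per image, whereas dense tasks such as segmentation typically work on per-patch features. The sketch below shows one way to get them, assuming the checkpoints follow timm's standard `VisionTransformer` layout (prefix tokens such as the class token, followed by patch tokens). `token_count` and `patch_tokens` are hypothetical helper names; `forward_features` and `num_prefix_tokens` are the timm members this relies on.

```python
def token_count(img_size, patch_size, num_prefix_tokens=1):
    """Tokens a ViT emits for a square image: one per patch plus prefix tokens."""
    grid = img_size // patch_size
    return grid * grid + num_prefix_tokens


def patch_tokens(model, img):
    """Return per-patch features of shape [B, num_patches, embed_dim]."""
    import torch  # deferred so token_count works without torch installed

    with torch.no_grad():
        tokens = model.forward_features(img)  # [B, prefix + num_patches, embed_dim]
    # Assumption: a timm VisionTransformer, which exposes num_prefix_tokens.
    return tokens[:, model.num_prefix_tokens:]


print(token_count(224, 16))  # ViT-Base/16 at 224x224: 14 * 14 patches + 1 class token = 197
print(token_count(224, 14))  # ViT-Large/14 at 224x224: 16 * 16 patches + 1 class token = 257
```

With the `img` tensor from the snippet above, `patch_tokens(model, img)` would give a `[1, 196, 768]` tensor for ViT-Base/16, provided the layout assumption holds.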

Evaluation

The full evaluation suite (linear probe, segmentation, fine-tuning) is available in the [GitHub repo](https://github.com/HaiyuWu/visreg):

```bash
# Linear probe on 10+ datasets
python downstream/linear_prob/run_evaluation.py \
    --checkpoint visreg-vit-b-inet1k.pth \
    --model vit_b \
    --datasets all
```

Citation

```bibtex
@inproceedings{wu2026visreg,
  title     = {VISReg: Variance-Invariance-Sketching Regularization for JEPA training},
  author    = {Wu, Haiyu and Balestriero, Randall and Levine, Morgan},
  booktitle = {arXiv},
  year      = {2026}
}
```

License

This project (code and pretrained weights) is released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) for non-commercial use only.