---
license: cc-by-nc-4.0
library_name: timm
tags:
- vision
- self-supervised-learning
- image-classification
- feature-extraction
- vit
datasets:
- ILSVRC/imagenet-1k
pipeline_tag: image-feature-extraction
---
<h1 style="font-size: 2.5em; text-align: center;">VISReg: Variance-Invariance-Sketching Regularization for JEPA training</h1>

<p align="center">
  <a href="https://arxiv.org/abs/2606.02572v1"><img src="https://img.shields.io/badge/arXiv-2606.02572-b31b1b.svg" alt="arXiv"></a>
  <a href="https://haiyuwu.github.io/visreg/"><img src="https://img.shields.io/badge/Project-Page-blue" alt="Project Page"></a>
  <a href="https://github.com/HaiyuWu/visreg"><img src="https://img.shields.io/badge/GitHub-Code-black?logo=github" alt="GitHub"></a>
</p>

**Key results:**

- **Strong collapse prevention**: High gradient when embeddings collapse
- **Friendly to training at scale**: Complexity is linear in the scaling factors
- **Easy to train**: Like LeJEPA, it is a heuristic-free method
- **Best OOD performance**: Achieves the best accuracy on 6 out-of-distribution (OOD) datasets
- **Data efficiency**: Reaches an average accuracy similar to DINOv2 with 90% less data
- **Robust to low-quality datasets**: Robust to long-tailed and sparse datasets

<h2 style="font-size: 1.8em;">Available Checkpoints</h2>

| File | Architecture | Patch Size | Embed Dim | Params | Pre-training Data |
|------|--------------|------------|-----------|--------|-------------------|
| `visreg-vit-b-inet1k.pth` | ViT-Base | 16 | 768 | 86M | ImageNet-1K |
| `visreg-vit-l-inet1k.pth` | ViT-Large | 14 | 1024 | 304M | ImageNet-1K |

<h2 style="font-size: 1.8em;">Usage</h2>

<h3 style="font-size: 1.4em;">Load with timm</h3>

Pick one of the two variants. The `.pth` file is expected in the working directory.

```python
import timm
import torch

# ViT-Base/16
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-b-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)

# ViT-Large/14
model = timm.create_model("vit_large_patch14_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-l-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)
```

<h3 style="font-size: 1.4em;">Download with huggingface_hub</h3>

```python
from huggingface_hub import hf_hub_download

# ViT-Base/16
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-b-inet1k.pth")

# ViT-Large/14
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-l-inet1k.pth")
```
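
The two snippets above can be tied together: `hf_hub_download` returns the local path of the cached file, which can be passed straight to `torch.load`. The following is a minimal sketch rather than part of the official repo. The `CHECKPOINTS` table and the `load_visreg` helper are hypothetical names introduced here; the file names, timm model names, and embedding widths are copied from this README.

```python
# Hypothetical helper (not part of the VISReg repo): download a checkpoint
# from the Hub and load it into the matching timm ViT.
REPO_ID = "BooBooWu/visreg"

# File name -> timm architecture and embedding width, copied from the
# checkpoint table and the loading snippet in this README.
CHECKPOINTS = {
    "visreg-vit-b-inet1k.pth": {"timm_name": "vit_base_patch16_224", "embed_dim": 768},
    "visreg-vit-l-inet1k.pth": {"timm_name": "vit_large_patch14_224", "embed_dim": 1024},
}


def load_visreg(filename: str):
    """Return an eval-mode timm ViT initialized from `filename`."""
    # Imported lazily so the table above can be inspected without these packages.
    import timm
    import torch
    from huggingface_hub import hf_hub_download

    spec = CHECKPOINTS[filename]  # raises KeyError for unknown file names
    path = hf_hub_download(repo_id=REPO_ID, filename=filename)
    model = timm.create_model(
        spec["timm_name"], pretrained=False, num_classes=0, dynamic_img_size=True
    )
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model.eval()
```

Calling `load_visreg("visreg-vit-b-inet1k.pth")` downloads the file on first use and reuses the local Hub cache afterwards.
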
<h3 style="font-size: 1.4em;">Feature extraction</h3>

```python
import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Convert to RGB so grayscale or RGBA files still yield 3 channels
img = transform(Image.open("image.jpg").convert("RGB")).unsqueeze(0)

model.eval()  # `model` is the timm ViT loaded above
with torch.no_grad():
    features = model(img)  # [1, embed_dim]
```
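
A common next step is to embed several images and compare them. The sketch below batches the forward pass, L2-normalizes the features, and computes pairwise cosine similarities. `embed` and `cosine_similarity_matrix` are hypothetical helper names, and a small stand-in module replaces the real encoder so the example runs without a checkpoint; with a VISReg model, pass the loaded `model` and a stack of preprocessed image tensors instead.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def embed(model: torch.nn.Module, images: torch.Tensor, batch_size: int = 32) -> torch.Tensor:
    """Run `model` over `images` in mini-batches and return L2-normalized features."""
    model.eval()
    chunks = []
    for start in range(0, images.shape[0], batch_size):
        feats = model(images[start:start + batch_size])  # [B, embed_dim]
        chunks.append(F.normalize(feats, dim=-1))
    return torch.cat(chunks, dim=0)


def cosine_similarity_matrix(features: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity; assumes rows are already L2-normalized."""
    return features @ features.T


# Stand-in encoder (an assumption for illustration only): global average
# pooling followed by a linear layer that outputs 768-dim vectors.
stand_in = torch.nn.Sequential(
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(3, 768),
)
images = torch.randn(5, 3, 224, 224)  # placeholder for preprocessed images
feats = embed(stand_in, images, batch_size=2)
sims = cosine_similarity_matrix(feats)
print(feats.shape, sims.shape)  # feats: [5, 768], sims: [5, 5]
```

Normalizing once inside `embed` keeps the similarity computation a plain matrix product, which is convenient for retrieval-style comparisons over many images.
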
<h2 style="font-size: 1.8em;">Evaluation</h2>

The full evaluation suite (linear probing, segmentation, fine-tuning) is available in the [GitHub repo](https://github.com/HaiyuWu/visreg):

```bash
# Linear probe on 10+ datasets
python downstream/linear_prob/run_evaluation.py \
    --checkpoint visreg-vit-b-inet1k.pth \
    --model vit_b \
    --datasets all
```

<h2 style="font-size: 1.8em;">Citation</h2>

```bibtex
@inproceedings{wu2026visreg,
  title     = {VISReg: Variance-Invariance-Sketching Regularization for JEPA training},
  author    = {Wu, Haiyu and Balestriero, Randall and Levine, Morgan},
  booktitle = {arXiv},
  year      = {2026}
}
```

<h2 style="font-size: 1.8em;">License</h2>

This project (code and pretrained weights) is released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) for non-commercial use only.