
Procedural Warm-Up Vision Transformers

Pretrained Vision Transformers initialized using procedural warm-up, as introduced in:

Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers (CVPR 2026) https://arxiv.org/abs/2511.13945

Models

  • pw-vit-t — ViT-Tiny
  • pw-vit-b — ViT-Base

These models are pretrained on procedural data (e.g. Dyck sequences) and are intended as initialization checkpoints for downstream vision tasks.
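For intuition, the sketch below generates random balanced Dyck sequences (well-nested bracket strings) with the Python standard library. This is purely illustrative: the function names and the simple biased sampling scheme are our own, and the paper's actual procedural-data pipeline may differ.

```python
import random


def dyck_sequence(pairs, rng=None):
    """Generate a balanced Dyck word with `pairs` bracket pairs.

    Illustrative sketch: opens and closes are chosen with a simple
    coin flip whenever both moves are legal, so the distribution over
    Dyck words is not uniform.
    """
    rng = rng or random.Random(0)
    out, opened, closed = [], 0, 0
    while len(out) < 2 * pairs:
        can_open = opened < pairs          # still have '(' left to place
        can_close = closed < opened        # a ')' must match an earlier '('
        if can_open and (not can_close or rng.random() < 0.5):
            out.append("(")
            opened += 1
        else:
            out.append(")")
            closed += 1
    return "".join(out)


def is_balanced(s):
    """Check that brackets are well nested (never close below depth 0)."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0
```

Each generated string is balanced by construction, since a closing bracket is only emitted when an unmatched opening bracket exists.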


Key Result

On ImageNet-1k, allocating just 1% of training to procedural data improves final accuracy by +1.7% for ViT-Base.


Usage

import torch

# `model` must be an instance of the matching architecture
# (ViT-Base for pw-vit-b, ViT-Tiny for pw-vit-t), e.g. built with
# timm or the repository's own model code, before loading weights.
ckpt = torch.load("pw-vit-b/model.pth", map_location="cpu")
model.load_state_dict(ckpt)

Citation

If you find this work useful, please cite our paper:

@inproceedings{shinnick2026proceduralwarmup,
  title={Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers},
  author={Shinnick, Zachary and Jiang, Liangze and Saratchandran, Hemanth and Teney, Damien and van den Hengel, Anton},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026},
}