Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers
Pretrained Vision Transformers initialized using procedural warm-up, as introduced in:
Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers (CVPR 2026) https://arxiv.org/abs/2511.13945
These models are trained using procedural data (e.g. Dyck sequences) and are intended as initialization checkpoints for downstream visual tasks.
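To make "procedural data" concrete, here is a minimal sketch of how one might sample well-nested Dyck words (balanced bracket sequences). This is an illustrative example only; the paper's exact generation procedure, alphabet, and sequence statistics may differ.

```python
import random

def random_dyck(n_pairs, brackets="()[]"):
    """Sample a random well-nested Dyck word with n_pairs bracket pairs.

    Illustrative sketch of Dyck-style procedural data; not the paper's
    exact generator.
    """
    opens, closes = brackets[0::2], brackets[1::2]
    seq, stack = [], []
    remaining = n_pairs
    while remaining or stack:
        # Open a new bracket when allowed, otherwise close the latest one.
        if remaining and (not stack or random.random() < 0.5):
            k = random.randrange(len(opens))
            seq.append(opens[k])
            stack.append(closes[k])
            remaining -= 1
        else:
            seq.append(stack.pop())
    return "".join(seq)

print(random_dyck(5))  # e.g. "([])()[()]"
```

Sequences like these are tokenized and used as the warm-up pretraining corpus in place of images.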
On ImageNet-1k, allocating just 1% of training to procedural data improves final accuracy by +1.7% for ViT-Base.
A minimal loading sketch (assuming a `timm` ViT-Base; substitute your own model class as needed):

```python
import torch
import timm  # assumed implementation; any compatible ViT-Base works

model = timm.create_model("vit_base_patch16_224", pretrained=False)
ckpt = torch.load("pw-vit-b/model.pth", map_location="cpu")
# strict=False in case the checkpoint omits the classification head
model.load_state_dict(ckpt, strict=False)
```
If you find this work useful, please cite our paper:
@inproceedings{shinnick2026proceduralwarmup,
  title     = {Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers},
  author    = {Shinnick, Zachary and Jiang, Liangze and Saratchandran, Hemanth and Teney, Damien and van den Hengel, Anton},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}