RGB Head — ImageNet

An RGB reconstruction head based on RAE, trained on ImageNet. It is part of "A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens" (CVPR 2026 Highlight).

Usage

Requires a frozen DINOv3 ViT-B backbone. See the DeltaTok GitHub repository for training and evaluation code.
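
The frozen-backbone requirement can be illustrated with a minimal PyTorch sketch. This is not the code from the DeltaTok repository: `ToyBackbone` and `ToyRGBHead` are hypothetical stand-ins that only mimic plausible shapes (16x16 patches and 768-dimensional tokens are assumptions here), and `freeze` is an illustrative helper. The real backbone, head architecture, and checkpoint loading should be taken from the repository.

```python
import torch
from torch import nn


class ToyBackbone(nn.Module):
    """Hypothetical stand-in for the backbone; NOT the real DINOv3 ViT-B."""

    def __init__(self, patch=16, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        # (B, 3, H, W) -> (B, N, dim), one token per image patch
        return self.proj(x).flatten(2).transpose(1, 2)


class ToyRGBHead(nn.Module):
    """Hypothetical stand-in for the RGB head; NOT the released architecture."""

    def __init__(self, patch=16, dim=768):
        super().__init__()
        self.patch = patch
        self.to_pixels = nn.Linear(dim, 3 * patch * patch)

    def forward(self, tokens, height, width):
        b = tokens.shape[0]
        p = self.patch
        gh, gw = height // p, width // p
        x = self.to_pixels(tokens)            # (B, N, 3*p*p)
        x = x.view(b, gh, gw, 3, p, p)        # split tokens into a patch grid
        x = x.permute(0, 3, 1, 4, 2, 5)       # (B, 3, gh, p, gw, p)
        return x.reshape(b, 3, height, width)


def freeze(module):
    """Put a module in eval mode and disable gradients for all its parameters."""
    module.eval()
    for param in module.parameters():
        param.requires_grad_(False)
    return module


backbone = freeze(ToyBackbone())
head = ToyRGBHead()

images = torch.randn(2, 3, 224, 224)
with torch.no_grad():                         # no graph is built for the frozen part
    tokens = backbone(images)
recon = head(tokens, 224, 224)                # gradients flow only through the head
```

In this sketch only the head's parameters would receive gradients during training; the backbone acts as a fixed feature extractor.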

Acknowledgements

Citation

@inproceedings{kerssies2026deltatok,
  title     = {A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens},
  author    = {Kerssies, Tommie and Berton, Gabriele and He, Ju and Yu, Qihang and Ma, Wufei and de Geus, Daan and Dubbelman, Gijs and Chen, Liang-Chieh},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}

Collection: DeltaTok (DeltaTok tokenizer, DeltaWorld predictor, and evaluation heads), https://github.com/amazon-far/deltatok

Paper: A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens (arXiv:2604.04913, published Apr 6)