Amazon-FAR/deltatok-kinetics

@@ -1,19 +1,28 @@
 ---
 library_name: pytorch
-tags:
-  - deltatok
 license: apache-2.0
-datasets:
-  - kinetics700
 ---
 # DeltaTok (Tokenizer) — Kinetics-700
-ViT-B encoder and decoder that compresses consecutive video frame features into a single continuous delta token. Trained on Kinetics-700 at 512x512 resolution. Requires a frozen [DINOv3](https://github.com/facebookresearch/dinov3) ViT-B backbone (not included).
 ## Usage
-See the [DeltaTok GitHub repository](https://github.com/amazon-far/deltatok) for training and evaluation code.
 ## Acknowledgements
@@ -29,4 +38,4 @@ See the [DeltaTok GitHub repository](https://github.com/amazon-far/deltatok) for
   booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
   year      = {2026}
 }
-```

 ---
+datasets:
+- kinetics700
 library_name: pytorch
 license: apache-2.0
+pipeline_tag: image-feature-extraction
+tags:
+- deltatok
 ---
 # DeltaTok (Tokenizer) — Kinetics-700
+This repository contains the DeltaTok weights presented in the paper [A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens](https://huggingface.co/papers/2604.04913) (CVPR 2026).
+[**Project Page**](https://deltatok.github.io) | [**GitHub**](https://github.com/amazon-far/deltatok)
+DeltaTok is a video tokenizer that encodes the difference between the vision foundation model (VFM) features of consecutive frames into a single continuous "delta" token. This sharply reduces the number of tokens in a video sequence (e.g., a 1,024x reduction), which makes generative world modeling more efficient.
+## Model Description
+The weights cover the ViT-B encoder and decoder, trained on Kinetics-700 at 512x512 resolution. The model requires a frozen [DINOv3](https://github.com/facebookresearch/dinov3) ViT-B backbone, which is not included.
 ## Usage
+Please refer to the [DeltaTok GitHub repository](https://github.com/amazon-far/deltatok) for setup, training, and evaluation instructions.
 ## Acknowledgements
   booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
   year      = {2026}
 }
+```
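
The "1,024x reduction" quoted in the new description can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the DeltaTok code base: it assumes the DINOv3 ViT-B backbone uses a patch size of 16 and that only patch tokens are counted (no class or register tokens). The helper `patch_tokens_per_frame` is a hypothetical name introduced for this example.

```python
def patch_tokens_per_frame(resolution: int, patch_size: int) -> int:
    """Number of patch tokens a ViT produces for one square frame."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    side = resolution // patch_size  # patches along one edge
    return side * side


RESOLUTION = 512            # training resolution stated in the model card
PATCH_SIZE = 16             # assumption: DINOv3 ViT-B/16 patch size
DELTA_TOKENS_PER_FRAME = 1  # one delta token per frame, per the model card

patch_tokens = patch_tokens_per_frame(RESOLUTION, PATCH_SIZE)
reduction = patch_tokens // DELTA_TOKENS_PER_FRAME

print(f"{patch_tokens} patch tokens per frame, {reduction}x reduction")
```

Under these assumptions a 512x512 frame yields 32 x 32 = 1,024 patch tokens, so replacing them with a single delta token matches the reduction factor given in the description.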