Update model card with paper, project, and code links

#1
opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +16 -20
README.md CHANGED
@@ -1,4 +1,6 @@
 ---
+license: apache-2.0
+pipeline_tag: robotics
 tags:
 - RDT
 - rdt
@@ -7,12 +9,12 @@ tags:
 - discrete
 - vector-quantization
 - RDT 2
-license: apache-2.0
-pipeline_tag: robotics
 ---
 
 # RVQ-AT: Residual VQ Action Tokenizer for RDT 2
 
+[**Project Page**](https://rdt-robotics.github.io/rdt2/) | [**Code**](https://github.com/thu-ml/RDT2) | [**Paper**](https://huggingface.co/papers/2602.03310)
+
 **RVQ-AT** is a fast, compact **Residual Vector-Quantization** (RVQ) tokenizer for robot action streams.
 It converts continuous control trajectories into short sequences of **discrete action tokens** that plug directly into autoregressive VLA models.
 
@@ -29,10 +31,9 @@ Here, we provide:
 
 ---
 
-
 ## Using the Universal RVQ-AT Tokenizer
 
-We recommend chunking actions into \~**0.8 s windows** with fps = 30 and normalizing each action dimension using [normalizer](http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt) to **\[-1, 1]** before tokenization. Batched encode/decode are supported.
+We recommend chunking actions into ~**0.8 s windows** with fps = 30 and normalizing each action dimension using [normalizer](http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt) to **[-1, 1]** before tokenization. Batched encode/decode are supported.
 
 ```python
 # Run under repository: https://github.com/thu-ml/RDT2
@@ -42,8 +43,8 @@ import numpy as np
 from models.normalizer import LinearNormalizer
 from vqvae.models.multivqvae import MultiVQVAE
 
-# Load from the Hub (replace with your repo id once published)
-vae = MultiVQVAE.from_pretrained("outputs/vqvae_hf").cuda().eval()
+# Load from the Hub
+vae = MultiVQVAE.from_pretrained("robotics-diffusion-transformer/RVQActionTokenizer").cuda().eval()
 normalizer = LinearNormalizer.load(
     "<Path_to_normalizer>" # Download from:
     # http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt
@@ -72,7 +73,6 @@ tokens = vae.encode(nsample) # or vae.encode(action_chunk)
 # Decode back to continuous actions
 recon_nsample = vae.decode(tokens)
 recon_action_chunk = normalizer["action"].unnormalize(recon_nsample)
-
 ```
 
 ---
@@ -85,16 +85,6 @@ Afterward, evaluate the reconstruction error on your data before using it for yo
 
 ---
 
-<!-- ## Performance (Universal Model)
-
-*(Representative, measured on internal eval — replace with your numbers when available.)*
-
-* **Compression:** 4 levels × 1 token/step → 4 tokens/step (often reduced further with temporal stride).
-* **Reconstruction:** MSE ↓ 25–40% vs. single-codebook VQ at equal bitrate.
-* **Latency:** <1 ms per 50×14 chunk on A100/PCIe; CPU-only real-time at 50 Hz feasible.
-
----
--->
 ## Safety & Intended Use
 
 RVQ-AT is a representation learning component. **Do not** deploy decoded actions directly to hardware without:
@@ -110,11 +100,17 @@ RVQ-AT is a representation learning component. **Do not** deploy decoded actions
 If you use RVQ-AT in your work, please cite:
 
 ```bibtex
-@software{rdt2,
+@article{rdt2,
+  title={RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization},
+  author={RDT Team},
+  journal={arXiv preprint arXiv:2602.03310},
+  year={2026}
+}
+
+@software{rdt2_code,
   title={RDT2: Enabling Zero-Shot Cross-Embodiment Generalization by Scaling Up UMI Data},
   author={RDT Team},
   url={https://github.com/thu-ml/RDT2},
-  month={September},
   year={2025}
 }
 ```
@@ -123,7 +119,7 @@ If you use RVQ-AT in your work, please cite:
 
 ## Contact
 
-* Issues & requests: open a GitHub issue (see [here](https://github.com/thu-ml/RDT2/blob/main/CONTRIBUTING.md) for guidelines) or start a Hub discussion on the model page.
+* Issues & requests: open a [GitHub issue](https://github.com/thu-ml/RDT2/issues) or start a Hub discussion on the model page.
 
 ---
 
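
For reviewers skimming the card: the residual-quantization scheme it describes (several codebook levels, each one quantizing the residual left by the previous level, so one control step becomes a handful of discrete tokens) can be sketched in a few lines of NumPy. This is a toy illustration only; `make_codebooks`, `rvq_encode`, and the random codebooks are hypothetical stand-ins, not the RDT2 `MultiVQVAE` API:

```python
import numpy as np

def make_codebooks(levels, K, D, rng):
    # Toy codebooks with shrinking scale per level; row 0 is zero so a
    # level may leave the residual unchanged, which makes reconstruction
    # error non-increasing in the number of levels used.
    cbs = []
    for level in range(levels):
        cb = rng.normal(scale=1.0 / 2 ** level, size=(K, D))
        cb[0] = 0.0
        cbs.append(cb)
    return cbs

def rvq_encode(x, codebooks):
    # Each level quantizes the residual left by the previous level,
    # so one D-dim vector becomes one code index per level.
    codes, residual = [], np.asarray(x, dtype=float)
    for cb in codebooks:
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    # Reconstruction is the sum of the selected entries across levels.
    return sum(cb[i] for cb, i in zip(codebooks, codes))

def mse(a, b):
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

rng = np.random.default_rng(0)
codebooks = make_codebooks(levels=4, K=64, D=8, rng=rng)
x = rng.normal(size=8)            # one "action step"

codes = rvq_encode(x, codebooks)  # 4 discrete tokens per step
x_hat = rvq_decode(codes, codebooks)

# Using all 4 levels can never be worse than using only the first,
# because later levels may always pick the zero codeword.
print(len(codes), mse(x, x_hat) <= mse(x, rvq_decode(codes[:1], codebooks[:1])))
```

The real tokenizer additionally batches over time steps and chunks, but this level-by-level refinement is the mechanism the "RVQ" in the model name refers to.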
125