Commit 505e902 · verified
nielsr (HF Staff) committed · 1 parent: 4e2c172

Update model card with paper, project, and code links


Hi! I'm Niels from the community science team at Hugging Face.

This PR improves the documentation for the RVQ-AT tokenizer by:
- Linking it to the corresponding research paper: [RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization](https://huggingface.co/papers/2602.03310).
- Adding links to the official GitHub repository and project page.
- Refining the citation section.

These changes help researchers better discover and attribute the work. Let me know if you have any questions!

Files changed (1)

README.md (+16 −20)

````diff
@@ -1,4 +1,6 @@
 ---
+license: apache-2.0
+pipeline_tag: robotics
 tags:
 - RDT
 - rdt
@@ -7,12 +9,12 @@ tags:
 - discrete
 - vector-quantization
 - RDT 2
-license: apache-2.0
-pipeline_tag: robotics
 ---
 
 # RVQ-AT: Residual VQ Action Tokenizer for RDT 2
 
+[**Project Page**](https://rdt-robotics.github.io/rdt2/) | [**Code**](https://github.com/thu-ml/RDT2) | [**Paper**](https://huggingface.co/papers/2602.03310)
+
 **RVQ-AT** is a fast, compact **Residual Vector-Quantization** (RVQ) tokenizer for robot action streams.
 It converts continuous control trajectories into short sequences of **discrete action tokens** that plug directly into autoregressive VLA models.
 
@@ -29,10 +31,9 @@ Here, we provide:
 
 ---
 
-
 ## Using the Universal RVQ-AT Tokenizer
 
-We recommend chunking actions into \~**0.8 s windows** with fps = 30 and normalizing each action dimension using [normalizer](http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt) to **\[-1, 1]** before tokenization. Batched encode/decode are supported.
+We recommend chunking actions into ~**0.8 s windows** with fps = 30 and normalizing each action dimension using [normalizer](http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt) to **[-1, 1]** before tokenization. Batched encode/decode are supported.
 
 ```python
 # Run under repository: https://github.com/thu-ml/RDT2
@@ -42,8 +43,8 @@ import numpy as np
 from models.normalizer import LinearNormalizer
 from vqvae.models.multivqvae import MultiVQVAE
 
-# Load from the Hub (replace with your repo id once published)
-vae = MultiVQVAE.from_pretrained("outputs/vqvae_hf").cuda().eval()
+# Load from the Hub
+vae = MultiVQVAE.from_pretrained("robotics-diffusion-transformer/RVQActionTokenizer").cuda().eval()
 normalizer = LinearNormalizer.load(
     "<Path_to_normalizer>" # Download from:
     # http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt
@@ -72,7 +73,6 @@ tokens = vae.encode(nsample) # or vae.encode(action_chunk)
 # Decode back to continuous actions
 recon_nsample = vae.decode(tokens)
 recon_action_chunk = normalizer["action"].unnormalize(recon_nsample)
-
 ```
 
 ---
@@ -85,16 +85,6 @@ Afterward, evaluate the reconstruction error on your data before using it for yo
 
 ---
 
-<!-- ## Performance (Universal Model)
-
-*(Representative, measured on internal eval — replace with your numbers when available.)*
-
-* **Compression:** 4 levels × 1 token/step → 4 tokens/step (often reduced further with temporal stride).
-* **Reconstruction:** MSE ↓ 25–40% vs. single-codebook VQ at equal bitrate.
-* **Latency:** <1 ms per 50×14 chunk on A100/PCIe; CPU-only real-time at 50 Hz feasible.
-
----
--->
 ## Safety & Intended Use
 
 RVQ-AT is a representation learning component. **Do not** deploy decoded actions directly to hardware without:
@@ -110,11 +100,17 @@ RVQ-AT is a representation learning component. **Do not** deploy decoded actions
 If you use RVQ-AT in your work, please cite:
 
 ```bibtex
-@software{rdt2,
+@article{rdt2,
+  title={RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization},
+  author={RDT Team},
+  journal={arXiv preprint arXiv:2602.03310},
+  year={2026}
+}
+
+@software{rdt2_code,
   title={RDT2: Enabling Zero-Shot Cross-Embodiment Generalization by Scaling Up UMI Data},
   author={RDT Team},
   url={https://github.com/thu-ml/RDT2},
-  month={September},
   year={2025}
 }
 ```
@@ -123,7 +119,7 @@ If you use RVQ-AT in your work, please cite:
 
 ## Contact
 
-* Issues & requests: open a GitHub issue (see [here](https://github.com/thu-ml/RDT2/blob/main/CONTRIBUTING.md) for guidelines) or start a Hub discussion on the model page.
+* Issues & requests: open a [GitHub issue](https://github.com/thu-ml/RDT2/issues) or start a Hub discussion on the model page.
 
 ---
````
125