Update model card with paper, project, and code links

#1
opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +16 -20
README.md CHANGED
@@ -1,4 +1,6 @@
 ---
+license: apache-2.0
+pipeline_tag: robotics
 tags:
 - RDT
 - rdt
@@ -7,12 +9,12 @@ tags:
 - discrete
 - vector-quantization
 - RDT 2
-license: apache-2.0
-pipeline_tag: robotics
 ---
 
 # RVQ-AT: Residual VQ Action Tokenizer for RDT 2
 
+[**Project Page**](https://rdt-robotics.github.io/rdt2/) | [**Code**](https://github.com/thu-ml/RDT2) | [**Paper**](https://huggingface.co/papers/2602.03310)
+
 **RVQ-AT** is a fast, compact **Residual Vector-Quantization** (RVQ) tokenizer for robot action streams.
 It converts continuous control trajectories into short sequences of **discrete action tokens** that plug directly into autoregressive VLA models.
 
@@ -29,10 +31,9 @@ Here, we provide:
 
 ---
 
-
 ## Using the Universal RVQ-AT Tokenizer
 
-We recommend chunking actions into \~**0.8 s windows** with fps = 30 and normalizing each action dimension using [normalizer](http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt) to **\[-1, 1]** before tokenization. Batched encode/decode are supported.
+We recommend chunking actions into ~**0.8 s windows** with fps = 30 and normalizing each action dimension using [normalizer](http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt) to **[-1, 1]** before tokenization. Batched encode/decode are supported.
 
 ```python
 # Run under repository: https://github.com/thu-ml/RDT2
@@ -42,8 +43,8 @@ import numpy as np
 from models.normalizer import LinearNormalizer
 from vqvae.models.multivqvae import MultiVQVAE
 
-# Load from the Hub (replace with your repo id once published)
-vae = MultiVQVAE.from_pretrained("outputs/vqvae_hf").cuda().eval()
+# Load from the Hub
+vae = MultiVQVAE.from_pretrained("robotics-diffusion-transformer/RVQActionTokenizer").cuda().eval()
 normalizer = LinearNormalizer.load(
     "<Path_to_normalizer>" # Download from:
     # http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt
@@ -72,7 +73,6 @@ tokens = vae.encode(nsample) # or vae.encode(action_chunk)
 # Decode back to continuous actions
 recon_nsample = vae.decode(tokens)
 recon_action_chunk = normalizer["action"].unnormalize(recon_nsample)
-
 ```
 
 ---
@@ -85,16 +85,6 @@ Afterward, evaluate the reconstruction error on your data before using it for yo
 
 ---
 
-<!-- ## Performance (Universal Model)
-
-*(Representative, measured on internal eval — replace with your numbers when available.)*
-
-* **Compression:** 4 levels × 1 token/step → 4 tokens/step (often reduced further with temporal stride).
-* **Reconstruction:** MSE ↓ 25–40% vs. single-codebook VQ at equal bitrate.
-* **Latency:** <1 ms per 50×14 chunk on A100/PCIe; CPU-only real-time at 50 Hz feasible.
-
----
--->
 ## Safety & Intended Use
 
 RVQ-AT is a representation learning component. **Do not** deploy decoded actions directly to hardware without:
@@ -110,11 +100,17 @@ RVQ-AT is a representation learning component. **Do not** deploy decoded actions
 If you use RVQ-AT in your work, please cite:
 
 ```bibtex
-@software{rdt2,
+@article{rdt2,
+  title={RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization},
+  author={RDT Team},
+  journal={arXiv preprint arXiv:2602.03310},
+  year={2026}
+}
+
+@software{rdt2_code,
   title={RDT2: Enabling Zero-Shot Cross-Embodiment Generalization by Scaling Up UMI Data},
   author={RDT Team},
   url={https://github.com/thu-ml/RDT2},
-  month={September},
   year={2025}
 }
 ```
@@ -123,7 +119,7 @@ If you use RVQ-AT in your work, please cite:
 
 ## Contact
 
-* Issues & requests: open a GitHub issue (see [here](https://github.com/thu-ml/RDT2/blob/main/CONTRIBUTING.md) for guidelines) or start a Hub discussion on the model page.
+* Issues & requests: open a [GitHub issue](https://github.com/thu-ml/RDT2/issues) or start a Hub discussion on the model page.
 
 ---
 
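
For reviewers skimming the card: the residual-quantization scheme it describes (several codebook levels, each one quantizing the residual left by the previous level, so one control step becomes a handful of discrete tokens) can be sketched in a few lines of NumPy. This is a toy illustration only; `make_codebooks`, `rvq_encode`, and the random codebooks are hypothetical stand-ins, not the RDT2 `MultiVQVAE` API:

```python
import numpy as np

def make_codebooks(levels, K, D, rng):
    # Toy codebooks with shrinking scale per level; row 0 is zero so a
    # level may leave the residual unchanged, which makes reconstruction
    # error non-increasing in the number of levels used.
    cbs = []
    for level in range(levels):
        cb = rng.normal(scale=1.0 / 2 ** level, size=(K, D))
        cb[0] = 0.0
        cbs.append(cb)
    return cbs

def rvq_encode(x, codebooks):
    # Each level quantizes the residual left by the previous level,
    # so one D-dim vector becomes one code index per level.
    codes, residual = [], np.asarray(x, dtype=float)
    for cb in codebooks:
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    # Reconstruction is the sum of the selected entries across levels.
    return sum(cb[i] for cb, i in zip(codebooks, codes))

def mse(a, b):
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

rng = np.random.default_rng(0)
codebooks = make_codebooks(levels=4, K=64, D=8, rng=rng)
x = rng.normal(size=8)            # one "action step"

codes = rvq_encode(x, codebooks)  # 4 discrete tokens per step
x_hat = rvq_decode(codes, codebooks)

# Using all 4 levels can never be worse than using only the first,
# because later levels may always pick the zero codeword.
print(len(codes), mse(x, x_hat) <= mse(x, rvq_decode(codes[:1], codebooks[:1])))
```

The real tokenizer additionally batches over time steps and chunks, but this level-by-level refinement is the mechanism the "RVQ" in the model name refers to.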
125