Update model card with paper, project, and code links
#1 by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,4 +1,6 @@
 ---
 tags:
 - RDT
 - rdt
@@ -7,12 +9,12 @@ tags:
 - discrete
 - vector-quantization
 - RDT 2
-license: apache-2.0
-pipeline_tag: robotics
 ---

 # RVQ-AT: Residual VQ Action Tokenizer for RDT 2

 **RVQ-AT** is a fast, compact **Residual Vector-Quantization** (RVQ) tokenizer for robot action streams.
 It converts continuous control trajectories into short sequences of **discrete action tokens** that plug directly into autoregressive VLA models.

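As background for the residual vector-quantization idea the card describes (tokens that "plug directly into autoregressive VLA models"), here is a minimal toy sketch in NumPy. It is illustrative only: RVQ-AT's codebooks are learned, and the real level count, codebook sizes, and action dimensions belong to the `MultiVQVAE` in the RDT2 repo, not to this snippet. Each level quantizes the residual left by the previous level, so a few small codebooks compose into one fine-grained discrete code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy residual VQ: 3 levels, each with a small random codebook.
# (Hypothetical sizes; RVQ-AT's codebooks are learned, not random.)
n_levels, codebook_size, dim = 3, 32, 8
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(n_levels)]

def rvq_encode(x, codebooks):
    """Each level quantizes the residual left over by the previous level."""
    residual, tokens = x, []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        tokens.append(idx)
        residual = residual - cb[idx]
    return tokens

def rvq_decode(tokens, codebooks):
    """Reconstruction is simply the sum of the chosen codewords."""
    return sum(cb[i] for cb, i in zip(codebooks, tokens))

x = rng.normal(size=dim)           # one continuous "action" vector
tokens = rvq_encode(x, codebooks)  # 3 discrete tokens, one per level
x_hat = rvq_decode(tokens, codebooks)
print(tokens, float(np.linalg.norm(x - x_hat)))
```

Because each level only has to describe what the previous levels missed, residual stacking reaches a much finer effective resolution than a single codebook of the same total size.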
@@ -29,10 +31,9 @@ Here, we provide:

 ---

-
 ## Using the Universal RVQ-AT Tokenizer

-We recommend chunking actions into

 ```python
 # Run under repository: https://github.com/thu-ml/RDT2
@@ -42,8 +43,8 @@ import numpy as np
 from models.normalizer import LinearNormalizer
 from vqvae.models.multivqvae import MultiVQVAE

-# Load from the Hub
-vae = MultiVQVAE.from_pretrained("
 normalizer = LinearNormalizer.load(
     "<Path_to_normalizer>"  # Download from:
     # http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt
@@ -72,7 +73,6 @@ tokens = vae.encode(nsample)  # or vae.encode(action_chunk)
 # Decode back to continuous actions
 recon_nsample = vae.decode(tokens)
 recon_action_chunk = normalizer["action"].unnormalize(recon_nsample)
-
 ```

 ---
@@ -85,16 +85,6 @@ Afterward, evaluate the reconstruction error on your data before using it for yo

 ---

-<!-- ## Performance (Universal Model)
-
-*(Representative, measured on internal eval — replace with your numbers when available.)*
-
-* **Compression:** 4 levels × 1 token/step → 4 tokens/step (often reduced further with temporal stride).
-* **Reconstruction:** MSE ↓ 25–40% vs. single-codebook VQ at equal bitrate.
-* **Latency:** <1 ms per 50×14 chunk on A100/PCIe; CPU-only real-time at 50 Hz feasible.
-
----
--->
 ## Safety & Intended Use

 RVQ-AT is a representation learning component. **Do not** deploy decoded actions directly to hardware without:
@@ -110,11 +100,17 @@
 If you use RVQ-AT in your work, please cite:

 ```bibtex
-@
 title={RDT2: Enabling Zero-Shot Cross-Embodiment Generalization by Scaling Up UMI Data},
 author={RDT Team},
 url={https://github.com/thu-ml/RDT2},
-month={September},
 year={2025}
 }
 ```
@@ -123,7 +119,7 @@ If you use RVQ-AT in your work, please cite:

 ## Contact

-* Issues & requests: open a GitHub issue

 ---
@@ -1,4 +1,6 @@
 ---
+license: apache-2.0
+pipeline_tag: robotics
 tags:
 - RDT
 - rdt
@@ -7,12 +9,12 @@ tags:
 - discrete
 - vector-quantization
 - RDT 2
 ---

 # RVQ-AT: Residual VQ Action Tokenizer for RDT 2

+[**Project Page**](https://rdt-robotics.github.io/rdt2/) | [**Code**](https://github.com/thu-ml/RDT2) | [**Paper**](https://huggingface.co/papers/2602.03310)
+
 **RVQ-AT** is a fast, compact **Residual Vector-Quantization** (RVQ) tokenizer for robot action streams.
 It converts continuous control trajectories into short sequences of **discrete action tokens** that plug directly into autoregressive VLA models.

@@ -29,10 +31,9 @@ Here, we provide:

 ---

 ## Using the Universal RVQ-AT Tokenizer

+We recommend chunking actions into ~**0.8 s windows** at fps = 30 and normalizing each action dimension to **[-1, 1]** with the [normalizer](http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt) before tokenization. Batched encode/decode is supported.

 ```python
 # Run under repository: https://github.com/thu-ml/RDT2
@@ -42,8 +43,8 @@ import numpy as np
 from models.normalizer import LinearNormalizer
 from vqvae.models.multivqvae import MultiVQVAE

+# Load from the Hub
+vae = MultiVQVAE.from_pretrained("robotics-diffusion-transformer/RVQActionTokenizer").cuda().eval()
 normalizer = LinearNormalizer.load(
     "<Path_to_normalizer>"  # Download from:
     # http://ml.cs.tsinghua.edu.cn/~lingxuan/rdt2/umi_normalizer_wo_downsample_indentity_rot.pt
@@ -72,7 +73,6 @@ tokens = vae.encode(nsample)  # or vae.encode(action_chunk)
 # Decode back to continuous actions
 recon_nsample = vae.decode(tokens)
 recon_action_chunk = normalizer["action"].unnormalize(recon_nsample)
 ```

 ---
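To make the chunking recommendation in this change concrete, here is a small self-contained sketch of the chunk-and-normalize step. The helpers are hypothetical stand-ins, not the repo's `LinearNormalizer`: per-dimension min/max statistics play the role of the downloaded normalizer, and 0.8 s at 30 fps gives 24-step chunks.

```python
import numpy as np

fps, window_s = 30, 0.8
chunk_len = int(fps * window_s)  # 0.8 s at 30 fps -> 24 steps per chunk

def linear_normalize(a, lo, hi):
    """Map each action dimension from [lo, hi] to [-1, 1].
    (Hypothetical stand-in for the repo's LinearNormalizer.)"""
    return 2.0 * (a - lo) / (hi - lo) - 1.0

def chunk_actions(traj, chunk_len):
    """Split a (T, D) trajectory into non-overlapping (chunk_len, D) windows,
    dropping any incomplete tail."""
    n = len(traj) // chunk_len
    return traj[: n * chunk_len].reshape(n, chunk_len, traj.shape[1])

# Fake 14-dim action trajectory standing in for real robot data.
traj = np.random.default_rng(0).uniform(-0.5, 0.5, size=(100, 14))
lo, hi = traj.min(axis=0), traj.max(axis=0)  # per-dimension stats
chunks = chunk_actions(linear_normalize(traj, lo, hi), chunk_len)
print(chunks.shape)  # (4, 24, 14): 4 chunks of 24 steps x 14 dims
```

Each `(24, 14)` chunk, normalized into [-1, 1], is then what the tokenizer's `encode` would consume; batching is just the leading axis.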
@@ -85,16 +85,6 @@ Afterward, evaluate the reconstruction error on your data before using it for yo

 ---

 ## Safety & Intended Use

 RVQ-AT is a representation learning component. **Do not** deploy decoded actions directly to hardware without:
@@ -110,11 +100,17 @@
 If you use RVQ-AT in your work, please cite:

 ```bibtex
+@article{rdt2,
+title={RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization},
+author={RDT Team},
+journal={arXiv preprint arXiv:2602.03310},
+year={2026}
+}
+
+@software{rdt2_code,
 title={RDT2: Enabling Zero-Shot Cross-Embodiment Generalization by Scaling Up UMI Data},
 author={RDT Team},
 url={https://github.com/thu-ml/RDT2},
 year={2025}
 }
 ```
@@ -123,7 +119,7 @@ If you use RVQ-AT in your work, please cite:

 ## Contact

+* Issues & requests: open a [GitHub issue](https://github.com/thu-ml/RDT2/issues) or start a Hub discussion on the model page.

 ---