---
library_name: transformers
tags: []
---

# BEAST: B-Spline Encoded Action Sequences Tokenizer

BEAST is an action tokenizer that converts continuous robot action sequences into discrete tokens using B-splines. It enables efficient trajectory compression for imitation learning by representing smooth robot motions as compact token sequences.

## Installation

Install the required dependencies:

```bash
pip install torch numpy matplotlib einops transformers
```

**Note:** CUDA is recommended for optimal performance, but CPU is also supported by setting `device="cpu"`.

## Quick Start

```python
from transformers import AutoProcessor
import torch

# Initialize the BEAST processor with configuration parameters:
# - num_dof: degrees of freedom (3 for 3D trajectories like x, y, z)
# - num_basis: number of B-spline basis functions used for trajectory representation
# - seq_len: length of the trajectory sequence (number of time steps)
# - degree_p: degree of the B-spline polynomial (3 = cubic spline)
# - device: computation device ('cpu' or 'cuda')
beast = AutoProcessor.from_pretrained(
    "zhouhongyi/beast",
    trust_remote_code=True,
    num_dof=3,
    num_basis=20,
    seq_len=50,
    degree_p=3,
    device="cpu",
)

# Create random trajectory data: 10 trajectories, each with 50 time steps, 3 dimensions
trajectories = torch.randn(10, 50, 3)

# Encode trajectories into discrete tokens
# update_bounds=True allows the processor to adaptively update quantization bounds
tokens = beast.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")

# Decode tokens back to continuous trajectories
reconstructed_trajectories = beast.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}")

# Calculate mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE Loss: {mse_loss.item()}")

# Visualize the reconstruction error for analysis
beast.visualize_reconstruction_error_discrete(trajectories)
```

### Continuous Encoding

For integration with continuous generative models:

```python
# Encode to normalized continuous parameters [-1, 1]
params = beast.encode_continuous(trajectories, update_bounds=True)

# Decode back
reconstructed = beast.decode_continuous(params)
```
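
One plausible reading of the `[-1, 1]` normalization and of `update_bounds` is min-max scaling against running bounds. The sketch below illustrates that idea in plain Python for a single dimension. The `BoundsNormalizer` class and its methods are hypothetical names introduced here for illustration; they are not part of BEAST's API, and the library's internals may differ.

```python
class BoundsNormalizer:
    """Hypothetical min-max normalizer; illustrative only, not BEAST's code."""

    def __init__(self):
        self.low = None   # running lower bound
        self.high = None  # running upper bound

    def update(self, values):
        """Widen the running bounds so they cover every value in `values`."""
        lo, hi = min(values), max(values)
        self.low = lo if self.low is None else min(self.low, lo)
        self.high = hi if self.high is None else max(self.high, hi)

    def _span(self):
        if self.low is None:
            raise ValueError("bounds are unset; call update() first")
        return (self.high - self.low) or 1.0  # avoid division by zero

    def normalize(self, values, update_bounds=True):
        """Map values linearly from [low, high] to [-1, 1]."""
        if update_bounds:
            self.update(values)
        span = self._span()
        return [2.0 * (v - self.low) / span - 1.0 for v in values]

    def denormalize(self, params):
        """Invert normalize(): map [-1, 1] back to [low, high]."""
        span = self._span()
        return [(p + 1.0) / 2.0 * span + self.low for p in params]


norm = BoundsNormalizer()
weights = [0.5, -2.0, 3.0, 1.0]
params = norm.normalize(weights)      # smallest value maps to -1, largest to 1
restored = norm.denormalize(params)   # recovers the original values
```

Under this reading, calling with `update_bounds=False` would reuse bounds gathered earlier, so values outside them would fall outside `[-1, 1]`.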

## Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `num_dof` | Total degrees of freedom (robot joints + gripper) | 7 |
| `num_basis` | Number of B-spline basis functions. Higher values improve reconstruction fidelity but produce more tokens | 10 |
| `seq_len` | Trajectory sequence length (number of timesteps) | 50 |
| `vocab_size` | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
| `degree_p` | B-spline polynomial degree. Higher degrees produce smoother curves (3=cubic, 4=quartic) | 4 |
| `device` | Torch device (`"cuda"` or `"cpu"`) | `"cuda"` |
| `gripper_zero_order` | Use piecewise-constant (degree 0) splines for gripper DOFs. Useful for binary gripper states | `False` |
| `gripper_dof` | Number of gripper DOFs, assumed to be the last DOFs of the action vector. Only used when `gripper_zero_order=True` | 1 |
| `enforce_init_pos` | Enforce initial position constraint during decoding | `False` |
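
To make the roles of `num_basis` and `degree_p` concrete, here is a minimal pure-Python sketch of B-spline basis evaluation using the Cox-de Boor recursion. It assumes a clamped uniform knot vector and time normalized to `[0, 1]`; both are common choices but are assumptions here, and all function names are hypothetical rather than BEAST's API.

```python
def clamped_knots(num_basis, degree):
    """Clamped uniform knot vector on [0, 1] (an assumed, common choice)."""
    n_interior = num_basis - degree - 1
    interior = [(i + 1) / (n_interior + 1) for i in range(n_interior)]
    return [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)


def basis_value(i, degree, t, knots):
    """Cox-de Boor recursion for the i-th basis function of given degree at t."""
    if degree == 0:
        # Half-open spans; the last non-empty span is closed so t = 1 is covered.
        is_last_span = knots[i] < knots[i + 1] and knots[i + 1] == knots[-1]
        if knots[i] <= t < knots[i + 1] or (t == knots[-1] and is_last_span):
            return 1.0
        return 0.0
    left = right = 0.0
    d_left = knots[i + degree] - knots[i]
    d_right = knots[i + degree + 1] - knots[i + 1]
    if d_left > 0:  # skip terms with zero-width support
        left = (t - knots[i]) / d_left * basis_value(i, degree - 1, t, knots)
    if d_right > 0:
        right = ((knots[i + degree + 1] - t) / d_right
                 * basis_value(i + 1, degree - 1, t, knots))
    return left + right


def basis_matrix(num_basis, degree, seq_len):
    """Rows are time steps, columns are basis functions: [seq_len][num_basis]."""
    knots = clamped_knots(num_basis, degree)
    times = [k / (seq_len - 1) for k in range(seq_len)]
    return [[basis_value(i, degree, t, knots) for i in range(num_basis)]
            for t in times]


def decode(weights, basis):
    """Reconstruct one DOF as a weighted sum of basis functions per time step."""
    return [sum(w * b for w, b in zip(weights, row)) for row in basis]


basis = basis_matrix(num_basis=10, degree=3, seq_len=50)
weights = [0.0, 0.2, 0.5, 0.9, 1.0, 0.8, 0.4, 0.1, -0.2, -0.3]
trajectory = decode(weights, basis)  # 50 samples from 10 numbers
```

In this sketch, 10 weights describe a 50-step trajectory for one DOF, which is where the compression comes from. Setting `degree=0` turns every basis function into an indicator over one time span, giving the piecewise-constant curves that `gripper_zero_order` refers to.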

### Token Count

The total number of tokens per trajectory is `num_basis * num_dof`.

For example, with the default settings (10 basis functions, 7 DOF), each trajectory is encoded as 10 × 7 = 70 tokens.
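
The arithmetic can be checked with a few lines of Python. The helper names below are hypothetical and exist only for this illustration; the compression ratio compares raw values per trajectory (`seq_len * num_dof`) with tokens per trajectory.

```python
def tokens_per_trajectory(num_basis, num_dof):
    """Token count from the formula above: one token per basis weight per DOF."""
    return num_basis * num_dof


def compression_ratio(seq_len, num_basis):
    """Raw values (seq_len * num_dof) divided by tokens (num_basis * num_dof)."""
    return seq_len / num_basis


default_tokens = tokens_per_trajectory(num_basis=10, num_dof=7)      # 70
quick_start_tokens = tokens_per_trajectory(num_basis=20, num_dof=3)  # 60
default_ratio = compression_ratio(seq_len=50, num_basis=10)          # 5.0
```

Note that `num_dof` cancels out of the ratio, so the trade-off between sequence length and fidelity is governed by `num_basis` relative to `seq_len`.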

## API Reference

### Encoding Methods

**`encode_discrete(trajs, update_bounds=True)`**
- Input: Trajectories tensor `[batch, seq_len, num_dof]`
- Output: Discrete tokens `[batch, num_basis * num_dof]` in range `[0, vocab_size-1]`
- `update_bounds`: Whether to update internal weight bounds from this batch

**`encode_continuous(trajs, update_bounds=True)`**
- Input: Trajectories tensor `[batch, seq_len, num_dof]`
- Output: Normalized parameters `[batch, num_basis * num_dof]` in range `[-1, 1]`
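
The two output ranges above suggest how the discrete and continuous forms could relate: values in `[-1, 1]` can be binned uniformly into `vocab_size` integer tokens. The sketch below shows such a uniform quantizer in plain Python. It is an assumption about the general technique, not BEAST's actual implementation, and `quantize` and `dequantize` are hypothetical names.

```python
def quantize(params, vocab_size=256):
    """Map values in [-1, 1] to integer tokens in [0, vocab_size - 1]."""
    tokens = []
    for p in params:
        p = min(1.0, max(-1.0, p))  # clip out-of-range values
        tokens.append(round((p + 1.0) / 2.0 * (vocab_size - 1)))
    return tokens


def dequantize(tokens, vocab_size=256):
    """Map integer tokens back to values in [-1, 1]."""
    return [2.0 * t / (vocab_size - 1) - 1.0 for t in tokens]


params = [-1.0, -0.5, 0.0, 0.5, 1.0]
tokens = quantize(params)
recovered = dequantize(tokens)
```

With this scheme the round-trip error per value is at most half a bin width, `1 / (vocab_size - 1)`, so a larger `vocab_size` lowers quantization error at the cost of a larger token vocabulary.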

### Decoding Methods

**`decode_discrete(tokens, times=None, init_pos=None)`**
- Input: Discrete tokens `[batch, num_basis * num_dof]`
- Output: Reconstructed trajectories `[batch, seq_len, num_dof]`
- `times`: Custom time points (optional, defaults to `seq_len` uniform points)
- `init_pos`: Initial position constraint (optional)

**`decode_continuous(params, times=None, init_pos=None)`**
- Input: Normalized parameters `[batch, num_basis * num_dof]`
- Output: Reconstructed trajectories `[batch, seq_len, num_dof]`

### Utility Methods

**`compute_reconstruction_error(raw_traj)`**
- Compute MSE between original and reconstructed trajectory

**`visualize_reconstruction_error_discrete(raw_traj)`** / **`visualize_reconstruction_error_continuous(raw_traj)`**
- Plot original vs reconstructed trajectories for visual comparison

## Uses

### Intended Use Cases

- **Robot Imitation Learning**: Compress continuous demonstration trajectories into discrete tokens for language model-based policy learning
- **Trajectory Compression**: Reduce memory footprint of robot demonstration datasets while preserving motion quality
- **Action Tokenization**: Enable transformer-based models to process robot actions as discrete token sequences

## Citation

If you use BEAST in your research, please cite:

**BibTeX:**

```bibtex
@inproceedings{zhou2025beast,
  title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
  author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=rQCl1sf62w}
}
```