# BEAST: B-Spline Encoded Action Sequences Tokenizer
BEAST is an action tokenizer that converts continuous robot action sequences into discrete tokens using B-splines. It enables efficient trajectory compression for imitation learning by representing smooth robot motions as compact token sequences.
## Installation

Install the required dependencies:

```shell
pip install torch numpy matplotlib einops transformers
```
Note: CUDA is recommended for optimal performance, but CPU is also supported by setting `device="cpu"`.
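If you are unsure whether a GPU is present, the device string can be chosen at runtime with standard PyTorch (this is a general pattern, not BEAST-specific):

```python
import torch

# Fall back to CPU when no CUDA device is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```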
## Quick Start
```python
from transformers import AutoProcessor
import torch

# Initialize the BEAST processor with configuration parameters:
# - num_dof: degrees of freedom (3 for 3D trajectories like x, y, z)
# - num_basis: number of B-spline basis functions used for trajectory representation
# - seq_len: length of the trajectory sequence (number of time steps)
# - degree_p: degree of the B-spline polynomial (3 = cubic spline)
# - device: computation device ('cpu' or 'cuda')
beast = AutoProcessor.from_pretrained(
    "zhouhongyi/beast",
    trust_remote_code=True,
    num_dof=3,
    num_basis=20,
    seq_len=50,
    degree_p=3,
    device="cpu",
)

# Create random trajectory data: 10 trajectories, each with 50 time steps, 3 dimensions
trajectories = torch.randn(10, 50, 3)

# Encode trajectories into discrete tokens
# update_bounds=True allows the processor to adaptively update quantization bounds
tokens = beast.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")

# Decode tokens back to continuous trajectories
reconstructed_trajectories = beast.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}")

# Calculate mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE Loss: {mse_loss.item()}")

# Visualize the reconstruction error for analysis
beast.visualize_reconstruction_error_discrete(trajectories)
```
## Continuous Encoding

For integration with continuous generative models:

```python
# Encode to normalized continuous parameters in [-1, 1]
params = beast.encode_continuous(trajectories, update_bounds=True)

# Decode back to trajectories
reconstructed = beast.decode_continuous(params)
```
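One simple way to map fitted B-spline weights into a bounded range is per-dimension min–max normalization; the NumPy sketch below illustrates that idea only. It is an assumption about the scheme, not BEAST's actual implementation, and `lo`/`hi` here stand in for the internally tracked weight bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(10, 60))          # e.g. 10 trajectories, 20 basis * 3 DOF weights
lo, hi = w.min(axis=0), w.max(axis=0)  # per-dimension bounds tracked from data

# Normalize to [-1, 1], then invert the mapping
params = 2.0 * (w - lo) / (hi - lo) - 1.0
w_rec = (params + 1.0) / 2.0 * (hi - lo) + lo

print(np.allclose(w, w_rec))  # True: the mapping is exactly invertible
```

The round trip is lossless here; in the discrete case, quantizing `params` into a finite vocabulary is what introduces reconstruction error.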
## Parameters

| Parameter | Description | Default |
|---|---|---|
| `num_dof` | Total degrees of freedom (robot joints + gripper) | 7 |
| `num_basis` | Number of B-spline basis functions. Higher values improve reconstruction fidelity but produce more tokens | 10 |
| `seq_len` | Trajectory sequence length (number of timesteps) | 50 |
| `vocab_size` | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
| `degree_p` | B-spline polynomial degree. Higher degrees produce smoother curves (3 = cubic, 4 = quartic) | 4 |
| `device` | Torch device (`"cuda"` or `"cpu"`) | `"cuda"` |
| `gripper_zero_order` | Use piecewise-constant (degree 0) splines for gripper DOFs; useful for binary gripper states | False |
| `gripper_dof` | Number of gripper DOFs, assumed to be the last DOFs. Only used when `gripper_zero_order=True` | 1 |
| `enforce_init_pos` | Enforce the initial position constraint during decoding | False |
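For intuition about `num_basis` and `degree_p`: a trajectory is represented as a weighted sum of B-spline basis functions evaluated at the trajectory's time points. The sketch below evaluates clamped B-spline bases with the Cox–de Boor recursion in plain NumPy; it is an illustrative reimplementation of the underlying math, not BEAST's internal code:

```python
import numpy as np

def clamped_knots(num_basis: int, degree: int) -> np.ndarray:
    """Clamped uniform knot vector on [0, 1] with num_basis + degree + 1 knots."""
    interior = np.linspace(0.0, 1.0, num_basis - degree + 1)
    return np.concatenate([np.zeros(degree), interior, np.ones(degree)])

def basis(i: int, p: int, t: float, knots: np.ndarray) -> float:
    """Cox-de Boor recursion: i-th degree-p B-spline basis function at time t."""
    if p == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last nonempty span so t = 1 is covered exactly once
        at_end = t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]
        return 1.0 if at_end else 0.0
    out = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        out += (t - knots[i]) / d1 * basis(i, p - 1, t, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        out += (knots[i + p + 1] - t) / d2 * basis(i + 1, p - 1, t, knots)
    return out

num_basis, degree, seq_len = 10, 3, 50
knots = clamped_knots(num_basis, degree)
times = np.linspace(0.0, 1.0, seq_len)
# Basis matrix Phi: seq_len x num_basis; a trajectory is Phi @ weights per DOF
Phi = np.array([[basis(i, degree, t, knots) for i in range(num_basis)] for t in times])
print(Phi.sum(axis=1))  # partition of unity: each row sums to 1
```

Increasing `num_basis` adds columns to `Phi` (finer trajectory resolution, more tokens), while `degree_p` controls the smoothness of each basis function.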
## Token Count

The total number of tokens per trajectory is `num_basis * num_dof`.

For example, with the default settings (10 basis functions, 7 DOF): 70 tokens per trajectory.
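As a quick sanity check of the formula with the default parameters:

```python
num_basis, num_dof = 10, 7  # default settings
tokens_per_trajectory = num_basis * num_dof
print(tokens_per_trajectory)  # 70
```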
## API Reference

### Encoding Methods
`encode_discrete(trajs, update_bounds=True)`

- Input: trajectories tensor `[batch, seq_len, num_dof]`
- Output: discrete tokens `[batch, num_basis * num_dof]` in range `[0, vocab_size - 1]`
- `update_bounds`: whether to update internal weight bounds from this batch
`encode_continuous(trajs, update_bounds=True)`

- Input: trajectories tensor `[batch, seq_len, num_dof]`
- Output: normalized parameters `[batch, num_basis * num_dof]` in range `[-1, 1]`
### Decoding Methods
`decode_discrete(tokens, times=None, init_pos=None)`

- Input: discrete tokens `[batch, num_basis * num_dof]`
- Output: reconstructed trajectories `[batch, seq_len, num_dof]`
- `times`: custom time points (optional; defaults to `seq_len` uniform points)
- `init_pos`: initial position constraint (optional)
`decode_continuous(params, times=None, init_pos=None)`

- Input: normalized parameters `[batch, num_basis * num_dof]`
- Output: reconstructed trajectories `[batch, seq_len, num_dof]`
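The decode direction inverts the quantization and then evaluates the spline by multiplying with a basis matrix. The NumPy sketch below illustrates the shape flow only; `Phi`, `lo`, and `hi` are stand-ins for the processor's internal basis matrix and weight bounds, not the actual implementation:

```python
import numpy as np

vocab_size, num_basis, num_dof, seq_len = 256, 10, 7, 50
rng = np.random.default_rng(0)
tokens = rng.integers(0, vocab_size, size=(4, num_basis * num_dof))
lo, hi = -2.0, 2.0                      # stand-in scalar weight bounds
Phi = rng.random((seq_len, num_basis))  # stand-in basis matrix (seq_len x num_basis)

# Dequantize tokens back to continuous weights, then evaluate the spline
w = tokens / (vocab_size - 1) * (hi - lo) + lo  # [batch, num_basis * num_dof]
w = w.reshape(-1, num_dof, num_basis)           # per-DOF weight vectors
trajs = np.einsum("tb,kdb->ktd", Phi, w)        # [batch, seq_len, num_dof]
print(trajs.shape)  # (4, 50, 7)
```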
### Utility Methods

`compute_reconstruction_error(raw_traj)`

- Compute the MSE between the original and reconstructed trajectory

`visualize_reconstruction_error_discrete(raw_traj)` / `visualize_reconstruction_error_continuous(raw_traj)`

- Plot original vs. reconstructed trajectories for visual comparison
## Uses

### Intended Use Cases
- Robot Imitation Learning: Compress continuous demonstration trajectories into discrete tokens for language model-based policy learning
- Trajectory Compression: Reduce memory footprint of robot demonstration datasets while preserving motion quality
- Action Tokenization: Enable transformer-based models to process robot actions as discrete token sequences
## Citation
If you use BEAST in your research, please cite:
BibTeX:
```bibtex
@inproceedings{
    zhou2025beast,
    title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
    author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
    year={2025},
    url={https://openreview.net/forum?id=rQCl1sf62w}
}
```