BEAST: B-Spline Encoded Action Sequences Tokenizer

BEAST is an action tokenizer that converts continuous robot action sequences into discrete tokens using B-splines. It enables efficient trajectory compression for imitation learning by representing smooth robot motions as compact token sequences.
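The exact fitting and quantization procedure is described in the paper; as a rough, self-contained illustration of the idea (not BEAST's implementation), the NumPy sketch below fits a toy trajectory with clamped B-spline basis functions via least squares and uniformly quantizes the resulting control points into 8-bit tokens. All settings here mirror the Quick Start example further down; the function names are hypothetical.

```python
import numpy as np

def bspline_basis(t, knots, degree):
    """Evaluate all clamped B-spline basis functions at times t via the
    Cox-de Boor recursion. Returns an array of shape [len(t), n_basis]."""
    t = np.asarray(t, dtype=float)
    # Degree-0 basis: indicator functions of the knot spans.
    B = np.zeros((len(t), len(knots) - 1))
    for i in range(len(knots) - 1):
        B[:, i] = (t >= knots[i]) & (t < knots[i + 1])
    # Close the last non-degenerate span so t == knots[-1] is covered.
    B[t >= knots[-1], len(knots) - degree - 2] = 1.0
    # Raise the degree one step at a time.
    for p in range(1, degree + 1):
        B_next = np.zeros((len(t), B.shape[1] - 1))
        for i in range(B.shape[1] - 1):
            d1 = knots[i + p] - knots[i]
            d2 = knots[i + p + 1] - knots[i + 1]
            left = (t - knots[i]) / d1 * B[:, i] if d1 > 0 else 0.0
            right = (knots[i + p + 1] - t) / d2 * B[:, i + 1] if d2 > 0 else 0.0
            B_next[:, i] = left + right
        B = B_next
    return B

# Illustrative settings: 20 basis functions, cubic splines, 50 timesteps,
# 3 DOF, 256-token vocabulary.
num_basis, degree, seq_len, vocab_size = 20, 3, 50, 256
t = np.linspace(0.0, 1.0, seq_len)
knots = np.concatenate([np.zeros(degree),
                        np.linspace(0.0, 1.0, num_basis - degree + 1),
                        np.ones(degree)])
Phi = bspline_basis(t, knots, degree)                      # [50, 20]

# A smooth toy trajectory with 3 DOF (x, y, z).
traj = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t], axis=1)

# Least-squares fit of the control points, then uniform 8-bit quantization.
W, *_ = np.linalg.lstsq(Phi, traj, rcond=None)             # [20, 3]
lo, hi = W.min(), W.max()
tokens = np.round((W - lo) / (hi - lo) * (vocab_size - 1)).astype(np.int64)

# Decoding: de-quantize the tokens and evaluate the spline.
W_hat = tokens / (vocab_size - 1) * (hi - lo) + lo
recon = Phi @ W_hat
print("tokens per trajectory:", tokens.size)               # 20 * 3 = 60
print("reconstruction MSE:", np.mean((traj - recon) ** 2))
```

Because the basis functions form a partition of unity, small quantization errors in the control points translate into comparably small pointwise errors in the reconstructed trajectory.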

Installation

Install the required dependencies:

pip install torch numpy matplotlib einops transformers

Note: CUDA is recommended for optimal performance, but CPU is also supported by setting device="cpu".
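For example, the device can be chosen at runtime before constructing the processor:

```python
import torch

# Fall back to CPU when no CUDA device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```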

Quick Start

from transformers import AutoProcessor
import torch

# Initialize the BEAST processor with configuration parameters:
# - num_dof: degrees of freedom (3 for 3D trajectories like x, y, z)
# - num_basis: number of B-spline basis functions used for trajectory representation
# - seq_len: length of the trajectory sequence (number of time steps)
# - degree_p: degree of the B-spline polynomial (3 = cubic spline)
# - device: computation device ('cpu' or 'cuda')
beast = AutoProcessor.from_pretrained(
    "zhouhongyi/beast",
    trust_remote_code=True,
    num_dof=3,
    num_basis=20,
    seq_len=50,
    degree_p=3,
    device='cpu'
)

# Create random trajectory data: 10 trajectories, each with 50 time steps, 3 dimensions
trajectories = torch.randn(10, 50, 3)

# Encode trajectories into discrete tokens
# update_bounds=True allows the processor to adaptively update quantization bounds
tokens = beast.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")

# Decode tokens back to continuous trajectories
reconstructed_trajectories = beast.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}")

# Calculate mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE Loss: {mse_loss.item()}")

# Visualize the reconstruction error for analysis
beast.visualize_reconstruction_error_discrete(trajectories)

Continuous Encoding

For integration with continuous generative models:

# Encode to normalized continuous parameters [-1, 1]
params = beast.encode_continuous(trajectories, update_bounds=True)

# Decode back
reconstructed = beast.decode_continuous(params)

Parameters

  • num_dof (default 7): Total degrees of freedom (robot joints + gripper)
  • num_basis (default 10): Number of B-spline basis functions; higher values improve reconstruction fidelity but produce more tokens
  • seq_len (default 50): Trajectory sequence length (number of timesteps)
  • vocab_size (default 256): Discrete vocabulary size (256 = 8-bit tokens)
  • degree_p (default 4): B-spline polynomial degree; higher degrees produce smoother curves (3 = cubic, 4 = quartic)
  • device (default "cuda"): Torch device ("cuda" or "cpu")
  • gripper_zero_order (default False): Use piecewise-constant (degree-0) splines for the gripper DOFs; useful for binary gripper states
  • gripper_dof (default 1): Number of gripper DOFs, assumed to be the last dimensions; only used when gripper_zero_order=True
  • enforce_init_pos (default False): Enforce the initial position constraint during decoding

Token Count

The total number of tokens per trajectory is: num_basis * num_dof

For example, with the default settings (10 basis functions, 7 DOF), each trajectory is encoded as 70 tokens.
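A quick sanity check of this arithmetic, using the parameter names from the table above:

```python
def tokens_per_trajectory(num_basis: int, num_dof: int) -> int:
    # Each DOF is represented by num_basis quantized B-spline coefficients,
    # so the token count is independent of seq_len.
    return num_basis * num_dof

print(tokens_per_trajectory(10, 7))   # defaults: 70 tokens
print(tokens_per_trajectory(20, 3))   # Quick Start settings: 60 tokens
```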

API Reference

Encoding Methods

encode_discrete(trajs, update_bounds=True)

  • Input: Trajectories tensor [batch, seq_len, num_dof]
  • Output: Discrete tokens [batch, num_basis * num_dof] in range [0, vocab_size-1]
  • update_bounds: Whether to update internal weight bounds from this batch

encode_continuous(trajs, update_bounds=True)

  • Input: Trajectories tensor [batch, seq_len, num_dof]
  • Output: Normalized parameters [batch, num_basis * num_dof] in range [-1, 1]
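The [-1, 1] range can be pictured as a min-max transform over the fitted B-spline weights. The sketch below is illustrative only; the bound names w_min and w_max are assumptions, not BEAST's internal attributes:

```python
import numpy as np

def normalize_weights(w, w_min, w_max):
    # Map raw B-spline weights from [w_min, w_max] to [-1, 1].
    return 2.0 * (w - w_min) / (w_max - w_min) - 1.0

def denormalize_weights(p, w_min, w_max):
    # Inverse map from [-1, 1] back to [w_min, w_max].
    return (p + 1.0) / 2.0 * (w_max - w_min) + w_min

w = np.array([0.0, 0.5, 1.0])
p = normalize_weights(w, 0.0, 1.0)
print(p)  # [-1.  0.  1.]
```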

Decoding Methods

decode_discrete(tokens, times=None, init_pos=None)

  • Input: Discrete tokens [batch, num_basis * num_dof]
  • Output: Reconstructed trajectories [batch, seq_len, num_dof]
  • times: Custom time points (optional, defaults to seq_len uniform points)
  • init_pos: Initial position constraint (optional)
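Conceptually, decoding first maps each token back to a continuous weight before evaluating the spline. A minimal sketch of such uniform de-quantization (the function name and bounds are placeholders, not the library's API):

```python
import numpy as np

def tokens_to_weights(tokens, w_min, w_max, vocab_size=256):
    # Invert uniform quantization: token k maps to
    # w_min + k / (vocab_size - 1) * (w_max - w_min).
    return tokens / (vocab_size - 1) * (w_max - w_min) + w_min

print(tokens_to_weights(np.array([0, 255]), -2.0, 2.0))  # [-2.  2.]
```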

decode_continuous(params, times=None, init_pos=None)

  • Input: Normalized parameters [batch, num_basis * num_dof]
  • Output: Reconstructed trajectories [batch, seq_len, num_dof]

Utility Methods

compute_reconstruction_error(raw_traj)

  • Compute MSE between original and reconstructed trajectory

visualize_reconstruction_error_discrete(raw_traj) / visualize_reconstruction_error_continuous(raw_traj)

  • Plot original vs reconstructed trajectories for visual comparison

Uses

Intended Use Cases

  • Robot Imitation Learning: Compress continuous demonstration trajectories into discrete tokens for language model-based policy learning
  • Trajectory Compression: Reduce memory footprint of robot demonstration datasets while preserving motion quality
  • Action Tokenization: Enable transformer-based models to process robot actions as discrete token sequences

Citation

If you use BEAST in your research, please cite:

BibTeX:

@inproceedings{zhou2025beast,
  title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
  author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=rQCl1sf62w}
}