---
library_name: transformers
tags: []
---

# BEAST: B-Spline Encoded Action Sequences Tokenizer

BEAST is an action tokenizer that converts continuous robot action sequences into discrete tokens using B-splines. It enables efficient trajectory compression for imitation learning by representing smooth robot motions as compact token sequences.

## Installation

Install the required dependencies:

```bash
pip install torch numpy matplotlib einops transformers
```

**Note:** CUDA is recommended for optimal performance, but CPU is also supported by setting `device="cpu"`.

## Quick Start

```python
from transformers import AutoProcessor
import torch

# Initialize the BEAST processor with configuration parameters:
# - num_dof: degrees of freedom (3 for 3D trajectories like x, y, z)
# - num_basis: number of B-spline basis functions used for trajectory representation
# - seq_len: length of the trajectory sequence (number of time steps)
# - degree_p: degree of the B-spline polynomial (3 = cubic spline)
# - device: computation device ('cpu' or 'cuda')
beast = AutoProcessor.from_pretrained(
    "zhouhongyi/beast",
    trust_remote_code=True,
    num_dof=3,
    num_basis=20,
    seq_len=50,
    degree_p=3,
    device="cpu",
)

# Create random trajectory data: 10 trajectories, each with 50 time steps and 3 dimensions
trajectories = torch.randn(10, 50, 3)

# Encode trajectories into discrete tokens.
# update_bounds=True lets the processor adaptively update its quantization bounds.
tokens = beast.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")

# Decode tokens back to continuous trajectories
reconstructed_trajectories = beast.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}")

# Calculate mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE Loss: {mse_loss.item()}")

# Visualize the reconstruction error for analysis
beast.visualize_reconstruction_error_discrete(trajectories)
```

### Continuous Encoding

For integration with continuous generative models:

```python
# Encode to normalized continuous parameters in [-1, 1]
params = beast.encode_continuous(trajectories, update_bounds=True)

# Decode back
reconstructed = beast.decode_continuous(params)
```

## Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `num_dof` | Total degrees of freedom (robot joints + gripper) | 7 |
| `num_basis` | Number of B-spline basis functions. Higher values improve reconstruction fidelity but produce more tokens | 10 |
| `seq_len` | Trajectory sequence length (number of timesteps) | 50 |
| `vocab_size` | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
| `degree_p` | B-spline polynomial degree. Higher degrees produce smoother curves (3 = cubic, 4 = quartic) | 4 |
| `device` | Torch device (`"cuda"` or `"cpu"`) | `"cuda"` |
| `gripper_zero_order` | Use piecewise-constant (degree-0) splines for the gripper DOFs. Useful for binary gripper states | `False` |
| `gripper_dof` | Number of gripper DOFs, assumed to occupy the last dimensions. Only used when `gripper_zero_order=True` | 1 |
| `enforce_init_pos` | Enforce the initial-position constraint during decoding | `False` |

### Token Count

The total number of tokens per trajectory is `num_basis * num_dof`. For example, with the default settings (10 basis functions, 7 DOF), each trajectory is encoded as 70 tokens.
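To make the representation concrete, here is a minimal standalone sketch of a clamped uniform B-spline basis evaluated via the Cox–de Boor recursion. This is only an illustration of the kind of basis BEAST builds on, not the library's actual implementation; the function and variable names below are hypothetical.

```python
import numpy as np

def bspline_basis(times, num_basis, degree):
    """Evaluate `num_basis` clamped B-spline basis functions of the given
    degree at `times` in [0, 1]. Returns an array [num_basis, len(times)]."""
    times = np.asarray(times, dtype=float)
    # Clamped knot vector: `degree` repeated knots at each end plus a uniform
    # interior grid, giving num_basis + degree + 1 knots in total.
    interior = np.linspace(0.0, 1.0, num_basis - degree + 1)
    knots = np.concatenate([np.zeros(degree), interior, np.ones(degree)])
    # Degree-0 bases: indicators of the half-open knot spans [k_i, k_{i+1}).
    B = ((knots[:-1, None] <= times) & (times < knots[1:, None])).astype(float)
    # Assign t == 1 to the last non-degenerate span so the endpoint is covered.
    last = np.nonzero(knots[:-1] < knots[1:])[0][-1]
    B[last, times == 1.0] = 1.0
    # Cox-de Boor recursion; 0/0 terms are defined as 0 (skipped when d == 0).
    for p in range(1, degree + 1):
        nxt = np.zeros((B.shape[0] - 1, times.size))
        for i in range(B.shape[0] - 1):
            d1 = knots[i + p] - knots[i]
            d2 = knots[i + p + 1] - knots[i + 1]
            if d1 > 0:
                nxt[i] += (times - knots[i]) / d1 * B[i]
            if d2 > 0:
                nxt[i] += (knots[i + p + 1] - times) / d2 * B[i + 1]
        B = nxt
    return B

# Each DOF of a smooth trajectory is approximated as a weighted sum of these
# bases, so one DOF contributes `num_basis` weights to be tokenized.
t = np.linspace(0.0, 1.0, 50)
B = bspline_basis(t, num_basis=10, degree=3)
assert B.shape == (10, 50)
assert np.allclose(B.sum(axis=0), 1.0)  # clamped bases form a partition of unity
```

Because the basis is fixed once `num_basis`, `degree_p`, and `seq_len` are chosen, a 50-step trajectory per DOF is summarized by just 10 weights here, which is the source of the compression.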
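The 8-bit discretization behind this token count can be sketched as plain uniform quantization. The snippet below is a simplified illustration under assumed fixed bounds of `[-1, 1]`; BEAST itself adapts its bounds from data (`update_bounds=True`), and the function names here are hypothetical, not the library's API.

```python
import numpy as np

def weights_to_tokens(weights, low, high, vocab_size=256):
    """Uniformly quantize continuous weights into tokens in [0, vocab_size - 1]."""
    normed = (np.clip(weights, low, high) - low) / (high - low)  # -> [0, 1]
    return np.rint(normed * (vocab_size - 1)).astype(np.int64)

def tokens_to_weights(tokens, low, high, vocab_size=256):
    """Invert the quantization up to the bin resolution."""
    return tokens / (vocab_size - 1) * (high - low) + low

num_basis, num_dof = 10, 7  # the default configuration
weights = np.random.uniform(-1.0, 1.0, size=num_basis * num_dof)
tokens = weights_to_tokens(weights, low=-1.0, high=1.0)
assert tokens.shape == (70,)  # num_basis * num_dof tokens per trajectory
recon = tokens_to_weights(tokens, low=-1.0, high=1.0)
# With 256 bins over [-1, 1], the round-trip error is at most half a bin width.
assert np.abs(recon - weights).max() <= (2.0 / 255) / 2 + 1e-12
```

This also shows why `vocab_size=256` corresponds to 8-bit tokens: each weight is stored as one of 2^8 integer bins.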
## API Reference

### Encoding Methods

**`encode_discrete(trajs, update_bounds=True)`**
- Input: trajectories tensor `[batch, seq_len, num_dof]`
- Output: discrete tokens `[batch, num_basis * num_dof]` in range `[0, vocab_size - 1]`
- `update_bounds`: whether to update the internal weight bounds from this batch

**`encode_continuous(trajs, update_bounds=True)`**
- Input: trajectories tensor `[batch, seq_len, num_dof]`
- Output: normalized parameters `[batch, num_basis * num_dof]` in range `[-1, 1]`

### Decoding Methods

**`decode_discrete(tokens, times=None, init_pos=None)`**
- Input: discrete tokens `[batch, num_basis * num_dof]`
- Output: reconstructed trajectories `[batch, seq_len, num_dof]`
- `times`: custom time points (optional; defaults to `seq_len` uniformly spaced points)
- `init_pos`: initial-position constraint (optional)

**`decode_continuous(params, times=None, init_pos=None)`**
- Input: normalized parameters `[batch, num_basis * num_dof]`
- Output: reconstructed trajectories `[batch, seq_len, num_dof]`

### Utility Methods

**`compute_reconstruction_error(raw_traj)`**
- Compute the MSE between the original and reconstructed trajectory

**`visualize_reconstruction_error_discrete(raw_traj)`** / **`visualize_reconstruction_error_continuous(raw_traj)`**
- Plot original vs. reconstructed trajectories for visual comparison

## Uses

### Intended Use Cases

- **Robot Imitation Learning**: Compress continuous demonstration trajectories into discrete tokens for language-model-based policy learning
- **Trajectory Compression**: Reduce the memory footprint of robot demonstration datasets while preserving motion quality
- **Action Tokenization**: Enable transformer-based models to process robot actions as discrete token sequences

## Citation

If you use BEAST in your research, please cite:

**BibTeX:**

```bibtex
@inproceedings{
  zhou2025beast,
  title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
  author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng
          Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=rQCl1sf62w}
}
```