---
library_name: transformers
tags: []
---
# BEAST: B-Spline Encoded Action Sequences Tokenizer
BEAST is an action tokenizer that converts continuous robot action sequences into discrete tokens using B-splines. It enables efficient trajectory compression for imitation learning by representing smooth robot motions as compact token sequences.
## Installation
Install the required dependencies:
```bash
pip install torch numpy matplotlib einops transformers
```
**Note:** CUDA is recommended for optimal performance, but CPU is also supported by setting `device="cpu"`.
## Quick Start
```python
from transformers import AutoProcessor
import torch
# Initialize the BEAST processor with configuration parameters:
# - num_dof: degrees of freedom (3 for 3D trajectories like x, y, z)
# - num_basis: number of B-spline basis functions used for trajectory representation
# - seq_len: length of the trajectory sequence (number of time steps)
# - degree_p: degree of the B-spline polynomial (3 = cubic spline)
# - device: computation device ('cpu' or 'cuda')
beast = AutoProcessor.from_pretrained(
    "zhouhongyi/beast",
    trust_remote_code=True,
    num_dof=3,
    num_basis=20,
    seq_len=50,
    degree_p=3,
    device="cpu",
)
# Create random trajectory data: 10 trajectories, each with 50 time steps, 3 dimensions
trajectories = torch.randn(10, 50, 3)
# Encode trajectories into discrete tokens
# update_bounds=True allows the processor to adaptively update quantization bounds
tokens = beast.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")
# Decode tokens back to continuous trajectories
reconstructed_trajectories = beast.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}")
# Calculate mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE Loss: {mse_loss.item()}")
# Visualize the reconstruction error for analysis
beast.visualize_reconstruction_error_discrete(trajectories)
```
### Continuous Encoding
For integration with continuous generative models:
```python
# Encode to normalized continuous parameters [-1, 1]
params = beast.encode_continuous(trajectories, update_bounds=True)
# Decode back
reconstructed = beast.decode_continuous(params)
```
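The underlying idea can be sketched without the BEAST processor itself: fit B-spline control points to a trajectory by least squares, then uniformly quantize them into `vocab_size` bins. The following is an illustrative approximation in SciPy (not the library's actual implementation; knot construction and bound handling are assumptions):

```python
import numpy as np
from scipy.interpolate import BSpline

seq_len, num_basis, degree_p, vocab_size = 50, 20, 3, 256

# Clamped uniform knot vector for num_basis cubic basis functions.
n_knots = num_basis + degree_p + 1
inner = np.linspace(0.0, 1.0, n_knots - 2 * degree_p)
knots = np.concatenate([np.zeros(degree_p), inner, np.ones(degree_p)])

# Design matrix: each column is one basis function sampled at the timesteps.
t = np.linspace(0.0, 1.0, seq_len)
design = BSpline.design_matrix(t, knots, degree_p).toarray()  # [seq_len, num_basis]

# One smooth 1-D trajectory; the least-squares fit gives the control points.
traj = np.sin(2 * np.pi * t) + 0.5 * t
coeffs, *_ = np.linalg.lstsq(design, traj, rcond=None)

# "Discrete encoding": uniformly quantize the coefficients into vocab_size bins.
lo, hi = coeffs.min(), coeffs.max()
tokens = np.round((coeffs - lo) / (hi - lo) * (vocab_size - 1)).astype(int)

# Decode: de-quantize and evaluate the spline at the original timesteps.
deq = tokens / (vocab_size - 1) * (hi - lo) + lo
recon = design @ deq
mse = np.mean((traj - recon) ** 2)
```

Even with only 8-bit tokens, the reconstruction error stays small because the quantization step is a tiny fraction of the coefficient range.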
## Parameters
| Parameter | Description | Default |
|-----------|-------------|---------|
| `num_dof` | Total degrees of freedom (robot joints + gripper) | 7 |
| `num_basis` | Number of B-spline basis functions. Higher values improve reconstruction fidelity but produce more tokens | 10 |
| `seq_len` | Trajectory sequence length (number of timesteps) | 50 |
| `vocab_size` | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
| `degree_p` | B-spline polynomial degree. Higher degrees produce smoother curves (3=cubic, 4=quartic) | 4 |
| `device` | Torch device (`"cuda"` or `"cpu"`) | `"cuda"` |
| `gripper_zero_order` | Use piecewise-constant (degree 0) splines for gripper DOFs. Useful for binary gripper states | `False` |
| `gripper_dof` | Number of gripper DOFs, assumed to be the last dimensions of the action vector. Only used when `gripper_zero_order=True` | 1 |
| `enforce_init_pos` | Enforce initial position constraint during decoding | `False` |
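The `gripper_zero_order` option corresponds to degree-0 (piecewise-constant) splines, whose basis functions are indicators of knot segments. As a rough illustration of why that suits binary gripper signals (not BEAST's actual code; the segment layout is an assumption):

```python
import numpy as np

seq_len, num_basis = 50, 10

# A binary gripper signal: closed (0), open (1), closed again.
gripper = np.concatenate([np.zeros(20), np.ones(20), np.zeros(10)])

# Degree-0 B-splines are indicators of knot segments, so the least-squares
# fit is simply the mean of the signal within each segment.
edges = np.linspace(0, seq_len, num_basis + 1).astype(int)
coeffs = np.array([gripper[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])

# Reconstruction is a step function: repeat each coefficient over its segment.
recon = np.repeat(coeffs, np.diff(edges))
```

A cubic spline would smooth the open/close transitions into ramps; the step-function reconstruction preserves them exactly whenever they align with segment boundaries.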
### Token Count
The total number of tokens per trajectory is `num_basis * num_dof`.
For example, with the default settings (`num_basis=10`, `num_dof=7`), each trajectory is encoded as 70 tokens.
## API Reference
### Encoding Methods
**`encode_discrete(trajs, update_bounds=True)`**
- Input: Trajectories tensor `[batch, seq_len, num_dof]`
- Output: Discrete tokens `[batch, num_basis * num_dof]` in range `[0, vocab_size-1]`
- `update_bounds`: Whether to update internal weight bounds from this batch
**`encode_continuous(trajs, update_bounds=True)`**
- Input: Trajectories tensor `[batch, seq_len, num_dof]`
- Output: Normalized parameters `[batch, num_basis * num_dof]` in range `[-1, 1]`
### Decoding Methods
**`decode_discrete(tokens, times=None, init_pos=None)`**
- Input: Discrete tokens `[batch, num_basis * num_dof]`
- Output: Reconstructed trajectories `[batch, seq_len, num_dof]`
- `times`: Custom time points at which to evaluate the decoded spline (optional; defaults to `seq_len` uniformly spaced points)
- `init_pos`: Initial position constraint (optional)
**`decode_continuous(params, times=None, init_pos=None)`**
- Input: Normalized parameters `[batch, num_basis * num_dof]`
- Output: Reconstructed trajectories `[batch, seq_len, num_dof]`
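Because the decoder evaluates a continuous spline, the same token sequence can be decoded at any temporal resolution via the `times` argument. A self-contained sketch of this property using SciPy's `BSpline` (the knot construction mirrors the earlier illustration and is an assumption, not the library's code):

```python
import numpy as np
from scipy.interpolate import BSpline

num_basis, degree_p = 20, 3

# Clamped uniform knot vector, as in a cubic B-spline fit.
n_knots = num_basis + degree_p + 1
inner = np.linspace(0.0, 1.0, n_knots - 2 * degree_p)
knots = np.concatenate([np.zeros(degree_p), inner, np.ones(degree_p)])

# Fixed control points standing in for de-quantized tokens.
rng = np.random.default_rng(0)
coeffs = rng.normal(size=num_basis)
spline = BSpline(knots, coeffs, degree_p)

# The same parameters decode at any temporal resolution:
coarse = spline(np.linspace(0.0, 1.0, 50))   # default seq_len
fine = spline(np.linspace(0.0, 1.0, 200))    # 4x upsampled
```

This is useful when a policy is trained at one control frequency but executed at another.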
### Utility Methods
**`compute_reconstruction_error(raw_traj)`**
- Compute MSE between original and reconstructed trajectory
**`visualize_reconstruction_error_discrete(raw_traj)`** / **`visualize_reconstruction_error_continuous(raw_traj)`**
- Plot original vs reconstructed trajectories for visual comparison
## Uses
### Intended Use Cases
- **Robot Imitation Learning**: Compress continuous demonstration trajectories into discrete tokens for language model-based policy learning
- **Trajectory Compression**: Reduce memory footprint of robot demonstration datasets while preserving motion quality
- **Action Tokenization**: Enable transformer-based models to process robot actions as discrete token sequences
## Citation
If you use BEAST in your research, please cite:
**BibTeX:**
```bibtex
@inproceedings{zhou2025beast,
title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=rQCl1sf62w}
}
```