---
library_name: transformers
tags: []
---

# BEAST: B-Spline Encoded Action Sequences Tokenizer

BEAST is an action tokenizer that converts continuous robot action sequences into discrete tokens using B-splines. It represents each trajectory by a fixed number of B-spline weights per degree of freedom and quantizes those weights into tokens. This compresses smooth robot motions into compact token sequences for imitation learning.

## Installation

Install the required dependencies:

```bash
pip install torch numpy matplotlib einops transformers
```

Note: CUDA is recommended for best performance. The default device is `"cuda"`, so pass `device="cpu"` to run on CPU.

## Quick Start

```python
from transformers import AutoProcessor
import torch

# Initialize the BEAST processor with configuration parameters:
# - num_dof: degrees of freedom (3 for 3D trajectories such as x, y, z)
# - num_basis: number of B-spline basis functions per DOF
# - seq_len: length of the trajectory sequence (number of time steps)
# - degree_p: degree of the B-spline polynomial (3 = cubic spline)
# - device: computation device ("cpu" or "cuda")
beast = AutoProcessor.from_pretrained(
    "zhouhongyi/beast",
    trust_remote_code=True,
    num_dof=3,
    num_basis=20,
    seq_len=50,
    degree_p=3,
    device="cpu",
)

# Create random trajectory data: 10 trajectories, each with 50 time steps and 3 dimensions.
# i.i.d. Gaussian noise is not smooth, so expect a higher reconstruction error than on real demonstrations.
trajectories = torch.randn(10, 50, 3)

# Encode trajectories into discrete tokens of shape [batch, num_basis * num_dof]
# update_bounds=True updates the processor's internal weight bounds from this batch
tokens = beast.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")

# Decode tokens back to continuous trajectories
reconstructed_trajectories = beast.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}")

# Compute the mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE loss: {mse_loss.item()}")

# Plot original vs. reconstructed trajectories for visual comparison
beast.visualize_reconstruction_error_discrete(trajectories)
```

### Continuous Encoding

For integration with continuous generative models, encode trajectories to normalized parameters instead of discrete tokens:

```python
# Encode to normalized continuous parameters in [-1, 1]
params = beast.encode_continuous(trajectories, update_bounds=True)

# Decode back to continuous trajectories
reconstructed = beast.decode_continuous(params)
```
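
This section does not spell out how the `[-1, 1]` normalization uses the internal weight bounds. The sketch below shows one plausible scheme, purely as an assumption. It applies per-dimension min-max scaling to the B-spline weights. When `update_bounds=True`, it first widens the stored bounds to cover the current batch. It clips values outside the stored bounds. `WeightBounds` is a hypothetical class, and plain Python lists stand in for tensors.

```python
class WeightBounds:
    """Hypothetical per-dimension min/max tracker mapping weights to [-1, 1]."""

    def __init__(self, dim):
        self.low = [float("inf")] * dim
        self.high = [float("-inf")] * dim

    def update(self, batch):
        # Widen the stored bounds so they cover every row in the batch.
        for row in batch:
            for d, v in enumerate(row):
                self.low[d] = min(self.low[d], v)
                self.high[d] = max(self.high[d], v)

    def normalize(self, batch, update_bounds=True):
        if update_bounds:
            self.update(batch)
        out = []
        for row in batch:
            scaled = []
            for v, lo, hi in zip(row, self.low, self.high):
                # A degenerate range (lo == hi) maps to the midpoint 0.
                x = 2.0 * (v - lo) / (hi - lo) - 1.0 if hi > lo else 0.0
                scaled.append(min(1.0, max(-1.0, x)))  # clip values outside the bounds
            out.append(scaled)
        return out

    def denormalize(self, batch):
        # Invert the min-max scaling using the stored bounds.
        return [[(x + 1.0) / 2.0 * (hi - lo) + lo
                 for x, lo, hi in zip(row, self.low, self.high)]
                for row in batch]


bounds = WeightBounds(dim=2)
params = bounds.normalize([[0.0, 10.0], [2.0, 30.0]], update_bounds=True)
print(params)  # [[-1.0, -1.0], [1.0, 1.0]]
print(bounds.normalize([[1.0, 40.0]], update_bounds=False))  # [[0.0, 1.0]], 40 is clipped
print(bounds.denormalize(params))  # [[0.0, 10.0], [2.0, 30.0]]
```

Under this assumption, clipping keeps outputs in `[-1, 1]` when out-of-range data arrives with `update_bounds=False`. The cost is that those weights saturate, so bounds fitted on representative data matter.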

## Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `num_dof` | Total degrees of freedom (robot joints + gripper) | 7 |
| `num_basis` | Number of B-spline basis functions per DOF. Higher values improve reconstruction fidelity but produce more tokens | 10 |
| `seq_len` | Trajectory sequence length (number of time steps) | 50 |
| `vocab_size` | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
| `degree_p` | B-spline polynomial degree. Higher degrees produce smoother curves (3 = cubic, 4 = quartic) | 4 |
| `device` | Torch device (`"cuda"` or `"cpu"`) | `"cuda"` |
| `gripper_zero_order` | Use piecewise-constant (degree-0) splines for gripper DOFs; useful for binary gripper states | `False` |
| `gripper_dof` | Number of gripper DOFs, assumed to be the last dimensions of each action. Only used when `gripper_zero_order=True` | 1 |
| `enforce_init_pos` | Enforce the initial position constraint during decoding | `False` |
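
Degree-0 B-splines are indicator functions of their knot spans. A least-squares fit with these disjoint bases therefore reduces to averaging the samples in each span. Decoding holds each weight constant, so a binary gripper signal stays within its original range instead of overshooting. The pure-Python sketch below assumes uniformly sized spans over the sampled time steps. `span_bounds`, `fit_zero_order`, and `eval_zero_order` are hypothetical helpers for illustration, not the processor's implementation.

```python
def span_bounds(num_steps, num_basis):
    """Hypothetical uniform partition of time steps into num_basis spans (needs num_steps >= num_basis)."""
    return [(i * num_steps // num_basis, (i + 1) * num_steps // num_basis)
            for i in range(num_basis)]


def fit_zero_order(values, num_basis):
    """Least-squares weights for degree-0 B-splines: the mean of each span."""
    weights = []
    for start, end in span_bounds(len(values), num_basis):
        chunk = values[start:end]
        weights.append(sum(chunk) / len(chunk))
    return weights


def eval_zero_order(weights, num_steps):
    """Piecewise-constant reconstruction: repeat each weight over its span."""
    out = []
    for w, (start, end) in zip(weights, span_bounds(num_steps, len(weights))):
        out.extend([w] * (end - start))
    return out


# A binary gripper signal: closed (0) for 25 steps, then open (1) for 25 steps.
gripper = [0.0] * 25 + [1.0] * 25
weights = fit_zero_order(gripper, num_basis=10)
reconstructed = eval_zero_order(weights, num_steps=len(gripper))
print(weights)
print(reconstructed == gripper)  # True: the switch falls on a span boundary
```

If the switch falls inside a span (for example at step 23), that span's weight becomes the fraction 0.4 rather than a clean 0 or 1.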

### Token Count

The total number of tokens per trajectory is `num_basis * num_dof`, independent of `seq_len`.

For example, with the default settings (10 basis functions, 7 DOF), each trajectory is encoded as 70 tokens.
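
The token-count formula can be checked with a little arithmetic. The sketch below also compares the token count with the `seq_len * num_dof` raw values in a trajectory. It computes the bit budget, assuming each token carries `log2(vocab_size)` bits (8 bits for the default `vocab_size` of 256). `token_budget` is a hypothetical helper, not part of the processor.

```python
import math


def token_budget(num_basis, num_dof, seq_len, vocab_size=256):
    """Hypothetical helper: token count, raw value count, ratio, and bit budget."""
    tokens = num_basis * num_dof            # one token per B-spline weight per DOF
    raw_values = seq_len * num_dof          # continuous values in the original trajectory
    bits = tokens * math.log2(vocab_size)   # assumes log2(vocab_size) bits per token
    return {
        "tokens": tokens,
        "raw_values": raw_values,
        "values_per_token": raw_values / tokens,
        "bits": bits,
    }


print(token_budget(num_basis=10, num_dof=7, seq_len=50))  # defaults: 70 tokens, 350 values, 5.0, 560.0 bits
print(token_budget(num_basis=20, num_dof=3, seq_len=50))  # Quick Start: 60 tokens, 150 values, 2.5, 480.0 bits
```

The token count depends only on `num_basis` and `num_dof`. Doubling `seq_len` to 100 at the default settings therefore still yields 70 tokens, which raises the ratio to 10 values per token.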

## API Reference

### Encoding Methods

`encode_discrete(trajs, update_bounds=True)`
- Input: trajectories tensor `[batch, seq_len, num_dof]`
- Output: discrete tokens `[batch, num_basis * num_dof]` in the range `[0, vocab_size - 1]`
- `update_bounds`: whether to update the internal weight bounds from this batch

`encode_continuous(trajs, update_bounds=True)`
- Input: trajectories tensor `[batch, seq_len, num_dof]`
- Output: normalized parameters `[batch, num_basis * num_dof]` in the range `[-1, 1]`
- `update_bounds`: same as for `encode_discrete`
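
Discrete tokens lie in `[0, vocab_size - 1]`, while continuous parameters lie in `[-1, 1]`. One natural way to connect the two is uniform binning: map `[-1, 1]` linearly onto the integer range and round. This is an assumption, not taken from the BEAST source. The hypothetical `quantize` and `dequantize` helpers below illustrate that scheme. Its worst-case rounding error is half a bin, `1 / (vocab_size - 1)`.

```python
def quantize(params, vocab_size=256):
    """Map values in [-1, 1] to integer tokens in [0, vocab_size - 1] (assumed uniform bins)."""
    tokens = []
    for x in params:
        x = min(1.0, max(-1.0, x))  # clip to the normalized range
        tokens.append(round((x + 1.0) / 2.0 * (vocab_size - 1)))
    return tokens


def dequantize(tokens, vocab_size=256):
    """Map integer tokens back to their grid values in [-1, 1]."""
    return [2.0 * t / (vocab_size - 1) - 1.0 for t in tokens]


params = [-1.0, -0.5, 0.0, 0.3, 1.0]
tokens = quantize(params)
recovered = dequantize(tokens)
max_err = max(abs(a - b) for a, b in zip(params, recovered))
print(tokens)
print(max_err <= 1 / 255 + 1e-12)  # True: error is at most half a bin
```

Under this assumption, each doubling of `vocab_size` roughly halves the worst-case rounding error, at the cost of a larger output vocabulary.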

### Decoding Methods

`decode_discrete(tokens, times=None, init_pos=None)`
- Input: discrete tokens `[batch, num_basis * num_dof]`
- Output: reconstructed trajectories `[batch, seq_len, num_dof]`
- `times`: custom time points (optional; defaults to `seq_len` uniformly spaced points)
- `init_pos`: initial position constraint (optional)

`decode_continuous(params, times=None, init_pos=None)`
- Input: normalized parameters `[batch, num_basis * num_dof]`
- Output: reconstructed trajectories `[batch, seq_len, num_dof]`
- `times`, `init_pos`: same as for `decode_discrete`
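
Conceptually, decoding evaluates a weighted sum of B-spline basis functions, which is why `times` can request points other than the `seq_len` uniform defaults. The pure-Python sketch below illustrates this for a single DOF. It builds the basis functions with the Cox-de Boor recursion, fits weights by ordinary least squares, and evaluates the curve at arbitrary times. Several details are assumptions for illustration, not BEAST's actual implementation: the clamped uniform knot vector on `[0, 1]`, the unregularized least-squares fit, and all helper names (`clamped_knots`, `basis`, `solve`, `fit_weights`, `evaluate`).

```python
import math


def clamped_knots(num_basis, degree):
    """Clamped uniform knot vector on [0, 1] with num_basis + degree + 1 entries (assumed layout)."""
    num_inner = num_basis - degree - 1
    inner = [(i + 1) / (num_inner + 1) for i in range(num_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def basis(t, knots, num_basis, degree):
    """Values of all num_basis B-spline basis functions at time t (Cox-de Boor recursion)."""
    spans = len(knots) - 1
    if t >= knots[-1]:
        # Right endpoint: for clamped knots only the last non-empty span is active.
        n = [0.0] * spans
        n[num_basis - 1] = 1.0
    else:
        n = [1.0 if knots[i] <= t < knots[i + 1] else 0.0 for i in range(spans)]
    for p in range(1, degree + 1):
        for i in range(spans - p):
            d1 = knots[i + p] - knots[i]
            d2 = knots[i + p + 1] - knots[i + 1]
            left = (t - knots[i]) / d1 * n[i] if d1 > 0 else 0.0
            right = (knots[i + p + 1] - t) / d2 * n[i + 1] if d2 > 0 else 0.0
            n[i] = left + right
    return n[:num_basis]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, size):
            f = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        x[r] = (m[r][size] - sum(m[r][c] * x[c] for c in range(r + 1, size))) / m[r][r]
    return x


def fit_weights(times, values, num_basis=10, degree=3):
    """Encode: least-squares weights via the normal equations (Phi^T Phi) w = Phi^T y."""
    knots = clamped_knots(num_basis, degree)
    phi = [basis(t, knots, num_basis, degree) for t in times]
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(num_basis)]
            for i in range(num_basis)]
    rhs = [sum(row[i] * y for row, y in zip(phi, values)) for i in range(num_basis)]
    return solve(gram, rhs)


def evaluate(weights, times, degree=3):
    """Decode: weighted sum of basis functions at arbitrary time points."""
    knots = clamped_knots(len(weights), degree)
    return [sum(w * b for w, b in zip(weights, basis(t, knots, len(weights), degree)))
            for t in times]


seq_len = 50
times = [k / (seq_len - 1) for k in range(seq_len)]
traj = [math.sin(2 * math.pi * t) for t in times]  # one smooth DOF
w = fit_weights(times, traj, num_basis=10, degree=3)
recon = evaluate(w, times)
print(max(abs(a - b) for a, b in zip(traj, recon)))  # small for smooth signals
dense = evaluate(w, [k / 199 for k in range(200)])  # same 10 weights, 200 custom time points
print(len(dense))
```

Because the weights, not the samples, are stored, the same 10 numbers can be evaluated at 50 or 200 time points.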

### Utility Methods

`compute_reconstruction_error(raw_traj)`
- Computes the MSE between the original trajectory and its reconstruction

`visualize_reconstruction_error_discrete(raw_traj)` / `visualize_reconstruction_error_continuous(raw_traj)`
- Plots original vs. reconstructed trajectories for visual comparison

## Uses

### Intended Use Cases

- Robot Imitation Learning: Compress continuous demonstration trajectories into discrete tokens for language-model-based policy learning
- Trajectory Compression: Reduce the memory footprint of robot demonstration datasets while preserving motion quality
- Action Tokenization: Enable transformer-based models to process robot actions as discrete token sequences

## Citation

If you use BEAST in your research, please cite:

```bibtex
@inproceedings{zhou2025beast,
  title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
  author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=rQCl1sf62w}
}
```