BEST: B-Spline Encoded Sequence Tokenizer with Adaptive Knots
BEST is an action tokenizer that converts continuous robot action sequences into discrete tokens using adaptive B-splines with MILP-based knot optimization. It extends BEAST with forced gripper knots and adaptive compression for more faithful trajectory representation in imitation learning.
Installation
Install the required dependencies:
pip install torch numpy scipy matplotlib tqdm pulp transformers
Note: CUDA is recommended for optimal performance, but CPU is also supported by setting device="cpu".
Quick Start
from transformers import AutoProcessor
import torch
# Initialize the BEST processor with configuration parameters:
# - num_dof: degrees of freedom (7 for 6D robot arm + gripper)
# - in_seq_len: input trajectory length (number of time steps)
# - out_seq_len: output token sequence length after compression
# - vocab_size: discrete vocabulary size (256 = 8-bit tokens)
# - degree: degree of the B-spline polynomial (3 = cubic spline)
# - gripper_dof: number of gripper DOFs (1 for binary gripper)
# - device: computation device ('cpu' or 'cuda')
best = AutoProcessor.from_pretrained(
"Luka-He/best",
trust_remote_code=True,
num_dof=7,
in_seq_len=50,
out_seq_len=50,
vocab_size=256,
degree=3,
gripper_dof=1,
device='cuda'
)
# Create random trajectory data: 10 trajectories, each with 50 time steps, 7 dimensions
trajectories = torch.randn(10, 50, 7)
# Encode trajectories into discrete tokens
# update_bounds=True allows the processor to adaptively update quantization bounds
tokens = best.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")  # [10, 400] (50 * (7 + 1))
# Decode tokens back to continuous trajectories
reconstructed_trajectories = best.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}") # [10, 50, 7]
# Calculate mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE Loss: {mse_loss.item()}")
Continuous Encoding
For integration with continuous generative models:
# Encode to normalized continuous parameters [-1, 1]
params = best.encode_continuous(trajectories, update_bounds=True)
print(f"Continuous params shape: {params.shape}")  # [10, 400]
# Decode back
reconstructed = best.decode_continuous(params)
print(f"Reconstructed shape: {reconstructed.shape}") # [10, 50, 7]
Parameters
| Parameter | Description | Default |
|---|---|---|
| num_dof | Total degrees of freedom (robot joints + gripper) | 7 |
| in_seq_len | Input trajectory sequence length (number of timesteps) | 10 |
| out_seq_len | Output compressed sequence length (must be ≥ the number of control points after compression) | 5 |
| vocab_size | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
| degree | B-spline polynomial degree (3=cubic, provides smooth trajectories) | 3 |
| gripper_dof | Number of gripper DOFs, assumed to be at the end. Used for forced knot placement | 1 |
| do_pad | Whether to pad control points to fixed length | True |
| device | Torch device ("cuda" or "cpu") | "cuda" |
Token Count
The total number of tokens per trajectory is: out_seq_len * (num_dof + 1)
The extra dimension stores the time knots. For example, with out_seq_len=50 and num_dof=7, each trajectory is encoded into 400 tokens (50 × 8).
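As a sanity check, the formula can be computed directly (the helper name here is ours, not part of the BEST API):

```python
def token_count(out_seq_len: int, num_dof: int) -> int:
    """Tokens per trajectory: one token per DOF per control point,
    plus one extra slot per control point for the time knot."""
    return out_seq_len * (num_dof + 1)

print(token_count(50, 7))  # 400
```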
Key Difference from BEAST: BEST uses adaptive compression where out_seq_len can vary based on trajectory complexity, while BEAST uses fixed num_basis control points.
API Reference
Encoding Methods
encode_discrete(trajs, update_bounds=True)
- Input: trajectories tensor of shape [batch, in_seq_len, num_dof]
- Output: discrete tokens of shape [batch, out_seq_len * (num_dof + 1)], each in range [0, vocab_size - 1]
- update_bounds: whether to update the internal weight bounds from this batch
encode_continuous(trajs, update_bounds=True)
- Input: trajectories tensor of shape [batch, in_seq_len, num_dof]
- Output: normalized parameters of shape [batch, out_seq_len * (num_dof + 1)], each in range [-1, 1]
- update_bounds: whether to update the internal weight bounds from this batch
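The discrete tokens can be thought of as a uniform quantization of the normalized continuous parameters. A minimal sketch of that mapping (illustrative only; BEST's internal binning may differ):

```python
import torch

def quantize(params: torch.Tensor, vocab_size: int = 256) -> torch.Tensor:
    """Map normalized parameters in [-1, 1] to integer tokens in [0, vocab_size - 1]."""
    scaled = (params.clamp(-1.0, 1.0) + 1.0) / 2.0        # -> [0, 1]
    return torch.round(scaled * (vocab_size - 1)).long()   # -> {0, ..., 255}

def dequantize(tokens: torch.Tensor, vocab_size: int = 256) -> torch.Tensor:
    """Inverse mapping, back to [-1, 1] (up to quantization error)."""
    return tokens.float() / (vocab_size - 1) * 2.0 - 1.0

params = torch.tensor([-1.0, 0.0, 1.0])
tokens = quantize(params)
print(tokens.tolist())  # [0, 128, 255]
```

With vocab_size=256 the worst-case roundtrip error per parameter is about 1/255 in the normalized range.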
Decoding Methods
decode_discrete(tokens, target_length=None)
- Input: discrete tokens of shape [batch, out_seq_len * (num_dof + 1)]
- Output: reconstructed trajectories of shape [batch, target_length, num_dof]
- target_length: output trajectory length (optional; defaults to in_seq_len)
decode_continuous(params, target_length=None)
- Input: normalized parameters of shape [batch, out_seq_len * (num_dof + 1)]
- Output: reconstructed trajectories of shape [batch, target_length, num_dof]
- target_length: output trajectory length (optional; defaults to in_seq_len)
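Conceptually, decoding evaluates the fitted B-spline at target_length uniformly spaced time points. A minimal single-DOF sketch using scipy (control-point values are made up; BEST's actual knot handling, including adaptive and forced gripper knots, is more involved):

```python
import numpy as np
from scipy.interpolate import BSpline

degree = 3                                      # cubic, as in the default config
ctrl = np.array([0.0, 0.5, 1.0, 0.8, 0.2])      # 5 control points for one DOF
n = len(ctrl)

# Clamped uniform knot vector: degree + 1 repeated knots at each end,
# so the curve interpolates the first and last control points.
interior = np.linspace(0.0, 1.0, n - degree + 1)
knots = np.concatenate([[0.0] * degree, interior, [1.0] * degree])
spline = BSpline(knots, ctrl, degree)

target_length = 50
t = np.linspace(0.0, 1.0, target_length)
traj = spline(t)                                # evaluate at target_length timesteps
print(traj.shape)  # (50,)
```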
Utility Methods
update_weights_bounds_per_batch(batch_weights)
- Update the min/max bounds used for normalization based on new batch data
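A minimal sketch of what such per-batch bound tracking might look like (class and method names here are hypothetical, not BEST's internals):

```python
import torch

class RunningBounds:
    """Track running min/max of B-spline weights across batches and
    normalize new weights into [-1, 1] using the observed range."""
    def __init__(self):
        self.w_min = torch.tensor(float("inf"))
        self.w_max = torch.tensor(float("-inf"))

    def update(self, batch_weights: torch.Tensor) -> None:
        # Widen the bounds to cover this batch (never shrinks).
        self.w_min = torch.minimum(self.w_min, batch_weights.min())
        self.w_max = torch.maximum(self.w_max, batch_weights.max())

    def normalize(self, w: torch.Tensor) -> torch.Tensor:
        # Affine map [w_min, w_max] -> [-1, 1].
        return 2.0 * (w - self.w_min) / (self.w_max - self.w_min) - 1.0

bounds = RunningBounds()
bounds.update(torch.tensor([-1.0, 5.0]))
print(bounds.normalize(torch.tensor([2.0])))  # tensor([0.])
```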
Key Features
Adaptive Knot Selection with MILP
Unlike BEAST's uniform knot spacing, BEST uses Mixed-Integer Linear Programming (MILP) to optimize knot placement:
- Gripper-Driven Knots: Automatically places knots at gripper state transitions
- Curvature-Based Optimization: Adds knots where trajectory curvature is high
- Tolerance Control: Balances compression ratio vs. reconstruction accuracy
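A toy illustration of the MILP flavor, using PuLP (included in the installation step): select the fewest candidate knots such that every high-curvature timestep lies within a small window of a selected knot. BEST's actual formulation (tolerance constraints, gripper-driven knots) is richer; the curvature values, threshold, and window below are made up:

```python
import pulp

# Made-up per-timestep curvature scores for an 8-step chunk.
curvature = [0.1, 0.9, 0.2, 0.05, 0.8, 0.1, 0.7, 0.05]
hot = [t for t, c in enumerate(curvature) if c > 0.5]   # timesteps needing coverage
candidates = range(len(curvature))
window = 1

prob = pulp.LpProblem("knot_selection", pulp.LpMinimize)
x = {k: pulp.LpVariable(f"knot_{k}", cat="Binary") for k in candidates}
prob += pulp.lpSum(x.values())                          # minimize knot count
for t in hot:
    # Each high-curvature timestep must be within `window` of some knot.
    prob += pulp.lpSum(x[k] for k in candidates if abs(k - t) <= window) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
selected = sorted(k for k in candidates if x[k].value() == 1)
print(selected)
```

Here one knot near timestep 1 and one near timesteps 4–6 suffice, so the solver returns two knots (the exact indices may vary between equally optimal solutions).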
Forced Gripper Knots
Preserves discrete gripper states by:
- Detecting gripper state changes in input trajectory
- Forcing B-spline knots at transition points
- Using degree-0 splines for gripper DOF (piecewise constant)
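A hypothetical sketch of the gripper handling described above: detect the timesteps where a binary gripper signal flips, then reconstruct it as a piecewise-constant (degree-0) function between those forced-knot locations:

```python
import torch

gripper = torch.tensor([0., 0., 0., 1., 1., 0., 0., 0.])

# Timesteps where the value differs from the previous timestep.
transitions = torch.nonzero(gripper[1:] != gripper[:-1]).squeeze(-1) + 1
print(transitions.tolist())  # [3, 5]

# Segment boundaries: start, each transition, end.
knots = [0] + transitions.tolist() + [len(gripper)]
values = [gripper[a].item() for a in knots[:-1]]

# Degree-0 reconstruction: hold each segment's value constant.
recon = torch.cat([torch.full((b - a,), v)
                   for a, b, v in zip(knots, knots[1:], values)])
print(torch.equal(recon, gripper))  # True
```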
Performance Benchmarks (LIBERO Dataset, 100 samples)
| Action Chunk Size | Avg Time (s) | CP Min | CP Mean | CP Max | W_min | W_max | Success Rate |
|---|---|---|---|---|---|---|---|
| 5 steps | 0.104 | 5 | 5.0 | 5 | -1.0000 | 5.0000 | 100% |
| 10 steps | 0.211 | 10 | 10.0 | 10 | -1.0000 | 10.0000 | 100% |
| 15 steps | 0.427 | 15 | 15.0 | 15 | -1.3730 | 15.0000 | 100% |
| 20 steps | 0.696 | 20 | 20.0 | 20 | -1.0000 | 20.0000 | 100% |
| 25 steps | 1.904 | 25 | 25.0 | 25 | -1.0000 | 25.0000 | 100% |
| 30 steps | 3.217 | 30 | 30.0 | 30 | -1.0000 | 30.0000 | 100% |
| 35 steps | 5.372 | 35 | 35.0 | 35 | -1.0000 | 35.0000 | 93% |
Note: the CP (control points) columns report the number of control points selected per chunk by the adaptive knot algorithm; W_min/W_max are the observed weight bounds.
Comparison with BEAST
| Feature | BEAST | BEST |
|---|---|---|
| Knot Selection | Uniform spacing | Adaptive (MILP-based) |
| Gripper Handling | Optional zero-order | Forced knots at transitions |
| Compression | Fixed basis count | Adaptive based on complexity |
| Encoding Time | ~20ms (50 steps) | ~100ms (5 steps) to ~5s (35 steps) |
| Trajectory Fidelity | High (uniform) | Very high (adaptive) |
| Use Case | General trajectories | Robot manipulation with gripper |
Uses
Intended Use Cases
- Robot Imitation Learning: Compress continuous demonstration trajectories with gripper states into discrete tokens for VLA-based policy learning
- Manipulation Dataset Compression: Reduce memory footprint while preserving both motion quality and discrete gripper transitions
- VLA Action Tokenization: Enable vision-language-action models to process robot actions as discrete token sequences with explicit gripper control
Out-of-Scope Use Cases
- Trajectories without discrete state transitions (use BEAST instead for better speed)
- Real-time control (MILP optimization adds computational overhead)
- Non-robotic continuous signals (optimized for manipulation trajectories)
Advanced Usage
Custom Configuration
from online_bspline_tokenizer import BestTokenizer
import torch
# Create with custom parameters
tokenizer = BestTokenizer(
num_dof=7,
in_seq_len=100, # Longer input trajectories
out_seq_len=100, # Allow up to 100 control points
vocab_size=512, # Higher resolution quantization
degree=3,
gripper_dof=1,
do_pad=True,
device='cuda'
)
# Process trajectories
trajectories = torch.randn(5, 100, 7)
tokens = tokenizer.encode_discrete(trajectories)
Saving and Loading
# Save processor configuration
tokenizer.save_pretrained("./my_best_tokenizer")
# Load later
from transformers import AutoProcessor
loaded_tokenizer = AutoProcessor.from_pretrained(
"./my_best_tokenizer",
trust_remote_code=True
)
Citation
If you use BEST in your research, please cite:
@misc{best2026,
title={BEST: B-Spline Encoded Sequence Tokenizer with Adaptive Knots},
author={Hexinyu},
year={2026},
url={https://github.com/your-repo/best}
}
Based on BEAST:
@inproceedings{
zhou2025beast,
title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=rQCl1sf62w}
}
License
MIT License
Acknowledgments
This work builds upon BEAST and extends it with adaptive knot selection for improved manipulation trajectory encoding.