BEST: B-Spline Encoded Sequence Tokenizer with Adaptive Knots
BEST is an action tokenizer that converts continuous robot action sequences into discrete tokens using adaptive B-splines with MILP-based knot optimization. It extends BEAST with forced gripper knots and adaptive compression for more faithful trajectory representation in imitation learning.
Installation
Install the required dependencies:
pip install torch numpy scipy matplotlib tqdm pulp transformers
Note: CUDA is recommended for optimal performance, but CPU is also supported by setting device="cpu".
Quick Start
from transformers import AutoProcessor
import torch
# Initialize the BEST processor with configuration parameters:
# - num_dof: degrees of freedom (7 for 6D robot arm + gripper)
# - in_seq_len: input trajectory length (number of time steps)
# - out_seq_len: output token sequence length after compression
# - vocab_size: discrete vocabulary size (256 = 8-bit tokens)
# - degree: degree of the B-spline polynomial (3 = cubic spline)
# - gripper_dof: number of gripper DOFs (1 for binary gripper)
# - device: computation device ('cpu' or 'cuda')
best = AutoProcessor.from_pretrained(
"Luka-He/best",
trust_remote_code=True,
num_dof=7,
in_seq_len=50,
out_seq_len=50,
vocab_size=256,
degree=3,
gripper_dof=1,
device='cuda'
)
# Create random trajectory data: 10 trajectories, each with 50 time steps, 7 dimensions
trajectories = torch.randn(10, 50, 7)
# Encode trajectories into discrete tokens
# update_bounds=True allows the processor to adaptively update quantization bounds
tokens = best.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")  # [10, 400] (50 * (7 + 1))
# Decode tokens back to continuous trajectories
reconstructed_trajectories = best.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}") # [10, 50, 7]
# Calculate mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE Loss: {mse_loss.item()}")
Continuous Encoding
For integration with continuous generative models:
# Encode to normalized continuous parameters [-1, 1]
params = best.encode_continuous(trajectories, update_bounds=True)
print(f"Continuous params shape: {params.shape}")  # [10, 400]
# Decode back
reconstructed = best.decode_continuous(params)
print(f"Reconstructed shape: {reconstructed.shape}") # [10, 50, 7]
Parameters
| Parameter | Description | Default |
|---|---|---|
| num_dof | Total degrees of freedom (robot joints + gripper) | 7 |
| in_seq_len | Input trajectory sequence length (number of timesteps) | 10 |
| out_seq_len | Output compressed sequence length (must be ≥ the number of control points after compression) | 5 |
| vocab_size | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
| degree | B-spline polynomial degree (3=cubic, provides smooth trajectories) | 3 |
| gripper_dof | Number of gripper DOFs, assumed to be at the end. Used for forced knot placement | 1 |
| do_pad | Whether to pad control points to fixed length | True |
| device | Torch device ("cuda" or "cpu") | "cuda" |
Token Count
The total number of tokens per trajectory is: out_seq_len * (num_dof + 1)
The extra dimension stores the time knots. For example, with out_seq_len=50 and num_dof=7, each trajectory is encoded into 400 tokens (50 × 8).
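As a sanity check, the formula can be computed directly (the helper name here is ours, not part of the BEST API):

```python
def token_count(out_seq_len: int, num_dof: int) -> int:
    """Tokens per trajectory: one token per DOF per control point,
    plus one extra slot per control point for the time knot."""
    return out_seq_len * (num_dof + 1)

print(token_count(50, 7))  # 400
```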
Key Difference from BEAST: BEST uses adaptive compression where out_seq_len can vary based on trajectory complexity, while BEAST uses fixed num_basis control points.
API Reference
Encoding Methods
encode_discrete(trajs, update_bounds=True)
- Input: trajectories tensor of shape [batch, in_seq_len, num_dof]
- Output: discrete tokens of shape [batch, out_seq_len * (num_dof + 1)], each in range [0, vocab_size - 1]
- update_bounds: whether to update the internal weight bounds from this batch
encode_continuous(trajs, update_bounds=True)
- Input: trajectories tensor of shape [batch, in_seq_len, num_dof]
- Output: normalized parameters of shape [batch, out_seq_len * (num_dof + 1)], each in range [-1, 1]
- update_bounds: whether to update the internal weight bounds from this batch
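The discrete tokens can be thought of as a uniform quantization of the normalized continuous parameters. A minimal sketch of that mapping (illustrative only; BEST's internal binning may differ):

```python
import torch

def quantize(params: torch.Tensor, vocab_size: int = 256) -> torch.Tensor:
    """Map normalized parameters in [-1, 1] to integer tokens in [0, vocab_size - 1]."""
    scaled = (params.clamp(-1.0, 1.0) + 1.0) / 2.0        # -> [0, 1]
    return torch.round(scaled * (vocab_size - 1)).long()   # -> {0, ..., 255}

def dequantize(tokens: torch.Tensor, vocab_size: int = 256) -> torch.Tensor:
    """Inverse mapping, back to [-1, 1] (up to quantization error)."""
    return tokens.float() / (vocab_size - 1) * 2.0 - 1.0

params = torch.tensor([-1.0, 0.0, 1.0])
tokens = quantize(params)
print(tokens.tolist())  # [0, 128, 255]
```

With vocab_size=256 the worst-case roundtrip error per parameter is about 1/255 in the normalized range.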
Decoding Methods
decode_discrete(tokens, target_length=None)
- Input: discrete tokens of shape [batch, out_seq_len * (num_dof + 1)]
- Output: reconstructed trajectories of shape [batch, target_length, num_dof]
- target_length: output trajectory length (optional; defaults to in_seq_len)
decode_continuous(params, target_length=None)
- Input: normalized parameters of shape [batch, out_seq_len * (num_dof + 1)]
- Output: reconstructed trajectories of shape [batch, target_length, num_dof]
- target_length: output trajectory length (optional; defaults to in_seq_len)
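Conceptually, decoding evaluates the fitted B-spline at target_length uniformly spaced time points. A minimal single-DOF sketch using scipy (control-point values are made up; BEST's actual knot handling, including adaptive and forced gripper knots, is more involved):

```python
import numpy as np
from scipy.interpolate import BSpline

degree = 3                                      # cubic, as in the default config
ctrl = np.array([0.0, 0.5, 1.0, 0.8, 0.2])      # 5 control points for one DOF
n = len(ctrl)

# Clamped uniform knot vector: degree + 1 repeated knots at each end,
# so the curve interpolates the first and last control points.
interior = np.linspace(0.0, 1.0, n - degree + 1)
knots = np.concatenate([[0.0] * degree, interior, [1.0] * degree])
spline = BSpline(knots, ctrl, degree)

target_length = 50
t = np.linspace(0.0, 1.0, target_length)
traj = spline(t)                                # evaluate at target_length timesteps
print(traj.shape)  # (50,)
```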
Utility Methods
update_weights_bounds_per_batch(batch_weights)
- Update the min/max bounds used for normalization based on new batch data
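A minimal sketch of what such per-batch bound tracking might look like (class and method names here are hypothetical, not BEST's internals):

```python
import torch

class RunningBounds:
    """Track running min/max of B-spline weights across batches and
    normalize new weights into [-1, 1] using the observed range."""
    def __init__(self):
        self.w_min = torch.tensor(float("inf"))
        self.w_max = torch.tensor(float("-inf"))

    def update(self, batch_weights: torch.Tensor) -> None:
        # Widen the bounds to cover this batch (never shrinks).
        self.w_min = torch.minimum(self.w_min, batch_weights.min())
        self.w_max = torch.maximum(self.w_max, batch_weights.max())

    def normalize(self, w: torch.Tensor) -> torch.Tensor:
        # Affine map [w_min, w_max] -> [-1, 1].
        return 2.0 * (w - self.w_min) / (self.w_max - self.w_min) - 1.0

bounds = RunningBounds()
bounds.update(torch.tensor([-1.0, 5.0]))
print(bounds.normalize(torch.tensor([2.0])))  # tensor([0.])
```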
Key Features
Adaptive Knot Selection with MILP
Unlike BEAST's uniform knot spacing, BEST uses Mixed-Integer Linear Programming (MILP) to optimize knot placement:
- Gripper-Driven Knots: Automatically places knots at gripper state transitions
- Curvature-Based Optimization: Adds knots where trajectory curvature is high
- Tolerance Control: Balances compression ratio vs. reconstruction accuracy
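A toy illustration of the MILP flavor, using PuLP (included in the installation step): select the fewest candidate knots such that every high-curvature timestep lies within a small window of a selected knot. BEST's actual formulation (tolerance constraints, gripper-driven knots) is richer; the curvature values, threshold, and window below are made up:

```python
import pulp

# Made-up per-timestep curvature scores for an 8-step chunk.
curvature = [0.1, 0.9, 0.2, 0.05, 0.8, 0.1, 0.7, 0.05]
hot = [t for t, c in enumerate(curvature) if c > 0.5]   # timesteps needing coverage
candidates = range(len(curvature))
window = 1

prob = pulp.LpProblem("knot_selection", pulp.LpMinimize)
x = {k: pulp.LpVariable(f"knot_{k}", cat="Binary") for k in candidates}
prob += pulp.lpSum(x.values())                          # minimize knot count
for t in hot:
    # Each high-curvature timestep must be within `window` of some knot.
    prob += pulp.lpSum(x[k] for k in candidates if abs(k - t) <= window) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
selected = sorted(k for k in candidates if x[k].value() == 1)
print(selected)
```

Here one knot near timestep 1 and one near timesteps 4–6 suffice, so the solver returns two knots (the exact indices may vary between equally optimal solutions).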
Forced Gripper Knots
Preserves discrete gripper states by:
- Detecting gripper state changes in input trajectory
- Forcing B-spline knots at transition points
- Using degree-0 splines for gripper DOF (piecewise constant)
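A hypothetical sketch of the gripper handling described above: detect the timesteps where a binary gripper signal flips, then reconstruct it as a piecewise-constant (degree-0) function between those forced-knot locations:

```python
import torch

gripper = torch.tensor([0., 0., 0., 1., 1., 0., 0., 0.])

# Timesteps where the value differs from the previous timestep.
transitions = torch.nonzero(gripper[1:] != gripper[:-1]).squeeze(-1) + 1
print(transitions.tolist())  # [3, 5]

# Segment boundaries: start, each transition, end.
knots = [0] + transitions.tolist() + [len(gripper)]
values = [gripper[a].item() for a in knots[:-1]]

# Degree-0 reconstruction: hold each segment's value constant.
recon = torch.cat([torch.full((b - a,), v)
                   for a, b, v in zip(knots, knots[1:], values)])
print(torch.equal(recon, gripper))  # True
```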
Performance Benchmarks (LIBERO Dataset, 100 samples)
| Action Chunk Size | Avg Time (s) | CP Min | CP Mean | CP Max | W_min | W_max | Success Rate |
|---|---|---|---|---|---|---|---|
| 5 steps | 0.104 | 5 | 5.0 | 5 | -1.0000 | 5.0000 | 100% |
| 10 steps | 0.211 | 10 | 10.0 | 10 | -1.0000 | 10.0000 | 100% |
| 15 steps | 0.427 | 15 | 15.0 | 15 | -1.3730 | 15.0000 | 100% |
| 20 steps | 0.696 | 20 | 20.0 | 20 | -1.0000 | 20.0000 | 100% |
| 25 steps | 1.904 | 25 | 25.0 | 25 | -1.0000 | 25.0000 | 100% |
| 30 steps | 3.217 | 30 | 30.0 | 30 | -1.0000 | 30.0000 | 100% |
| 35 steps | 5.372 | 35 | 35.0 | 35 | -1.0000 | 35.0000 | 93% |
Note: the CP (control points) columns report the number of control points selected per chunk by the adaptive knot algorithm; W_min/W_max are the observed weight bounds.
Comparison with BEAST
| Feature | BEAST | BEST |
|---|---|---|
| Knot Selection | Uniform spacing | Adaptive (MILP-based) |
| Gripper Handling | Optional zero-order | Forced knots at transitions |
| Compression | Fixed basis count | Adaptive based on complexity |
| Encoding Time | ~20ms (50 steps) | ~100ms (5 steps) to ~5s (35 steps) |
| Trajectory Fidelity | High (uniform) | Very high (adaptive) |
| Use Case | General trajectories | Robot manipulation with gripper |
Uses
Intended Use Cases
- Robot Imitation Learning: Compress continuous demonstration trajectories with gripper states into discrete tokens for VLA-based policy learning
- Manipulation Dataset Compression: Reduce memory footprint while preserving both motion quality and discrete gripper transitions
- VLA Action Tokenization: Enable vision-language-action models to process robot actions as discrete token sequences with explicit gripper control
Out-of-Scope Use Cases
- Trajectories without discrete state transitions (use BEAST instead for better speed)
- Real-time control (MILP optimization adds computational overhead)
- Non-robotic continuous signals (optimized for manipulation trajectories)
Advanced Usage
Custom Configuration
from online_bspline_tokenizer import BestTokenizer
import torch
# Create with custom parameters
tokenizer = BestTokenizer(
num_dof=7,
in_seq_len=100, # Longer input trajectories
out_seq_len=100, # Allow up to 100 control points
vocab_size=512, # Higher resolution quantization
degree=3,
gripper_dof=1,
do_pad=True,
device='cuda'
)
# Process trajectories
trajectories = torch.randn(5, 100, 7)
tokens = tokenizer.encode_discrete(trajectories)
Saving and Loading
# Save processor configuration
tokenizer.save_pretrained("./my_best_tokenizer")
# Load later
from transformers import AutoProcessor
loaded_tokenizer = AutoProcessor.from_pretrained(
"./my_best_tokenizer",
trust_remote_code=True
)
Citation
If you use BEST in your research, please cite:
@misc{best2026,
title={BEST: B-Spline Encoded Sequence Tokenizer with Adaptive Knots},
author={Hexinyu},
year={2026},
url={https://github.com/your-repo/best}
}
Based on BEAST:
@inproceedings{
zhou2025beast,
title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=rQCl1sf62w}
}
License
MIT License
Acknowledgments
This work builds upon BEAST and extends it with adaptive knot selection for improved manipulation trajectory encoding.