# BEST: B-Spline Encoded Sequence Tokenizer with Adaptive Knots
BEST is an action tokenizer that converts continuous robot action sequences into discrete tokens using adaptive B-splines with MILP-based knot optimization. It extends [BEAST](https://huggingface.co/zhouhongyi/beast) with forced gripper knots and adaptive compression for enhanced trajectory representation in imitation learning.
## Installation
Install the required dependencies:
```bash
pip install torch numpy scipy matplotlib tqdm pulp transformers
```
Note: CUDA is recommended for optimal performance, but CPU is also supported by setting `device="cpu"`.
## Quick Start
```python
from transformers import AutoProcessor
import torch

# Initialize the BEST processor with configuration parameters:
# - num_dof: degrees of freedom (7 for a 6-DoF robot arm + 1 gripper DOF)
# - in_seq_len: input trajectory length (number of time steps)
# - out_seq_len: output token sequence length after compression
# - vocab_size: discrete vocabulary size (256 = 8-bit tokens)
# - degree: degree of the B-spline polynomial (3 = cubic spline)
# - gripper_dof: number of gripper DOFs (1 for a binary gripper)
# - device: computation device ('cpu' or 'cuda')
best = AutoProcessor.from_pretrained(
    "Luka-He/best",
    trust_remote_code=True,
    num_dof=7,
    in_seq_len=50,
    out_seq_len=50,
    vocab_size=256,
    degree=3,
    gripper_dof=1,
    device='cuda'
)

# Create random trajectory data: 10 trajectories, each with 50 time steps, 7 dimensions
trajectories = torch.randn(10, 50, 7)

# Encode trajectories into discrete tokens
# update_bounds=True lets the processor adaptively update its quantization bounds
tokens = best.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")  # [10, 400] (50 * (7 + 1))

# Decode tokens back to continuous trajectories
reconstructed_trajectories = best.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}")  # [10, 50, 7]

# Calculate mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE Loss: {mse_loss.item()}")
```
### Continuous Encoding
For integration with continuous generative models:
```python
# Encode to normalized continuous parameters in [-1, 1]
params = best.encode_continuous(trajectories, update_bounds=True)
print(f"Continuous params shape: {params.shape}")  # [10, 400]

# Decode back
reconstructed = best.decode_continuous(params)
print(f"Reconstructed shape: {reconstructed.shape}")  # [10, 50, 7]
```
## Parameters
| Parameter | Description | Default |
|-----------|-------------|---------|
| num_dof | Total degrees of freedom (robot joints + gripper) | 7 |
| in_seq_len | Input trajectory sequence length (number of timesteps) | 10 |
| out_seq_len | Output compressed sequence length (must be ≥ the number of control points after compression) | 5 |
| vocab_size | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
| degree | B-spline polynomial degree (3 = cubic, provides smooth trajectories) | 3 |
| gripper_dof | Number of gripper DOFs, assumed to be the last dimensions; used for forced knot placement | 1 |
| do_pad | Whether to pad control points to a fixed length | True |
| device | Torch device ("cuda" or "cpu") | "cuda" |
### Token Count
The total number of tokens per trajectory is `out_seq_len * (num_dof + 1)`.
The extra dimension stores the time knots. For example, with the Quick Start settings (out_seq_len=50, num_dof=7), each trajectory yields 400 tokens (50 × 8).
**Key Difference from BEAST**: BEST uses adaptive compression where the effective number of control points can vary with trajectory complexity (up to `out_seq_len`), while BEAST uses a fixed `num_basis` control points.
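The formula above can be checked with a short helper (`token_count` is an illustrative function, not part of the processor API):

```python
# Tokens per trajectory: out_seq_len * (num_dof + 1); the extra slot per
# control point stores its time knot.
def token_count(out_seq_len: int, num_dof: int) -> int:
    return out_seq_len * (num_dof + 1)

print(token_count(50, 7))  # 400 tokens, matching the Quick Start shapes
print(token_count(5, 7))   # 40 tokens with the constructor defaults
```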
## API Reference
### Encoding Methods
`encode_discrete(trajs, update_bounds=True)`
- Input: trajectories tensor `[batch, in_seq_len, num_dof]`
- Output: discrete tokens `[batch, out_seq_len * (num_dof + 1)]` in range `[0, vocab_size - 1]`
- `update_bounds`: whether to update the internal weight bounds from this batch

`encode_continuous(trajs, update_bounds=True)`
- Input: trajectories tensor `[batch, in_seq_len, num_dof]`
- Output: normalized parameters `[batch, out_seq_len * (num_dof + 1)]` in range `[-1, 1]`
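The relationship between the discrete and continuous encodings can be illustrated with a minimal sketch, assuming uniform quantization of the normalized [-1, 1] parameters into `vocab_size` bins (the processor's exact scheme may differ; `quantize`/`dequantize` here are illustrative, not API methods):

```python
def quantize(p: float, vocab_size: int = 256) -> int:
    """Map a normalized parameter in [-1, 1] to a token in [0, vocab_size - 1]."""
    idx = round((p + 1.0) / 2.0 * (vocab_size - 1))
    return max(0, min(vocab_size - 1, idx))

def dequantize(token: int, vocab_size: int = 256) -> float:
    """Map a token back to a normalized value in [-1, 1]."""
    return token / (vocab_size - 1) * 2.0 - 1.0

print(quantize(-1.0), quantize(1.0))  # 0 255
# Round-tripping loses at most half a bin width (~1/255 for vocab_size=256):
print(round(dequantize(quantize(0.3)), 3))  # 0.302
```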
### Decoding Methods
`decode_discrete(tokens, target_length=None)`
- Input: discrete tokens `[batch, out_seq_len * (num_dof + 1)]`
- Output: reconstructed trajectories `[batch, target_length, num_dof]`
- `target_length`: output trajectory length (optional, defaults to `in_seq_len`)

`decode_continuous(params, target_length=None)`
- Input: normalized parameters `[batch, out_seq_len * (num_dof + 1)]`
- Output: reconstructed trajectories `[batch, target_length, num_dof]`
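Because decoding evaluates a B-spline, `target_length` can differ from `in_seq_len`: the same control points are simply sampled at a different temporal resolution. A standalone sketch with `scipy.interpolate.BSpline` (illustrative control points and knot vector, not the processor's internals):

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3                                        # cubic, matching degree=3
c = np.array([0.0, 0.5, 1.0, 0.5, 0.0])      # illustrative control points
t = np.array([0, 0, 0, 0, 0.5, 1, 1, 1, 1])  # clamped knot vector, len(c) + k + 1
spline = BSpline(t, c, k)

coarse = spline(np.linspace(0, 1, 10))   # decode at 10 timesteps
fine = spline(np.linspace(0, 1, 100))    # decode the same spline at 100 timesteps
print(coarse.shape, fine.shape)  # (10,) (100,)
```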
### Utility Methods
`update_weights_bounds_per_batch(batch_weights)`
- Update the min/max bounds used for normalization based on new batch data
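A minimal sketch of what such a per-batch bounds update plausibly does (the exact internals are not documented here; `RunningBounds` is illustrative): track the running min/max of the encoded weights so normalization to [-1, 1] stays consistent across batches.

```python
class RunningBounds:
    """Running min/max over batches of weight values."""

    def __init__(self):
        self.w_min = float("inf")
        self.w_max = float("-inf")

    def update(self, batch_weights):
        # batch_weights: list of per-trajectory weight lists
        flat = [w for traj in batch_weights for w in traj]
        self.w_min = min(self.w_min, min(flat))
        self.w_max = max(self.w_max, max(flat))

bounds = RunningBounds()
bounds.update([[0.2, -0.5], [1.3, 0.0]])
bounds.update([[2.0, -1.0]])
print(bounds.w_min, bounds.w_max)  # -1.0 2.0
```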
## Key Features
### Adaptive Knot Selection with MILP
Unlike BEAST's uniform knot spacing, BEST uses Mixed-Integer Linear Programming (MILP) to optimize knot placement:
- **Gripper-Driven Knots**: automatically places knots at gripper state transitions
- **Curvature-Based Optimization**: adds knots where trajectory curvature is high
- **Tolerance Control**: balances compression ratio against reconstruction accuracy
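A toy illustration of the curvature signal such an optimizer could use, with second differences as a curvature proxy (the actual MILP formulation is not reproduced here; `curvature_scores` is illustrative):

```python
def curvature_scores(traj):
    """Score each interior timestep t by |x[t-1] - 2*x[t] + x[t+1]| (second difference)."""
    return [abs(traj[t - 1] - 2 * traj[t] + traj[t + 1]) for t in range(1, len(traj) - 1)]

# A trajectory that bends sharply in the middle scores highest there,
# so the middle timesteps are the best knot candidates:
scores = curvature_scores([0.0, 0.1, 0.2, 0.8, 0.9, 1.0])
print([round(s, 2) for s in scores])  # [0.0, 0.5, 0.5, 0.0]
```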
### Forced Gripper Knots
Preserves discrete gripper states by:
- Detecting gripper state changes in the input trajectory
- Forcing B-spline knots at the transition points
- Using degree-0 splines for the gripper DOF (piecewise constant)
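The first step can be sketched as follows, assuming a binary gripper signal in the last DOF (the threshold of 0 and the helper name are illustrative):

```python
def gripper_transition_indices(gripper_signal, threshold=0.0):
    """Return the timesteps where the binarized gripper state flips."""
    states = [1 if g > threshold else 0 for g in gripper_signal]
    return [i for i in range(1, len(states)) if states[i] != states[i - 1]]

# Open at t=3, close at t=5 -> forced knots would be placed at these indices:
print(gripper_transition_indices([-1, -1, -1, 1, 1, -1]))  # [3, 5]
```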
### Performance Benchmarks (LIBERO Dataset, 100 samples)
| Action Chunk Size | Avg Time (s) | CP Min | CP Mean | CP Max | W_min | W_max | Success Rate |
|-------------------|--------------|--------|---------|--------|-------|-------|--------------|
| 5 steps | 0.104 | 5 | 5.0 | 5 | -1.0000 | 5.0000 | 100% |
| 10 steps | 0.211 | 10 | 10.0 | 10 | -1.0000 | 10.0000 | 100% |
| 15 steps | 0.427 | 15 | 15.0 | 15 | -1.3730 | 15.0000 | 100% |
| 20 steps | 0.696 | 20 | 20.0 | 20 | -1.0000 | 20.0000 | 100% |
| 25 steps | 1.904 | 25 | 25.0 | 25 | -1.0000 | 25.0000 | 100% |
| 30 steps | 3.217 | 30 | 30.0 | 30 | -1.0000 | 30.0000 | 100% |
| 35 steps | 5.372 | 35 | 35.0 | 35 | -1.0000 | 35.0000 | 93% |

Note: CP (control points) is the number of knots selected by the adaptive algorithm.
## Comparison with BEAST
| Feature | BEAST | BEST |
|---------|-------|------|
| Knot Selection | Uniform spacing | Adaptive (MILP-based) |
| Gripper Handling | Optional zero-order | Forced knots at transitions |
| Compression | Fixed basis count | Adaptive based on complexity |
| Encoding Time | ~20 ms (50 steps) | ~100 ms (5 steps) to ~5 s (35 steps) |
| Trajectory Fidelity | High (uniform) | Very high (adaptive) |
| Use Case | General trajectories | Robot manipulation with gripper |
## Uses
### Intended Use Cases
- **Robot Imitation Learning**: compress continuous demonstration trajectories with gripper states into discrete tokens for VLA-based policy learning
- **Manipulation Dataset Compression**: reduce memory footprint while preserving both motion quality and discrete gripper transitions
- **VLA Action Tokenization**: enable vision-language-action models to process robot actions as discrete token sequences with explicit gripper control
### Out-of-Scope Use Cases
- Trajectories without discrete state transitions (use BEAST instead for better speed)
- Real-time control (MILP optimization adds computational overhead)
- Non-robotic continuous signals (optimized for manipulation trajectories)
## Advanced Usage
### Custom Configuration
```python
from online_bspline_tokenizer import BestTokenizer
import torch

# Create with custom parameters
tokenizer = BestTokenizer(
    num_dof=7,
    in_seq_len=100,   # Longer input trajectories
    out_seq_len=100,  # Allow up to 100 control points
    vocab_size=512,   # Higher-resolution quantization
    degree=3,
    gripper_dof=1,
    do_pad=True,
    device='cuda'
)

# Process trajectories
trajectories = torch.randn(5, 100, 7)
tokens = tokenizer.encode_discrete(trajectories)
```
### Saving and Loading
```python
# Save processor configuration
tokenizer.save_pretrained("./my_best_tokenizer")

# Load later
from transformers import AutoProcessor
loaded_tokenizer = AutoProcessor.from_pretrained(
    "./my_best_tokenizer",
    trust_remote_code=True
)
```
## Citation
If you use BEST in your research, please cite:
```bibtex
@misc{best2026,
  title={BEST: B-Spline Encoded Sequence Tokenizer with Adaptive Knots},
  author={Hexinyu},
  year={2026},
  url={https://github.com/your-repo/best}
}
```
Based on BEAST:
```bibtex
@inproceedings{zhou2025beast,
  title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
  author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=rQCl1sf62w}
}
```
## License
MIT License
## Acknowledgments
This work builds upon [BEAST](https://huggingface.co/zhouhongyi/beast) and extends it with adaptive knot selection for improved manipulation trajectory encoding.