# BEST: B-Spline Encoded Sequence Tokenizer with Adaptive Knots
BEST is an advanced action tokenizer that converts continuous robot action sequences into discrete tokens using adaptive B-splines with MILP-based knot optimization. It extends [BEAST](https://huggingface.co/zhouhongyi/beast) with forced gripper knots and adaptive compression for enhanced trajectory representation in imitation learning.
## Installation
Install the required dependencies:
```bash
pip install torch numpy scipy matplotlib tqdm pulp transformers
```
Note: CUDA is recommended for optimal performance, but CPU is also supported by setting `device="cpu"`.
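The CPU fallback mentioned above can be selected automatically instead of hard-coding a device string; a minimal sketch:

```python
import torch

# Prefer GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```

The resulting string can be passed directly as the `device` argument when constructing the processor.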
## Quick Start
```python
from transformers import AutoProcessor
import torch
# Initialize the BEST processor with configuration parameters:
# - num_dof: degrees of freedom (7 for 6D robot arm + gripper)
# - in_seq_len: input trajectory length (number of time steps)
# - out_seq_len: output token sequence length after compression
# - vocab_size: discrete vocabulary size (256 = 8-bit tokens)
# - degree: degree of the B-spline polynomial (3 = cubic spline)
# - gripper_dof: number of gripper DOFs (1 for binary gripper)
# - device: computation device ('cpu' or 'cuda')
best = AutoProcessor.from_pretrained(
"Luka-He/best",
trust_remote_code=True,
num_dof=7,
in_seq_len=50,
out_seq_len=50,
vocab_size=256,
degree=3,
gripper_dof=1,
device='cuda'
)
# Create random trajectory data: 10 trajectories, each with 50 time steps, 7 dimensions
trajectories = torch.randn(10, 50, 7)
# Encode trajectories into discrete tokens
# update_bounds=True allows the processor to adaptively update quantization bounds
tokens = best.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")  # [10, 400] (50 * (7 + 1), see Token Count below)
# Decode tokens back to continuous trajectories
reconstructed_trajectories = best.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}") # [10, 50, 7]
# Calculate mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE Loss: {mse_loss.item()}")
```
### Continuous Encoding
For integration with continuous generative models:
```python
# Encode to normalized continuous parameters [-1, 1]
params = best.encode_continuous(trajectories, update_bounds=True)
print(f"Continuous params shape: {params.shape}")  # [10, 400] (50 * (7 + 1))
# Decode back
reconstructed = best.decode_continuous(params)
print(f"Reconstructed shape: {reconstructed.shape}") # [10, 50, 7]
```
## Parameters
| Parameter | Description | Default |
|-----------|-------------|---------|
| num_dof | Total degrees of freedom (robot joints + gripper) | 7 |
| in_seq_len | Input trajectory sequence length (number of timesteps) | 10 |
| out_seq_len | Output compressed sequence length (must be ≥ the number of control points after compression) | 5 |
| vocab_size | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
| degree | B-spline polynomial degree (3=cubic, provides smooth trajectories) | 3 |
| gripper_dof | Number of gripper DOFs, assumed to be at the end. Used for forced knot placement | 1 |
| absolute_tol | Absolute tolerance for B-spline fitting error (e.g., 0.01 radians). Controls fitting accuracy and compression ratio. If set, overrides relative tolerance. | 0.01 |
| do_pad | Whether to pad control points to fixed length | True |
| device | Torch device ("cuda" or "cpu") | "cuda" |
### Token Count
The total number of tokens per trajectory is: `out_seq_len * (num_dof + 1)`
The extra dimension stores the time knots. For example, with `out_seq_len=50` and 7 DOF: 400 tokens per trajectory (50 × 8).
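The formula above can be written as a one-line helper (a trivial sketch, not part of the BEST API):

```python
def best_token_count(out_seq_len: int, num_dof: int) -> int:
    # One extra channel beyond the DOFs stores the time-knot tokens.
    return out_seq_len * (num_dof + 1)

print(best_token_count(50, 7))  # 400
```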
**Key Difference from BEAST**: BEST uses adaptive compression where `out_seq_len` can vary based on trajectory complexity, while BEAST uses fixed `num_basis` control points.
### Absolute Tolerance (absolute_tol)
The `absolute_tol` parameter controls the fitting accuracy of the B-spline approximation:
- **Definition**: Maximum allowed L∞ error between the B-spline reconstruction and the original trajectory
- **Default**: 0.01 (appropriate for LIBERO tasks in radians)
- **Effect on Compression**:
- Lower values (e.g., 0.001): Tighter fitting, more control points needed, larger token count
- Higher values (e.g., 0.1): Looser fitting, fewer control points, smaller token count
- **Recommendation**:
- LIBERO manipulation: 0.01 (default)
- High-precision tasks: 0.001-0.005
- General arm motion: 0.01-0.05
- Speed-optimized: 0.05-0.1
**Priority**: When both `absolute_tol` and `tol_ratio` are applicable, `absolute_tol` takes precedence for fitting error threshold.
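The trade-off between tolerance and control-point count can be illustrated with SciPy's smoothing spline, used here as a stand-in for BEST's adaptive fitter (this is an illustration of the L∞-tolerance idea, not the BEST implementation):

```python
import numpy as np
from scipy.interpolate import splrep, splev

t = np.linspace(0.0, 1.0, 50)
traj = np.sin(2.0 * np.pi * t)  # one DOF of a toy trajectory

# Larger smoothing factor s -> fewer knots but a looser fit.
# s bounds the sum of squared residuals, so max |error| <= sqrt(s).
tck = splrep(t, traj, k=3, s=1e-4)
recon = splev(t, tck)

linf_error = float(np.max(np.abs(traj - recon)))
print(f"knots: {len(tck[0])}, L-inf error: {linf_error:.4f}")
```

Sweeping `s` (or, in BEST, `absolute_tol`) over a validation set is a practical way to pick the loosest tolerance that still meets task accuracy.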
## API Reference
### Encoding Methods
`encode_discrete(trajs, update_bounds=True)`
- Input: Trajectories tensor `[batch, in_seq_len, num_dof]`
- Output: Discrete tokens `[batch, out_seq_len * (num_dof + 1)]` in range `[0, vocab_size-1]`
- `update_bounds`: Whether to update internal weight bounds from this batch
`encode_continuous(trajs, update_bounds=True)`
- Input: Trajectories tensor `[batch, in_seq_len, num_dof]`
- Output: Normalized parameters `[batch, out_seq_len * (num_dof + 1)]` in range `[-1, 1]`
### Decoding Methods
`decode_discrete(tokens, target_length=None)`
- Input: Discrete tokens `[batch, out_seq_len * (num_dof + 1)]`
- Output: Reconstructed trajectories `[batch, target_length, num_dof]`
- `target_length`: Output trajectory length (optional, defaults to `in_seq_len`)
`decode_continuous(params, target_length=None)`
- Input: Normalized parameters `[batch, out_seq_len * (num_dof + 1)]`
- Output: Reconstructed trajectories `[batch, target_length, num_dof]`
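Because the decoder evaluates a continuous spline, `target_length` can differ from `in_seq_len`. The underlying idea can be sketched with SciPy (an illustration of spline resampling, not the BEST internals):

```python
import numpy as np
from scipy.interpolate import splrep, splev

t_in = np.linspace(0.0, 1.0, 50)
traj = np.cos(2.0 * np.pi * t_in)

# Fit once at the input resolution...
tck = splrep(t_in, traj, k=3)

# ...then evaluate the same spline on a denser grid (target_length=200).
t_out = np.linspace(0.0, 1.0, 200)
upsampled = splev(t_out, tck)
print(upsampled.shape)
```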
### Utility Methods
`update_weights_bounds_per_batch(batch_weights)`
- Update the min/max bounds used for normalization based on new batch data
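A running min/max tracker in the spirit of `update_weights_bounds_per_batch` might look like the following (a hypothetical sketch; the class and method names are not the BEST API):

```python
import torch

class BoundsTracker:
    """Track per-parameter min/max bounds over batches for [-1, 1] normalization."""

    def __init__(self, num_params: int):
        self.w_min = torch.full((num_params,), float("inf"))
        self.w_max = torch.full((num_params,), float("-inf"))

    def update(self, batch_weights: torch.Tensor) -> None:
        # batch_weights: [batch, num_params]
        self.w_min = torch.minimum(self.w_min, batch_weights.min(dim=0).values)
        self.w_max = torch.maximum(self.w_max, batch_weights.max(dim=0).values)

    def normalize(self, w: torch.Tensor) -> torch.Tensor:
        # Map each parameter into [-1, 1] using the tracked bounds.
        span = (self.w_max - self.w_min).clamp(min=1e-8)
        return 2.0 * (w - self.w_min) / span - 1.0

tracker = BoundsTracker(num_params=3)
tracker.update(torch.tensor([[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]))
print(tracker.w_min, tracker.w_max)
```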
## Key Features
### Adaptive Knot Selection with MILP
Unlike BEAST's uniform knot spacing, BEST uses Mixed-Integer Linear Programming (MILP) to optimize knot placement:
- **Gripper-Driven Knots**: Automatically places knots at gripper state transitions
- **Curvature-Based Optimization**: Adds knots where trajectory curvature is high
- **Tolerance Control**: Balances compression ratio vs. reconstruction accuracy
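The flavor of such a knot-selection MILP can be sketched with `pulp` (already in the install list). This toy formulation minimizes the number of knots subject to covering high-curvature samples and honoring forced gripper knots; the variable names and constraint structure are illustrative assumptions, not BEST's actual formulation:

```python
import pulp

# Toy data (all hypothetical): candidate knot time indices,
# samples that need a nearby knot, and a forced gripper knot.
candidates = list(range(0, 50, 5))
high_curv = [7, 22, 41]   # high-curvature samples to cover
forced = [25]             # gripper transition -> knot must be placed
radius = 5                # a knot within +/-5 steps covers a sample

prob = pulp.LpProblem("knot_selection", pulp.LpMinimize)
x = {c: pulp.LpVariable(f"knot_{c}", cat="Binary") for c in candidates}

# Objective: use as few knots as possible.
prob += pulp.lpSum(x.values())

# Each high-curvature sample must be covered by some nearby knot.
for s in high_curv:
    prob += pulp.lpSum(x[c] for c in candidates if abs(c - s) <= radius) >= 1

# Forced gripper knots are always selected.
for c in forced:
    prob += x[c] == 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
selected = sorted(c for c in candidates if x[c].value() == 1)
print(selected)
```

In the real tokenizer the objective would also weigh reconstruction error against the tolerance, but the binary select-and-cover structure is the essence of MILP-based knot placement.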
### Forced Gripper Knots
Preserves discrete gripper states by:
- Detecting gripper state changes in input trajectory
- Forcing B-spline knots at transition points
- Using degree-0 splines for gripper DOF (piecewise constant)
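Detecting the transition points in the first step can be sketched as follows (`gripper_transition_indices` is a hypothetical helper, not part of the BEST API):

```python
import torch

def gripper_transition_indices(gripper: torch.Tensor) -> torch.Tensor:
    """Return the time indices where a binary gripper signal changes state.

    gripper: [T] tensor of gripper commands. A knot would be forced
    at each returned index.
    """
    changes = gripper[1:] != gripper[:-1]
    return torch.nonzero(changes).flatten() + 1

g = torch.tensor([0, 0, 0, 1, 1, 0, 0])
print(gripper_transition_indices(g))  # tensor([3, 5])
```

With knots pinned at these indices and a degree-0 spline on the gripper DOF, the reconstruction stays piecewise constant and switches at exactly the demonstrated time steps.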
### Performance Benchmarks (LIBERO Dataset, 100 samples)
| Action Chunk Size | Avg Time (s) | CP Min | CP Mean | CP Max | W_min | W_max | Success Rate |
|-------------------|--------------|--------|---------|--------|-------|-------|--------------|
| 5 steps | 0.104 | 5 | 5.0 | 5 | -1.0000 | 5.0000 | 100% |
| 10 steps | 0.211 | 10 | 10.0 | 10 | -1.0000 | 10.0000 | 100% |
| 15 steps | 0.427 | 15 | 15.0 | 15 | -1.3730 | 15.0000 | 100% |
| 20 steps | 0.696 | 20 | 20.0 | 20 | -1.0000 | 20.0000 | 100% |
| 25 steps | 1.904 | 25 | 25.0 | 25 | -1.0000 | 25.0000 | 100% |
| 30 steps | 3.217 | 30 | 30.0 | 30 | -1.0000 | 30.0000 | 100% |
| 35 steps | 5.372 | 35 | 35.0 | 35 | -1.0000 | 35.0000 | 93% |
Note: CP (Control Points) length represents the number of knots selected by the adaptive algorithm.
## Comparison with BEAST
| Feature | BEAST | BEST |
|---------|-------|------|
| Knot Selection | Uniform spacing | Adaptive (MILP-based) |
| Gripper Handling | Optional zero-order | Forced knots at transitions |
| Compression | Fixed basis count | Adaptive based on complexity |
| Encoding Time | ~20ms (50 steps) | ~100ms (5 steps) to ~5s (35 steps) |
| Trajectory Fidelity | High (uniform) | Very high (adaptive) |
| Use Case | General trajectories | Robot manipulation with gripper |
## Uses
### Intended Use Cases
- **Robot Imitation Learning**: Compress continuous demonstration trajectories with gripper states into discrete tokens for VLA-based policy learning
- **Manipulation Dataset Compression**: Reduce memory footprint while preserving both motion quality and discrete gripper transitions
- **VLA Action Tokenization**: Enable vision-language-action models to process robot actions as discrete token sequences with explicit gripper control
### Out-of-Scope Use Cases
- Trajectories without discrete state transitions (use BEAST instead for better speed)
- Real-time control (MILP optimization adds computational overhead)
- Non-robotic continuous signals (optimized for manipulation trajectories)
## Advanced Usage
### Custom Configuration
```python
from online_bspline_tokenizer import BestTokenizer
import torch
# Create with custom parameters
tokenizer = BestTokenizer(
num_dof=7,
in_seq_len=100, # Longer input trajectories
out_seq_len=100, # Allow up to 100 control points
vocab_size=512, # Higher resolution quantization
degree=3,
gripper_dof=1,
do_pad=True,
device='cuda'
)
# Process trajectories
trajectories = torch.randn(5, 100, 7)
tokens = tokenizer.encode_discrete(trajectories)
```
### Saving and Loading
```python
# Save processor configuration
tokenizer.save_pretrained("./my_best_tokenizer")
# Load later
from transformers import AutoProcessor
loaded_tokenizer = AutoProcessor.from_pretrained(
"./my_best_tokenizer",
trust_remote_code=True
)
```
## Citation
If you use BEST in your research, please cite:
```bibtex
@misc{best2026,
title={BEST: B-Spline Encoded Sequence Tokenizer with Adaptive Knots},
author={Hexinyu},
year={2026},
url={https://github.com/your-repo/best}
}
```
Based on BEAST:
```bibtex
@inproceedings{zhou2025beast,
title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=rQCl1sf62w}
}
```
## License
MIT License
## Acknowledgments
This work builds upon [BEAST](https://huggingface.co/zhouhongyi/beast) and extends it with adaptive knot selection for improved manipulation trajectory encoding.