# BEST: B-Spline Encoded Sequence Tokenizer with Adaptive Knots
BEST is an advanced action tokenizer that converts continuous robot action sequences into discrete tokens using adaptive B-splines with MILP-based knot optimization. It extends [BEAST](https://huggingface.co/zhouhongyi/beast) with forced gripper knots and adaptive compression for enhanced trajectory representation in imitation learning.
## Installation
Install the required dependencies:
```bash
pip install torch numpy scipy matplotlib tqdm pulp transformers
```
Note: CUDA is recommended for optimal performance, but CPU is also supported by setting `device="cpu"`.
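The CPU fallback mentioned above can be selected automatically instead of hard-coding a device string; a minimal sketch:

```python
import torch

# Prefer GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```

The resulting string can be passed directly as the `device` argument when constructing the processor.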
## Quick Start
```python
from transformers import AutoProcessor
import torch
# Initialize the BEST processor with configuration parameters:
# - num_dof: degrees of freedom (7 for 6D robot arm + gripper)
# - in_seq_len: input trajectory length (number of time steps)
# - out_seq_len: output token sequence length after compression
# - vocab_size: discrete vocabulary size (256 = 8-bit tokens)
# - degree: degree of the B-spline polynomial (3 = cubic spline)
# - gripper_dof: number of gripper DOFs (1 for binary gripper)
# - device: computation device ('cpu' or 'cuda')
best = AutoProcessor.from_pretrained(
"Luka-He/best",
trust_remote_code=True,
num_dof=7,
in_seq_len=50,
out_seq_len=50,
vocab_size=256,
degree=3,
gripper_dof=1,
device='cuda'
)
# Create random trajectory data: 10 trajectories, each with 50 time steps, 7 dimensions
trajectories = torch.randn(10, 50, 7)
# Encode trajectories into discrete tokens
# update_bounds=True allows the processor to adaptively update quantization bounds
tokens = best.encode_discrete(trajectories, update_bounds=True)
print(f"Encoded tokens shape: {tokens.shape}")  # [10, 400] (50 * (7 + 1), see Token Count below)
# Decode tokens back to continuous trajectories
reconstructed_trajectories = best.decode_discrete(tokens)
print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}") # [10, 50, 7]
# Calculate mean squared error to measure reconstruction quality
mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
print(f"MSE Loss: {mse_loss.item()}")
```
### Continuous Encoding
For integration with continuous generative models:
```python
# Encode to normalized continuous parameters [-1, 1]
params = best.encode_continuous(trajectories, update_bounds=True)
print(f"Continuous params shape: {params.shape}")  # [10, 400] (50 * (7 + 1))
# Decode back
reconstructed = best.decode_continuous(params)
print(f"Reconstructed shape: {reconstructed.shape}") # [10, 50, 7]
```
## Parameters
| Parameter | Description | Default |
|-----------|-------------|---------|
| num_dof | Total degrees of freedom (robot joints + gripper) | 7 |
| in_seq_len | Input trajectory sequence length (number of timesteps) | 10 |
| out_seq_len | Output compressed sequence length (must be ≥ the number of control points after compression) | 5 |
| vocab_size | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
| degree | B-spline polynomial degree (3=cubic, provides smooth trajectories) | 3 |
| gripper_dof | Number of gripper DOFs, assumed to be at the end. Used for forced knot placement | 1 |
| absolute_tol | Absolute tolerance for B-spline fitting error (e.g., 0.01 radians). Controls fitting accuracy and compression ratio. If set, overrides relative tolerance. | 0.01 |
| do_pad | Whether to pad control points to fixed length | True |
| device | Torch device ("cuda" or "cpu") | "cuda" |
### Token Count
The total number of tokens per trajectory is: `out_seq_len * (num_dof + 1)`
The extra dimension stores the time knots. For example, with `out_seq_len=50` and 7 DOF: 400 tokens per trajectory (50 × 8).
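The formula above can be written as a one-line helper (a trivial sketch, not part of the BEST API):

```python
def best_token_count(out_seq_len: int, num_dof: int) -> int:
    # One extra channel beyond the DOFs stores the time-knot tokens.
    return out_seq_len * (num_dof + 1)

print(best_token_count(50, 7))  # 400
```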
**Key Difference from BEAST**: BEST uses adaptive compression where `out_seq_len` can vary based on trajectory complexity, while BEAST uses fixed `num_basis` control points.
### Absolute Tolerance (absolute_tol)
The `absolute_tol` parameter controls the fitting accuracy of the B-spline approximation:
- **Definition**: Maximum allowed L∞ error between the B-spline reconstruction and the original trajectory
- **Default**: 0.01 (appropriate for LIBERO tasks in radians)
- **Effect on Compression**:
- Lower values (e.g., 0.001): Tighter fitting, more control points needed, larger token count
- Higher values (e.g., 0.1): Looser fitting, fewer control points, smaller token count
- **Recommendation**:
- LIBERO manipulation: 0.01 (default)
- High-precision tasks: 0.001-0.005
- General arm motion: 0.01-0.05
- Speed-optimized: 0.05-0.1
**Priority**: When both `absolute_tol` and `tol_ratio` are applicable, `absolute_tol` takes precedence for fitting error threshold.
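The trade-off between tolerance and control-point count can be illustrated with SciPy's smoothing spline, used here as a stand-in for BEST's adaptive fitter (this is an illustration of the L∞-tolerance idea, not the BEST implementation):

```python
import numpy as np
from scipy.interpolate import splrep, splev

t = np.linspace(0.0, 1.0, 50)
traj = np.sin(2.0 * np.pi * t)  # one DOF of a toy trajectory

# Larger smoothing factor s -> fewer knots but a looser fit.
# s bounds the sum of squared residuals, so max |error| <= sqrt(s).
tck = splrep(t, traj, k=3, s=1e-4)
recon = splev(t, tck)

linf_error = float(np.max(np.abs(traj - recon)))
print(f"knots: {len(tck[0])}, L-inf error: {linf_error:.4f}")
```

Sweeping `s` (or, in BEST, `absolute_tol`) over a validation set is a practical way to pick the loosest tolerance that still meets task accuracy.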
## API Reference
### Encoding Methods
`encode_discrete(trajs, update_bounds=True)`
- Input: Trajectories tensor `[batch, in_seq_len, num_dof]`
- Output: Discrete tokens `[batch, out_seq_len * (num_dof + 1)]` in range `[0, vocab_size-1]`
- `update_bounds`: Whether to update internal weight bounds from this batch
`encode_continuous(trajs, update_bounds=True)`
- Input: Trajectories tensor `[batch, in_seq_len, num_dof]`
- Output: Normalized parameters `[batch, out_seq_len * (num_dof + 1)]` in range `[-1, 1]`
### Decoding Methods
`decode_discrete(tokens, target_length=None)`
- Input: Discrete tokens `[batch, out_seq_len * (num_dof + 1)]`
- Output: Reconstructed trajectories `[batch, target_length, num_dof]`
- `target_length`: Output trajectory length (optional, defaults to `in_seq_len`)
`decode_continuous(params, target_length=None)`
- Input: Normalized parameters `[batch, out_seq_len * (num_dof + 1)]`
- Output: Reconstructed trajectories `[batch, target_length, num_dof]`
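Because the decoder evaluates a continuous spline, `target_length` can differ from `in_seq_len`. The underlying idea can be sketched with SciPy (an illustration of spline resampling, not the BEST internals):

```python
import numpy as np
from scipy.interpolate import splrep, splev

t_in = np.linspace(0.0, 1.0, 50)
traj = np.cos(2.0 * np.pi * t_in)

# Fit once at the input resolution...
tck = splrep(t_in, traj, k=3)

# ...then evaluate the same spline on a denser grid (target_length=200).
t_out = np.linspace(0.0, 1.0, 200)
upsampled = splev(t_out, tck)
print(upsampled.shape)
```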
### Utility Methods
`update_weights_bounds_per_batch(batch_weights)`
- Update the min/max bounds used for normalization based on new batch data
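A running min/max tracker in the spirit of `update_weights_bounds_per_batch` might look like the following (a hypothetical sketch; the class and method names are not the BEST API):

```python
import torch

class BoundsTracker:
    """Track per-parameter min/max bounds over batches for [-1, 1] normalization."""

    def __init__(self, num_params: int):
        self.w_min = torch.full((num_params,), float("inf"))
        self.w_max = torch.full((num_params,), float("-inf"))

    def update(self, batch_weights: torch.Tensor) -> None:
        # batch_weights: [batch, num_params]
        self.w_min = torch.minimum(self.w_min, batch_weights.min(dim=0).values)
        self.w_max = torch.maximum(self.w_max, batch_weights.max(dim=0).values)

    def normalize(self, w: torch.Tensor) -> torch.Tensor:
        # Map each parameter into [-1, 1] using the tracked bounds.
        span = (self.w_max - self.w_min).clamp(min=1e-8)
        return 2.0 * (w - self.w_min) / span - 1.0

tracker = BoundsTracker(num_params=3)
tracker.update(torch.tensor([[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]))
print(tracker.w_min, tracker.w_max)
```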
## Key Features
### Adaptive Knot Selection with MILP
Unlike BEAST's uniform knot spacing, BEST uses Mixed-Integer Linear Programming (MILP) to optimize knot placement:
- **Gripper-Driven Knots**: Automatically places knots at gripper state transitions
- **Curvature-Based Optimization**: Adds knots where trajectory curvature is high
- **Tolerance Control**: Balances compression ratio vs. reconstruction accuracy
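The flavor of such a knot-selection MILP can be sketched with `pulp` (already in the install list). This toy formulation minimizes the number of knots subject to covering high-curvature samples and honoring forced gripper knots; the variable names and constraint structure are illustrative assumptions, not BEST's actual formulation:

```python
import pulp

# Toy data (all hypothetical): candidate knot time indices,
# samples that need a nearby knot, and a forced gripper knot.
candidates = list(range(0, 50, 5))
high_curv = [7, 22, 41]   # high-curvature samples to cover
forced = [25]             # gripper transition -> knot must be placed
radius = 5                # a knot within +/-5 steps covers a sample

prob = pulp.LpProblem("knot_selection", pulp.LpMinimize)
x = {c: pulp.LpVariable(f"knot_{c}", cat="Binary") for c in candidates}

# Objective: use as few knots as possible.
prob += pulp.lpSum(x.values())

# Each high-curvature sample must be covered by some nearby knot.
for s in high_curv:
    prob += pulp.lpSum(x[c] for c in candidates if abs(c - s) <= radius) >= 1

# Forced gripper knots are always selected.
for c in forced:
    prob += x[c] == 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
selected = sorted(c for c in candidates if x[c].value() == 1)
print(selected)
```

In the real tokenizer the objective would also weigh reconstruction error against the tolerance, but the binary select-and-cover structure is the essence of MILP-based knot placement.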
### Forced Gripper Knots
Preserves discrete gripper states by:
- Detecting gripper state changes in input trajectory
- Forcing B-spline knots at transition points
- Using degree-0 splines for gripper DOF (piecewise constant)
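Detecting the transition points in the first step can be sketched as follows (`gripper_transition_indices` is a hypothetical helper, not part of the BEST API):

```python
import torch

def gripper_transition_indices(gripper: torch.Tensor) -> torch.Tensor:
    """Return the time indices where a binary gripper signal changes state.

    gripper: [T] tensor of gripper commands. A knot would be forced
    at each returned index.
    """
    changes = gripper[1:] != gripper[:-1]
    return torch.nonzero(changes).flatten() + 1

g = torch.tensor([0, 0, 0, 1, 1, 0, 0])
print(gripper_transition_indices(g))  # tensor([3, 5])
```

With knots pinned at these indices and a degree-0 spline on the gripper DOF, the reconstruction stays piecewise constant and switches at exactly the demonstrated time steps.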
### Performance Benchmarks (LIBERO Dataset, 100 samples)
| Action Chunk Size | Avg Time (s) | CP Min | CP Mean | CP Max | W_min | W_max | Success Rate |
|-------------------|--------------|--------|---------|--------|-------|-------|--------------|
| 5 steps | 0.104 | 5 | 5.0 | 5 | -1.0000 | 5.0000 | 100% |
| 10 steps | 0.211 | 10 | 10.0 | 10 | -1.0000 | 10.0000 | 100% |
| 15 steps | 0.427 | 15 | 15.0 | 15 | -1.3730 | 15.0000 | 100% |
| 20 steps | 0.696 | 20 | 20.0 | 20 | -1.0000 | 20.0000 | 100% |
| 25 steps | 1.904 | 25 | 25.0 | 25 | -1.0000 | 25.0000 | 100% |
| 30 steps | 3.217 | 30 | 30.0 | 30 | -1.0000 | 30.0000 | 100% |
| 35 steps | 5.372 | 35 | 35.0 | 35 | -1.0000 | 35.0000 | 93% |
Note: CP (Control Points) length represents the number of knots selected by the adaptive algorithm.
## Comparison with BEAST
| Feature | BEAST | BEST |
|---------|-------|------|
| Knot Selection | Uniform spacing | Adaptive (MILP-based) |
| Gripper Handling | Optional zero-order | Forced knots at transitions |
| Compression | Fixed basis count | Adaptive based on complexity |
| Encoding Time | ~20ms (50 steps) | ~100ms (5 steps) to ~5s (35 steps) |
| Trajectory Fidelity | High (uniform) | Very high (adaptive) |
| Use Case | General trajectories | Robot manipulation with gripper |
## Uses
### Intended Use Cases
- **Robot Imitation Learning**: Compress continuous demonstration trajectories with gripper states into discrete tokens for VLA-based policy learning
- **Manipulation Dataset Compression**: Reduce memory footprint while preserving both motion quality and discrete gripper transitions
- **VLA Action Tokenization**: Enable vision-language-action models to process robot actions as discrete token sequences with explicit gripper control
### Out-of-Scope Use Cases
- Trajectories without discrete state transitions (use BEAST instead for better speed)
- Real-time control (MILP optimization adds computational overhead)
- Non-robotic continuous signals (optimized for manipulation trajectories)
## Advanced Usage
### Custom Configuration
```python
from online_bspline_tokenizer import BestTokenizer
import torch
# Create with custom parameters
tokenizer = BestTokenizer(
num_dof=7,
in_seq_len=100, # Longer input trajectories
out_seq_len=100, # Allow up to 100 control points
vocab_size=512, # Higher resolution quantization
degree=3,
gripper_dof=1,
do_pad=True,
device='cuda'
)
# Process trajectories
trajectories = torch.randn(5, 100, 7)
tokens = tokenizer.encode_discrete(trajectories)
```
### Saving and Loading
```python
# Save processor configuration
tokenizer.save_pretrained("./my_best_tokenizer")
# Load later
from transformers import AutoProcessor
loaded_tokenizer = AutoProcessor.from_pretrained(
"./my_best_tokenizer",
trust_remote_code=True
)
```
## Citation
If you use BEST in your research, please cite:
```bibtex
@misc{best2026,
title={BEST: B-Spline Encoded Sequence Tokenizer with Adaptive Knots},
author={Hexinyu},
year={2026},
url={https://github.com/your-repo/best}
}
```
Based on BEAST:
```bibtex
@inproceedings{zhou2025beast,
title={{BEAST}: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
author={Hongyi Zhou and Weiran Liao and Xi Huang and Yucheng Tang and Fabian Otto and Xiaogang Jia and Xinkai Jiang and Simon Hilber and Ge Li and Qian Wang and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Nils Blank and Moritz Reuss and Rudolf Lioutikov},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=rQCl1sf62w}
}
```
## License
MIT License
## Acknowledgments
This work builds upon [BEAST](https://huggingface.co/zhouhongyi/beast) and extends it with adaptive knot selection for improved manipulation trajectory encoding.