zhouhongyi committed on
Commit 790eb33 · 1 Parent(s): 8a0ae1e

update readme

Files changed (1)
  1. README.md +114 -144
README.md CHANGED
@@ -3,182 +3,152 @@ library_name: transformers
  tags: []
  ---

- # BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning

- This is the official repo for BEAST tokenizer.
-
- BEAST is an action tokenizer that translate continous robot action sequences into discrete tokens leveraging B-Splines.
-
- <!-- Provide a quick summary of what the model is/does. -->

  ## Installation

- ## Quick Start
-
- ## Parameters
-
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]

- ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->

- [More Information Needed]

- ### Results

- [More Information Needed]

- #### Summary

- ## Model Examination [optional]

- <!-- Relevant interpretability work for the model goes here -->

- [More Information Needed]

- ## Environmental Impact

- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]

- ## Technical Specifications [optional]

- ### Model Architecture and Objective

- [More Information Needed]

- ### Compute Infrastructure

- [More Information Needed]

- #### Hardware

- [More Information Needed]

- #### Software

- [More Information Needed]

- ## Citation [optional]

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

  **BibTeX:**

- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact

- [More Information Needed]
 
  tags: []
  ---

+ # BEAST: B-Spline Encoded Action Sequences Tokenizer

+ BEAST is an action tokenizer that converts continuous robot action sequences into discrete tokens using B-splines. It enables efficient trajectory compression for imitation learning by representing smooth robot motions as compact token sequences.
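To make the B-spline representation concrete, here is a minimal NumPy sketch of how a trajectory can be summarized by a small set of control points (spline weights). It assumes a clamped uniform knot vector on `[0, 1]`, basis functions built with the Cox-de Boor recursion, and a plain least-squares fit; the helper names `clamped_uniform_knots`, `bspline_basis`, and `fit_control_points` are hypothetical, are not part of the BEAST API, and the tokenizer's actual fitting procedure may differ.

```python
import numpy as np


def clamped_uniform_knots(num_basis: int, degree: int) -> np.ndarray:
    """Hypothetical helper: clamped knot vector on [0, 1] with uniform interior knots."""
    if num_basis < degree + 1:
        raise ValueError("num_basis must be at least degree + 1")
    num_interior = num_basis - degree - 1
    interior = np.linspace(0.0, 1.0, num_interior + 2)[1:-1]
    # Repeating each end knot degree + 1 times pins the curve to the first/last control point.
    return np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])


def bspline_basis(times: np.ndarray, num_basis: int, degree: int) -> np.ndarray:
    """Evaluate all basis functions at `times` via Cox-de Boor; shape (len(times), num_basis)."""
    knots = clamped_uniform_knots(num_basis, degree)
    t = np.asarray(times, dtype=float)
    # Degree 0: indicator of each knot span [knots[i], knots[i + 1]).
    basis = np.zeros((t.size, len(knots) - 1))
    for i in range(len(knots) - 1):
        basis[:, i] = (knots[i] <= t) & (t < knots[i + 1])
    # Assign t == 1.0 to the last non-empty span so the end point is covered.
    last_span = np.nonzero(knots[:-1] < knots[1:])[0][-1]
    basis[t == knots[-1], last_span] = 1.0
    # Raise the degree one step at a time (0/0 terms are treated as zero).
    for p in range(1, degree + 1):
        nxt = np.zeros((t.size, len(knots) - 1 - p))
        for i in range(len(knots) - 1 - p):
            left_den = knots[i + p] - knots[i]
            right_den = knots[i + p + 1] - knots[i + 1]
            if left_den > 0:
                nxt[:, i] += (t - knots[i]) / left_den * basis[:, i]
            if right_den > 0:
                nxt[:, i] += (knots[i + p + 1] - t) / right_den * basis[:, i + 1]
        basis = nxt
    return basis


def fit_control_points(traj: np.ndarray, num_basis: int, degree: int) -> np.ndarray:
    """Least-squares control points for a trajectory of shape (seq_len, num_dof)."""
    times = np.linspace(0.0, 1.0, traj.shape[0])
    basis = bspline_basis(times, num_basis, degree)
    weights, *_ = np.linalg.lstsq(basis, traj, rcond=None)
    return weights  # shape (num_basis, num_dof)


# Usage: summarize a 50-step, 3-DOF trajectory with 20 x 3 cubic-spline control points.
seq_len, num_basis, degree = 50, 20, 3
t = np.linspace(0.0, 1.0, seq_len)
traj = np.stack([np.sin(2 * np.pi * t), np.cos(np.pi * t), t**2], axis=1)
weights = fit_control_points(traj, num_basis, degree)
recon = bspline_basis(t, num_basis, degree) @ weights
print(weights.shape)  # (20, 3)
print(float(np.mean((traj - recon) ** 2)))  # reconstruction MSE of the smooth test signal
```

The fitted weights are what a tokenizer of this kind would then normalize and quantize; smooth motions need far fewer weights than time steps, which is where the compression comes from.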

  ## Installation

+ Install the required dependencies:

+ ```bash
+ pip install torch numpy matplotlib einops transformers
+ ```

+ **Note:** CUDA is recommended for optimal performance, but CPU is also supported by setting `device="cpu"`.
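Because the default `device` is `"cuda"`, a script meant to run on machines with and without a GPU can choose the device at runtime and pass it to the processor. A minimal sketch using PyTorch's `torch.cuda.is_available()`; the `ImportError` fallback exists only so the snippet itself runs in an environment without PyTorch (BEAST itself requires it).

```python
# Choose CUDA when PyTorch can see a GPU, otherwise fall back to the CPU.
try:
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # PyTorch missing: only the CPU path can be selected here
    device = "cpu"

print(f"Using device: {device}")
# Then, for example: AutoProcessor.from_pretrained(..., device=device)
```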

+ ## Quick Start

+ ```python
+ from transformers import AutoProcessor
+ import torch
+
+ # Initialize the BEAST processor with configuration parameters:
+ # - num_dof: degrees of freedom (3 for 3D trajectories like x, y, z)
+ # - num_basis: number of B-spline basis functions used for trajectory representation
+ # - seq_len: length of the trajectory sequence (number of time steps)
+ # - degree_p: degree of the B-spline polynomial (3 = cubic spline)
+ # - device: computation device ('cpu' or 'cuda')
+ beast = AutoProcessor.from_pretrained(
+     "zhouhongyi/beast",
+     trust_remote_code=True,
+     num_dof=3,
+     num_basis=20,
+     seq_len=50,
+     degree_p=3,
+     device="cpu",
+ )
+
+ # Create random trajectory data: 10 trajectories, each with 50 time steps, 3 dimensions
+ trajectories = torch.randn(10, 50, 3)
+
+ # Encode trajectories into discrete tokens
+ # update_bounds=True allows the processor to adaptively update quantization bounds
+ tokens = beast.encode_discrete(trajectories, update_bounds=True)
+ print(f"Encoded tokens shape: {tokens.shape}")
+
+ # Decode tokens back to continuous trajectories
+ reconstructed_trajectories = beast.decode_discrete(tokens)
+ print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}")
+
+ # Calculate mean squared error to measure reconstruction quality
+ mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
+ print(f"MSE Loss: {mse_loss.item()}")
+
+ # Visualize the reconstruction error for analysis
+ beast.visualize_reconstruction_error_discrete(trajectories)
+ ```
+
+ ### Continuous Encoding
+
+ For integration with continuous generative models, the processor can also encode trajectories as normalized continuous parameters instead of discrete tokens (continuing from the Quick Start example):
+
+ ```python
+ # Encode to normalized continuous parameters in [-1, 1]
+ params = beast.encode_continuous(trajectories, update_bounds=True)
+
+ # Decode back to continuous trajectories
+ reconstructed = beast.decode_continuous(params)
+ ```

+ ## Parameters

+ | Parameter | Description | Default |
+ |-----------|-------------|---------|
+ | `num_dof` | Total degrees of freedom (robot joints + gripper) | 7 |
+ | `num_basis` | Number of B-spline basis functions. Higher values improve reconstruction fidelity but produce more tokens | 10 |
+ | `seq_len` | Trajectory sequence length (number of timesteps) | 50 |
+ | `vocab_size` | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
+ | `degree_p` | B-spline polynomial degree. Higher degrees produce smoother curves (3=cubic, 4=quartic) | 4 |
+ | `device` | Torch device (`"cuda"` or `"cpu"`) | `"cuda"` |
+ | `gripper_zero_order` | Use piecewise-constant (degree 0) splines for gripper DOFs. Useful for binary gripper states | `False` |
+ | `gripper_dof` | Number of gripper DOFs. Only used when `gripper_zero_order=True` | 1 |
+ | `init_cond_order` | Initial boundary condition order: 0=none, 1=position only, 2=position+velocity | 0 |
+ | `end_cond_order` | End boundary condition order (same options as `init_cond_order`) | 0 |
+ | `enforce_init_pos` | Enforce initial position constraint during decoding | `False` |
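The `gripper_zero_order` option relies on the fact that a degree-0 B-spline is piecewise constant: each basis function is the indicator of one knot interval, so a least-squares fit reduces to averaging the samples inside each interval. The NumPy sketch below illustrates this on a binary gripper signal; it assumes uniform knots on `[0, 1]`, and the helper `zero_order_basis` is hypothetical rather than BEAST's implementation.

```python
import numpy as np


def zero_order_basis(times: np.ndarray, num_basis: int) -> np.ndarray:
    """Degree-0 B-spline basis on uniform knots over [0, 1]: one-hot interval membership."""
    # Interval index of each time point; t == 1.0 is assigned to the last interval.
    idx = np.minimum((np.asarray(times) * num_basis).astype(int), num_basis - 1)
    return np.eye(num_basis)[idx]  # shape (len(times), num_basis)


seq_len, num_basis = 50, 10
t = np.linspace(0.0, 1.0, seq_len)
gripper = (t >= 0.5).astype(float)  # binary gripper state: 0 for the first half, 1 afterwards

basis = zero_order_basis(t, num_basis)
# With a one-hot basis, least squares returns the mean of the samples in each interval.
weights, *_ = np.linalg.lstsq(basis, gripper, rcond=None)
recon = basis @ weights

print(np.round(weights, 3))         # one constant value per interval
print(np.allclose(recon, gripper))  # True here: the switch falls on an interval boundary
```

A smooth higher-degree spline fitted to the same step would have to blend across the switch, which is why a piecewise-constant representation can suit binary gripper states.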

+ ### Token Count

+ The total number of tokens per trajectory is `num_basis * num_dof`.

+ For example, with default settings (10 basis, 7 DOF): 70 tokens per trajectory.
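The same formula applied to the Quick Start configuration (`num_basis=20`, `num_dof=3`) gives 60 tokens per trajectory. The small helpers below (hypothetical, not part of the BEAST API) make that arithmetic explicit and compare the token count with the number of raw scalar action values, `seq_len * num_dof`.

```python
def tokens_per_trajectory(num_basis: int, num_dof: int) -> int:
    """Hypothetical helper: one token per (basis function, DOF) pair."""
    return num_basis * num_dof


def raw_values_per_trajectory(seq_len: int, num_dof: int) -> int:
    """Number of scalar action values in the uncompressed trajectory."""
    return seq_len * num_dof


# Default settings: 10 basis functions, 7 DOF, 50 time steps.
default_tokens = tokens_per_trajectory(num_basis=10, num_dof=7)
default_raw = raw_values_per_trajectory(seq_len=50, num_dof=7)

# Quick Start settings: 20 basis functions, 3 DOF, 50 time steps.
quick_tokens = tokens_per_trajectory(num_basis=20, num_dof=3)
quick_raw = raw_values_per_trajectory(seq_len=50, num_dof=3)

print(default_tokens, default_raw / default_tokens)  # 70 5.0
print(quick_tokens, quick_raw / quick_tokens)  # 60 2.5
```

Note that the token count does not depend on `seq_len`, so longer action chunks at the same `num_basis` compress further.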

+ ## API Reference

+ ### Encoding Methods

+ **`encode_discrete(trajs, update_bounds=True)`**
+ - Input: Trajectories tensor `[batch, seq_len, num_dof]`
+ - Output: Discrete tokens `[batch, num_basis * num_dof]` in range `[0, vocab_size-1]`
+ - `update_bounds`: Whether to update internal weight bounds from this batch

+ **`encode_continuous(trajs, update_bounds=True)`**
+ - Input: Trajectories tensor `[batch, seq_len, num_dof]`
+ - Output: Normalized parameters `[batch, num_basis * num_dof]` in range `[-1, 1]`
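The two encoders differ in their final representation: normalized values in `[-1, 1]` versus integers in `[0, vocab_size-1]`. A common way to connect the two is to normalize the spline weights with bounds estimated from data (which is what `update_bounds` suggests) and then bin the normalized range uniformly. The NumPy sketch below assumes exactly that scheme with per-dimension min/max bounds; the function names and the binning rule are hypothetical and may not match BEAST's internal implementation.

```python
import numpy as np


def normalize(weights, low, high):
    """Map weights into [-1, 1] using per-dimension bounds (hypothetical scheme)."""
    return np.clip(2.0 * (weights - low) / (high - low) - 1.0, -1.0, 1.0)


def denormalize(params, low, high):
    """Inverse of `normalize` for values inside the bounds."""
    return (params + 1.0) / 2.0 * (high - low) + low


def quantize(params, vocab_size=256):
    """Uniformly bin values in [-1, 1] into integer tokens 0 .. vocab_size - 1."""
    bins = np.floor((params + 1.0) / 2.0 * vocab_size).astype(np.int64)
    return np.clip(bins, 0, vocab_size - 1)  # params == 1.0 falls into the last bin


def dequantize(tokens, vocab_size=256):
    """Map each token back to the center of its bin in [-1, 1]."""
    return (tokens + 0.5) / vocab_size * 2.0 - 1.0


rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 70))  # e.g. 4 trajectories x (10 basis * 7 DOF) spline weights
low, high = weights.min(axis=0), weights.max(axis=0)  # bounds taken from this batch

params = normalize(weights, low, high)  # continuous representation in [-1, 1]
tokens = quantize(params)  # discrete representation in [0, 255]
recovered = denormalize(dequantize(tokens), low, high)

# Decoding to the bin center bounds the error by half a bin width in each dimension.
max_err = np.abs(recovered - weights).max(axis=0)
print(tokens.min(), tokens.max())  # 0 255
print(bool(np.all(max_err <= (high - low) / 256 / 2 + 1e-12)))  # True
```

With 8-bit tokens the quantization error is at most 1/512 of each dimension's range, which is why the bounds used for normalization matter as much as `vocab_size`.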

+ ### Decoding Methods

+ **`decode_discrete(tokens, times=None, init_pos=None)`**
+ - Input: Discrete tokens `[batch, num_basis * num_dof]`
+ - Output: Reconstructed trajectories `[batch, seq_len, num_dof]`
+ - `times`: Custom time points (optional, defaults to `seq_len` uniform points)
+ - `init_pos`: Initial position constraint (optional)
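Because a B-spline is a continuous function of time, the same weights can in principle be evaluated at any set of time points, which is the idea behind the optional `times` argument. For brevity, the NumPy sketch below uses a degree-1 spline: with a clamped uniform knot vector, each basis function is a hat function peaking at a uniform node, so evaluation is linear interpolation between control points. This is an illustration under that assumption (BEAST's default is `degree_p=4`), and `evaluate_linear_bspline` is a hypothetical name.

```python
import numpy as np


def evaluate_linear_bspline(weights: np.ndarray, times: np.ndarray) -> np.ndarray:
    """Evaluate a clamped, uniform degree-1 B-spline with weights of shape (num_basis, num_dof).

    For degree 1 the basis functions are hat functions peaking at uniform nodes,
    so evaluation is per-DOF linear interpolation between the control points.
    """
    num_basis, num_dof = weights.shape
    nodes = np.linspace(0.0, 1.0, num_basis)
    return np.stack([np.interp(times, nodes, weights[:, d]) for d in range(num_dof)], axis=1)


weights = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [2.0, -1.0]])  # 4 basis x 2 DOF

default_times = np.linspace(0.0, 1.0, 50)  # analogous to the default: seq_len uniform points
dense_times = np.linspace(0.0, 1.0, 200)  # custom, finer time grid
half_times = np.linspace(0.0, 0.5, 25)  # custom: only the first half of the motion

print(evaluate_linear_bspline(weights, default_times).shape)  # (50, 2)
print(evaluate_linear_bspline(weights, dense_times).shape)  # (200, 2)
print(evaluate_linear_bspline(weights, half_times).shape)  # (25, 2)
```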

+ **`decode_continuous(params, times=None, init_pos=None)`**
+ - Input: Normalized parameters `[batch, num_basis * num_dof]`
+ - Output: Reconstructed trajectories `[batch, seq_len, num_dof]`
+ - `times`, `init_pos`: Same as for `decode_discrete`

+ ### Utility Methods

+ **`compute_reconstruction_error(raw_traj)`**
+ - Compute the MSE between the original and reconstructed trajectory

+ **`visualize_reconstruction_error_discrete(raw_traj)`** / **`visualize_reconstruction_error_continuous(raw_traj)`**
+ - Plot original vs. reconstructed trajectories for visual comparison

+ ## Uses

+ ### Intended Use Cases

+ - **Robot Imitation Learning**: Compress continuous demonstration trajectories into discrete tokens for language-model-based policy learning
+ - **Trajectory Compression**: Reduce the memory footprint of robot demonstration datasets while preserving motion quality
+ - **Action Tokenization**: Enable transformer-based models to process robot actions as discrete token sequences

+ ## Citation

+ If you use BEAST in your research, please cite:

  **BibTeX:**

+ ```bibtex
+ @article{zhou2025beast,
+   title={BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
+   author={Zhou, Hongyi and Liao, Weiran and Huang, Xi and Tang, Yucheng and Otto, Fabian and Jia, Xiaogang and Jiang, Xinkai and Hilber, Simon and Li, Ge and Wang, Qian and others},
+   journal={arXiv preprint arXiv:2506.06072},
+   year={2025}
+ }
+ ```