zhouhongyi/beast

zhouhongyi committed 19 days ago

Commit

790eb33

1 Parent(s): 8a0ae1e

update readme

Files changed (1)

README.md +114 -144

@@ -3,182 +3,152 @@ library_name: transformers
 tags: []
 ---
-# BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
-This is the official repo for BEAST tokenizer.
-BEAST is an action tokenizer that translate continous robot action sequences into discrete tokens leveraging B-Splines.
-<!-- Provide a quick summary of what the model is/does. -->
 ## Installation
-## Quick Start
-## Parameters
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 **BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 tags: []
 ---
+# BEAST: B-Spline Encoded Action Sequences Tokenizer
+BEAST is an action tokenizer that converts continuous robot action sequences into discrete tokens using B-splines. It enables efficient trajectory compression for imitation learning by representing smooth robot motions as compact token sequences.
 ## Installation
+Install the required dependencies:
+```bash
+pip install torch numpy matplotlib einops transformers
+```
+**Note:** CUDA is recommended for optimal performance, but CPU is also supported by setting `device="cpu"`.
+## Quick Start
+```python
+from transformers import AutoProcessor
+import torch
+# Initialize the BEAST processor with configuration parameters:
+# - num_dof: degrees of freedom (3 for 3D trajectories like x, y, z)
+# - num_basis: number of B-spline basis functions used for trajectory representation
+# - seq_len: length of the trajectory sequence (number of time steps)
+# - degree_p: degree of the B-spline polynomial (3 = cubic spline)
+# - device: computation device ('cpu' or 'cuda')
+beast = AutoProcessor.from_pretrained(
+    "zhouhongyi/beast",
+    trust_remote_code=True,
+    num_dof=3,
+    num_basis=20,
+    seq_len=50,
+    degree_p=3,
+    device="cpu",
+)
+# Create random trajectory data: 10 trajectories, each with 50 time steps, 3 dimensions
+trajectories = torch.randn(10, 50, 3)
+# Encode trajectories into discrete tokens
+# update_bounds=True allows the processor to adaptively update quantization bounds
+tokens = beast.encode_discrete(trajectories, update_bounds=True)
+print(f"Encoded tokens shape: {tokens.shape}")
+# Decode tokens back to continuous trajectories
+reconstructed_trajectories = beast.decode_discrete(tokens)
+print(f"Reconstructed trajectories shape: {reconstructed_trajectories.shape}")
+# Calculate mean squared error to measure reconstruction quality
+mse_loss = torch.mean((trajectories - reconstructed_trajectories) ** 2)
+print(f"MSE Loss: {mse_loss.item()}")
+# Visualize the reconstruction error for analysis
+beast.visualize_reconstruction_error_discrete(trajectories)
+```
+### Continuous Encoding
+For integration with continuous generative models:
+```python
+# `beast` is the processor created in Quick Start.
+# Encode to normalized continuous parameters in [-1, 1]
+params = beast.encode_continuous(trajectories, update_bounds=True)
+# Decode back to continuous trajectories
+reconstructed = beast.decode_continuous(params)
+```
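The `[-1, 1]` range above implies some rescaling of the raw B-spline weights before they are handed to a generative model. Here is a minimal sketch of one way to do that, assuming simple min-max bounds that widen as new batches arrive. The helper names `update_bounds`, `normalize`, and `denormalize` are hypothetical, and the processor's actual normalization may differ.

```python
def update_bounds(weights, bounds=None):
    """Widen (lo, hi) so it covers every value in `weights` (assumed behaviour)."""
    lo, hi = min(weights), max(weights)
    if bounds is not None:
        lo, hi = min(lo, bounds[0]), max(hi, bounds[1])
    return lo, hi


def normalize(weights, bounds):
    """Map values from [lo, hi] to [-1, 1]."""
    lo, hi = bounds
    scale = (hi - lo) or 1.0  # avoid division by zero for constant data
    return [2.0 * (w - lo) / scale - 1.0 for w in weights]


def denormalize(params, bounds):
    """Inverse of normalize: map [-1, 1] back to [lo, hi]."""
    lo, hi = bounds
    return [(p + 1.0) / 2.0 * (hi - lo) + lo for p in params]


raw_weights = [0.2, -1.5, 3.0, 0.7]      # made-up spline weights, for illustration
bounds = update_bounds(raw_weights)      # bounds taken from this batch
params = normalize(raw_weights, bounds)  # every value now lies in [-1, 1]
restored = denormalize(params, bounds)   # round trip back to the raw scale
```

Under this assumption, bounds computed on one batch would need to be reused (not recomputed) at decode time, which is presumably why `update_bounds` is exposed as a flag.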
+## Parameters
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `num_dof` | Total degrees of freedom (robot joints + gripper) | 7 |
+| `num_basis` | Number of B-spline basis functions. Higher values improve reconstruction fidelity but produce more tokens | 10 |
+| `seq_len` | Trajectory sequence length (number of timesteps) | 50 |
+| `vocab_size` | Discrete vocabulary size (256 = 8-bit tokens) | 256 |
+| `degree_p` | B-spline polynomial degree. Higher degrees produce smoother curves (3=cubic, 4=quartic) | 4 |
+| `device` | Torch device (`"cuda"` or `"cpu"`) | `"cuda"` |
+| `gripper_zero_order` | Use piecewise-constant (degree 0) splines for gripper DOFs. Useful for binary gripper states | `False` |
+| `gripper_dof` | Number of gripper DOFs. Only used when `gripper_zero_order=True` | 1 |
+| `init_cond_order` | Initial boundary condition order: 0=none, 1=position only, 2=position+velocity | 0 |
+| `end_cond_order` | End boundary condition order (same options as `init_cond_order`) | 0 |
+| `enforce_init_pos` | Enforce initial position constraint during decoding | `False` |
+### Token Count
+The number of tokens per trajectory is `num_basis * num_dof`, independent of `seq_len`.
+For example, with the default settings (10 basis functions, 7 DOF), each trajectory is encoded as 70 tokens.
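The token-count rule can be checked with a few lines of arithmetic. This sketch only restates the formula; the function names are illustrative and not part of the BEAST API.

```python
def tokens_per_trajectory(num_basis, num_dof):
    """Token count per trajectory, as stated in the README: num_basis * num_dof."""
    return num_basis * num_dof


def values_per_trajectory(seq_len, num_dof):
    """Number of raw scalar values in an uncompressed trajectory."""
    return seq_len * num_dof


# Default settings: 10 basis functions, 7 DOF, 50 time steps
default_tokens = tokens_per_trajectory(num_basis=10, num_dof=7)
default_values = values_per_trajectory(seq_len=50, num_dof=7)
print(default_tokens)                   # 70
print(default_values / default_tokens)  # 5.0

# Quick Start settings: 20 basis functions, 3 DOF
quick_start_tokens = tokens_per_trajectory(num_basis=20, num_dof=3)
print(quick_start_tokens)               # 60
```

With the defaults, 350 raw values become 70 tokens, a fivefold reduction in sequence length.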
+## API Reference
+### Encoding Methods
+**`encode_discrete(trajs, update_bounds=True)`**
+- Input: Trajectories tensor `[batch, seq_len, num_dof]`
+- Output: Discrete tokens `[batch, num_basis * num_dof]` in range `[0, vocab_size-1]`
+- `update_bounds`: Whether to update internal weight bounds from this batch
+**`encode_continuous(trajs, update_bounds=True)`**
+- Input: Trajectories tensor `[batch, seq_len, num_dof]`
+- Output: Normalized parameters `[batch, num_basis * num_dof]` in range `[-1, 1]`
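The two encoders differ only in their output range: `[-1, 1]` for continuous parameters and `[0, vocab_size-1]` for discrete tokens. One plausible link between them is uniform quantization, sketched below. This is an assumption for illustration, not a description of the BEAST implementation, and `quantize` and `dequantize` are hypothetical names.

```python
def quantize(params, vocab_size=256):
    """Map values in [-1, 1] to integer tokens in [0, vocab_size - 1]."""
    tokens = []
    for p in params:
        p = max(-1.0, min(1.0, p))  # clamp out-of-range values
        tokens.append(round((p + 1.0) / 2.0 * (vocab_size - 1)))
    return tokens


def dequantize(tokens, vocab_size=256):
    """Map integer tokens back to values in [-1, 1]."""
    return [2.0 * t / (vocab_size - 1) - 1.0 for t in tokens]


params = [-1.0, -0.25, 0.0, 0.5, 1.0]
tokens = quantize(params)
restored = dequantize(tokens)

# With uniform bins, the round-trip error is at most half a bin width,
# which is 1 / (vocab_size - 1) on the [-1, 1] scale.
max_error = max(abs(a - b) for a, b in zip(params, restored))
```

Under this scheme a larger `vocab_size` lowers the quantization error without changing the number of tokens per trajectory.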
+### Decoding Methods
+**`decode_discrete(tokens, times=None, init_pos=None)`**
+- Input: Discrete tokens `[batch, num_basis * num_dof]`
+- Output: Reconstructed trajectories `[batch, seq_len, num_dof]`
+- `times`: Custom time points (optional, defaults to `seq_len` uniform points)
+- `init_pos`: Initial position constraint (optional)
+**`decode_continuous(params, times=None, init_pos=None)`**
+- Input: Normalized parameters `[batch, num_basis * num_dof]`
+- Output: Reconstructed trajectories `[batch, seq_len, num_dof]`
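Decoding turns `num_basis` weights per DOF back into `seq_len` samples by evaluating a B-spline. The sketch below shows that idea for a single DOF using the Cox-de Boor recursion in plain Python. It assumes a clamped uniform knot vector on `[0, 1]` and uniform time points, and requires `num_basis >= degree + 1`. BEAST's actual knot placement, boundary handling, and tensor layout may differ, and all function names here are hypothetical.

```python
def clamped_uniform_knots(num_basis, degree):
    """Knot vector on [0, 1] with degree + 1 repeated knots at each end."""
    n_inner = num_basis - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def basis_values(t, knots, degree):
    """Evaluate all B-spline basis functions at time t (Cox-de Boor recursion)."""
    n_spans = len(knots) - 1
    vals = []
    for i in range(n_spans):
        lo, hi = knots[i], knots[i + 1]
        # Degree-0 basis: indicator of the knot span; the last non-empty
        # span is closed on the right so that t == 1 is covered.
        inside = lo <= t < hi or (t == knots[-1] and lo < hi == knots[-1])
        vals.append(1.0 if inside else 0.0)
    for p in range(1, degree + 1):
        nxt = []
        for i in range(n_spans - p):
            left_den = knots[i + p] - knots[i]
            right_den = knots[i + p + 1] - knots[i + 1]
            left = (t - knots[i]) / left_den * vals[i] if left_den > 0 else 0.0
            right = (knots[i + p + 1] - t) / right_den * vals[i + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        vals = nxt
    return vals  # one value per basis function


def decode_one_dof(weights, seq_len, degree):
    """Reconstruct seq_len samples of one DOF from its B-spline weights."""
    knots = clamped_uniform_knots(len(weights), degree)
    times = [i / (seq_len - 1) for i in range(seq_len)]
    return [
        sum(w * b for w, b in zip(weights, basis_values(t, knots, degree)))
        for t in times
    ]


# Made-up weights for one DOF: 10 basis functions, degree 4, 50 time steps
weights = [0.0, 0.5, 1.0, 0.8, 0.2, -0.3, -0.6, -0.2, 0.4, 1.0]
trajectory = decode_one_dof(weights, seq_len=50, degree=4)
```

With clamped knots the curve starts at the first weight and ends at the last one, which is one way that position boundary conditions such as `init_pos` can be tied to specific weights.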
+### Utility Methods
+**`compute_reconstruction_error(raw_traj)`**
+- Compute MSE between original and reconstructed trajectory
+**`visualize_reconstruction_error_discrete(raw_traj)`** / **`visualize_reconstruction_error_continuous(raw_traj)`**
+- Plot original vs reconstructed trajectories for visual comparison
+## Uses
+### Intended Use Cases
+- **Robot Imitation Learning**: Compress continuous demonstration trajectories into discrete tokens for policy learning with language models
+- **Trajectory Compression**: Reduce memory footprint of robot demonstration datasets while preserving motion quality
+- **Action Tokenization**: Enable transformer-based models to process robot actions as discrete token sequences
+## Citation
+If you use BEAST in your research, please cite:
 **BibTeX:**
+```bibtex
+@article{zhou2025beast,
+  title={BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning},
+  author={Zhou, Hongyi and Liao, Weiran and Huang, Xi and Tang, Yucheng and Otto, Fabian and Jia, Xiaogang and Jiang, Xinkai and Hilber, Simon and Li, Ge and Wang, Qian and others},
+  journal={arXiv preprint arXiv:2506.06072},
+  year={2025}
+}
+```