---
license: apache-2.0
pipeline_tag: robotics
library_name: transformers
---

# Mixture of Horizons in Action Chunking

This repository hosts the official models and code for the paper:
[**Mixture of Horizons in Action Chunking**](https://huggingface.co/papers/2511.19433)

Project Page: https://timsty1.github.io/moh/
Code Repository: https://github.com/Timsty1/MixtureOfHorizons/tree/main

## Introduction

Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the **action chunk length** used during training, termed the **horizon**. This paper proposes a **mixture of horizons (MoH)** strategy to mitigate the inherent trade-off between long-term foresight and short-term precision observed with a fixed horizon. MoH rearranges action chunks into segments with different horizons, processes them in parallel with a shared action transformer, and fuses their outputs. A single model can therefore exploit long-term foresight and short-term precision jointly, improving performance and generalizability with minimal overhead. MoH also enables dynamic inference with adaptive horizons, achieving higher throughput while preserving superior performance.
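To make the fusion step concrete, here is a minimal sketch, not the authors' implementation. It assumes that a branch with horizon `h` predicts the first `h` steps of the chunk, that actions are scalars for readability, and that outputs are fused by a (weighted) average over the horizons covering each step. The function name `fuse_horizons` and the averaging rule are illustrative assumptions; the shared action transformer that would produce the per-horizon predictions is not modelled.

```python
from typing import Dict, List, Optional


def fuse_horizons(
    preds: Dict[int, List[float]],
    weights: Optional[Dict[int, float]] = None,
) -> List[float]:
    """Fuse per-horizon action predictions into one chunk (illustrative only).

    preds maps a horizon h to its h predicted actions for steps 0..h-1.
    weights optionally maps a horizon to a positive fusion weight
    (assumption: a uniform average is used when no weights are given).
    """
    for h, actions in preds.items():
        if len(actions) != h:
            raise ValueError(f"horizon {h} must provide exactly {h} actions")

    fused = []
    for t in range(max(preds)):
        # Only horizons longer than t have a prediction for step t.
        covering = [h for h in preds if h > t]
        w = [1.0 if weights is None else weights[h] for h in covering]
        total = sum(w)
        fused.append(sum(wi * preds[h][t] for wi, h in zip(w, covering)) / total)
    return fused


if __name__ == "__main__":
    short = [1.0, 1.0]            # horizon 2: precise near-term actions
    long = [3.0, 3.0, 5.0, 5.0]   # horizon 4: covers the whole chunk
    print(fuse_horizons({2: short, 4: long}))  # [2.0, 2.0, 5.0, 5.0]
```

In this toy example the first two steps blend both horizons, while the last two steps come only from the longer horizon, which is the intuition behind combining short-term precision with long-term foresight.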

<div align="center">
<table border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="center" width="50%">
<img src="https://huggingface.co/Timsty/mixture_of_horizons/resolve/main/figure/study_of_horizons_pi0.png" alt="Trade-off Effect" width="100%">
</td>
<td align="center" width="50%">
<img src="https://huggingface.co/Timsty/mixture_of_horizons/resolve/main/figure/intro_motivation_v2.png" alt="Mixture of Horizons" width="100%">
</td>
</tr>
<tr>
<td align="center" valign="top">
Figure 1: Trade-off between long-term foresight and short-term precision induced by single horizon
</td>
<td align="center" valign="top">
Figure 2: Overview of the proposed mixture-of-horizons strategy
</td>
</tr>
</table>
</div>

## Quick Start

### 1. Environment Setup

Clone the repository and set up the conda environment:

```bash
# Clone the repository
git clone git@github.com:Timsty1/MixtureOfHorizons.git

# Create and activate the conda environment
conda create -n moh -y python=3.10
conda activate moh

# Install dependencies with uv, then the bundled packages
pip install uv
cd MixtureOfHorizons
uv pip install -r requirements.txt
pip install packages/libero
pip install packages/openpi-client
```
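After installation, a quick check that the key packages resolve can save debugging time later. The sketch below uses only the standard library. The module names in `EXPECTED_MODULES` are assumptions about how the installed packages are imported (in particular `libero` and `openpi_client`), so adjust them if your environment differs.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names for the packages installed above; edit as needed.
EXPECTED_MODULES = ["torch", "transformers", "libero", "openpi_client"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

Run it inside the activated `moh` environment so that it inspects the right interpreter.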

### 2. Modify Transformers Library

This implementation requires modifying the `transformers` library to support the PyTorch implementations of the $\pi$-series models, which rely on *gemma*, *paligemma*, and *siglip*.

First, find your conda base directory:

```bash
conda info --base
```

Then copy the provided files into the `transformers` package directory, replacing `YOUR_CONDA_DIR` with the path printed above:

```bash
cp -r ./src/openpi/models_pytorch/transformers_replace/* YOUR_CONDA_DIR/envs/moh/lib/python3.10/site-packages/transformers/
```
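If you prefer not to hard-code the environment path, the target directory can be resolved from the active interpreter instead. The following is a hedged alternative sketch, not part of the official instructions: the helper names `package_dir` and `overlay` are hypothetical, and it assumes you run it from the repository root inside the `moh` environment.

```python
import importlib.util
import shutil
from pathlib import Path


def package_dir(name: str) -> Path:
    """Return the directory of an installed package in the current environment."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{name} is not installed in this environment")
    return Path(spec.origin).parent


def overlay(src: Path, dst: Path) -> None:
    """Copy the contents of src into dst, overwriting existing files (roughly `cp -r src/* dst/`)."""
    shutil.copytree(src, dst, dirs_exist_ok=True)


if __name__ == "__main__":
    src = Path("src/openpi/models_pytorch/transformers_replace")
    if src.is_dir():
        dst = package_dir("transformers")
        overlay(src, dst)
        print(f"Copied {src} into {dst}")
    else:
        print("Run this script from the MixtureOfHorizons repository root.")
```

Because this overwrites files inside the installed `transformers` package, it is best applied to a dedicated environment such as `moh` rather than a shared one.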

### 3. Inference with Code

You can use the provided `eagenerate` method to speed up generation, just as you would use `generate` from Hugging Face. Here is an example:

```python
import torch
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template

# Replace with paths to your base model and EAGLE model checkpoints
# Example: base_model_path = "lmsys/vicuna-13b-v1.3", EAGLE_model_path = "Timsty/mixture_of_horizons"
base_model_path = "path/to/your/base_model"
EAGLE_model_path = "path/to/your/eagle_model"

model = EaModel.from_pretrained(
    base_model_path=base_model_path,
    ea_model_path=EAGLE_model_path,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    total_token=-1,
)
model.eval()

# Build the prompt with the chat template that matches the base model
your_message = "Hello"
conv = get_conversation_template("vicuna")  # Use the correct template for your base model
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# Tokenize the prompt, generate, and decode the result
input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).cuda()
output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512)
output = model.tokenizer.decode(output_ids[0])
print(output)
```

**Note:** Vicuna, LLaMA2-Chat, and LLaMA3-Instruct are all chat models. Use the chat template that matches your base model; an incorrect template leads to abnormal model output and degrades the performance of EAGLE.

## ❤️ Acknowledgment

We express our gratitude to [OpenPi](https://github.com/Physical-Intelligence/openpi/tree/main), [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO), and [RoboTwin](https://robotwin-platform.github.io/) for their open-source contributions.

## 📝 Citation

If you find the paper, models, or code helpful, please cite our work. Thank you for your support!

```bibtex
@article{jing2025mixture_of_horizons,
  title={Mixture of Horizons in Action Chunking},
  author={Jing, Dong and Wang, Gang and Liu, Jiaqi and Tang, Weiliang and Sun, Zelong and Yao, Yunchao and Wei, Zhenyu and Liu, Yunhui and Lu, Zhiwu and Ding, Mingyu},
  journal={arXiv preprint arXiv:2511.19433},
  year={2025}
}
```