---
license: apache-2.0
pipeline_tag: robotics
library_name: transformers
---

# Mixture of Horizons in Action Chunking

This repository hosts the official models and code for the paper:
[**Mixture of Horizons in Action Chunking**](https://huggingface.co/papers/2511.19433)

Project Page: https://timsty1.github.io/moh/
Code Repository: https://github.com/Timsty1/MixtureOfHorizons/tree/main

## Introduction
Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the **action chunk length** used during training, termed the **horizon**. The paper proposes a **mixture of horizons (MoH)** strategy to mitigate the trade-off between long-term foresight and short-term precision that fixed horizons exhibit. MoH rearranges each action chunk into segments with different horizons, processes them in parallel with a shared action transformer, and fuses the outputs. A single model can thereby exploit long-term foresight and short-term precision jointly, improving performance and generalizability with minimal overhead. MoH also enables dynamic inference with adaptive horizons, which raises throughput while preserving its performance advantage.

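To make the fusion step described above more concrete, here is a minimal sketch of how per-horizon predictions could be combined. It is not the repository's implementation: the function name `fuse_horizons`, the scalar actions, and the fixed gate logits are all assumptions made for illustration. Each horizon contributes predictions only for the steps it covers, and at every step the covering horizons are mixed with softmax weights.

```python
import math


def fuse_horizons(preds, gate_logits):
    """Fuse per-horizon action predictions step by step (illustrative only).

    preds:       dict mapping horizon h -> list of h predicted actions (scalars here).
    gate_logits: dict mapping horizon h -> hypothetical gate logit for that horizon.
    Returns a list whose length equals the largest horizon.
    """
    max_h = max(preds)
    fused = []
    for t in range(max_h):
        # Only horizons longer than t have a prediction for step t.
        active = [h for h in preds if h > t]
        # Numerically stable softmax over the active horizons.
        m = max(gate_logits[h] for h in active)
        weights = {h: math.exp(gate_logits[h] - m) for h in active}
        z = sum(weights.values())
        fused.append(sum(weights[h] / z * preds[h][t] for h in active))
    return fused


# Toy example: a short horizon (2 steps) and a long horizon (4 steps).
preds = {2: [1.0, 1.0], 4: [3.0, 3.0, 3.0, 3.0]}
gates = {2: 0.0, 4: 0.0}
print(fuse_horizons(preds, gates))  # [2.0, 2.0, 3.0, 3.0]
```

With equal gate logits, the first two steps are the average of both horizons, while the last two steps come from the long horizon alone because the short one does not reach them.
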
<div align="center">
  <table border="0" cellspacing="0" cellpadding="0">
    <tr>
      <td align="center" width="50%">
        <img src="https://huggingface.co/Timsty/mixture_of_horizons/resolve/main/figure/study_of_horizons_pi0.png" alt="Trade-off Effect" width="100%">
      </td>
      <td align="center" width="50%">
        <img src="https://huggingface.co/Timsty/mixture_of_horizons/resolve/main/figure/intro_motivation_v2.png" alt="Mixture of Horizons" width="100%">
      </td>
    </tr>
    <tr>
      <td align="center" valign="top">
        Figure 1: Trade-off between long-term foresight and short-term precision induced by a single horizon
      </td>
      <td align="center" valign="top">
        Figure 2: Overview of the proposed mixture-of-horizons strategy
      </td>
    </tr>
  </table>
</div>

## Quick Start

### 1. Environment Setup

Clone the repository and set up the conda environment:

```bash
# Clone the code (SSH URL; requires a GitHub SSH key)
git clone git@github.com:Timsty1/MixtureOfHorizons.git

# Create and activate a Python 3.10 environment
conda create -n moh -y python=3.10
conda activate moh

# Install dependencies and the bundled local packages
pip install uv
cd MixtureOfHorizons
uv pip install -r requirements.txt
pip install packages/libero
pip install packages/openpi-client
```

### 2. Modify Transformers Library

This implementation requires patching the installed `transformers` library to support the PyTorch versions of the $\pi$-series models, which rely on *gemma*, *paligemma*, and *siglip*.

First, locate your conda base directory:

```bash
conda info --base
```

Then copy the provided files into the `transformers` package directory (replace `YOUR_CONDA_DIR` with the path found above):

```bash
cp -r ./src/openpi/models_pytorch/transformers_replace/* YOUR_CONDA_DIR/envs/moh/lib/python3.10/site-packages/transformers/
```

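If you are unsure where `transformers` is installed, the destination directory for the `cp` command can also be looked up from Python. The snippet below is a small sketch that uses only the standard library; the helper name `package_dir` is an assumption for illustration and is not part of the repository. Run it inside the activated `moh` environment.

```python
import importlib.util
from pathlib import Path


def package_dir(name: str) -> Path:
    """Return the directory from which the package `name` would be imported."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{name} is not installed in this environment")
    # spec.origin points at the package's __init__.py; its parent is the package directory.
    return Path(spec.origin).parent


# Example (inside the moh environment):
#   print(package_dir("transformers"))
```

The printed path can be used in place of `YOUR_CONDA_DIR/envs/moh/lib/python3.10/site-packages/transformers/` in the copy command.
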
### 3. Inference with Code
You can use the provided `eagenerate` method to accelerate generation, just as you would use `generate` from Hugging Face. Here is an example:

```python
import torch
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template

# Replace with paths to your base model and EAGLE model checkpoints
# Example: base_model_path = "lmsys/vicuna-13b-v1.3", EAGLE_model_path = "Timsty/mixture_of_horizons"
base_model_path = "path/to/your/base_model"
EAGLE_model_path = "path/to/your/eagle_model"

model = EaModel.from_pretrained(
    base_model_path=base_model_path,
    ea_model_path=EAGLE_model_path,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    total_token=-1,
)
model.eval()

# Build the prompt with the chat template that matches the base model
your_message = "Hello"
conv = get_conversation_template("vicuna")  # Use the correct template for your base model
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# Tokenize, generate, and decode
input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).cuda()
output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512)
output = model.tokenizer.decode(output_ids[0])
print(output)
```

**Note:** Vicuna, LLaMA2-Chat, and LLaMA3-Instruct are all chat models. Use the chat template that matches your base model; otherwise the model produces abnormal output and the performance of EAGLE degrades.

## ❤️ Acknowledgment

We express our gratitude to [OpenPi](https://github.com/Physical-Intelligence/openpi/tree/main), [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO), and [RoboTwin](https://robotwin-platform.github.io/) for their open-source contributions.

## 📝 Citation
If you find this paper, the models, or the code helpful, please cite our paper. Thank you for your support!

```bibtex
@article{jing2025mixture_of_horizons,
  title={Mixture of Horizons in Action Chunking},
  author={Jing, Dong and Wang, Gang and Liu, Jiaqi and Tang, Weiliang and Sun, Zelong and Yao, Yunchao and Wei, Zhenyu and Liu, Yunhui and Lu, Zhiwu and Ding, Mingyu},
  journal={arXiv preprint arXiv:2511.19433},
  year={2025}
}
```