---
license: apache-2.0
pipeline_tag: robotics
library_name: transformers
---

# Mixture of Horizons in Action Chunking

This repository hosts the official models and code for the paper:
[**Mixture of Horizons in Action Chunking**](https://huggingface.co/papers/2511.19433)

Project Page: https://timsty1.github.io/moh/
Code Repository: https://github.com/Timsty1/MixtureOfHorizons/tree/main

## Introduction
Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the **action chunk length** used during training, termed **horizon**. This paper proposes a **mixture of horizons (MoH)** strategy to mitigate the inherent trade-off between long-term foresight and short-term precision observed with fixed horizons. MoH rearranges action chunks into segments with different horizons, processes them in parallel with a shared action transformer, and fuses outputs. This approach allows MoH to exploit both long-term foresight and short-term precision jointly within a single model, improving performance and generalizability with minimal overhead. MoH also enables dynamic inference with adaptive horizons, achieving higher throughput while preserving superior performance.

<div align="center">
  <table border="0" cellspacing="0" cellpadding="0">
    <tr>
      <td align="center" width="50%">
        <img src="https://huggingface.co/Timsty/mixture_of_horizons/resolve/main/figure/study_of_horizons_pi0.png" alt="Trade-off Effect" width="100%">
      </td>
      <td align="center" width="50%">
        <img src="https://huggingface.co/Timsty/mixture_of_horizons/resolve/main/figure/intro_motivation_v2.png" alt="Mixture of Horizons" width="100%">
      </td>
    </tr>
    <tr>
      <td align="center" valign="top">
        Figure 1: Trade-off between long-term foresight and short-term precision induced by a single horizon
      </td>
      <td align="center" valign="top">
        Figure 2: Overview of the proposed mixture-of-horizons strategy
      </td>
    </tr>
  </table>
</div>
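To make the fusion idea concrete, here is a minimal, illustrative sketch of how predictions made at several horizons could be combined per timestep. The averaging rule and the `fuse_horizons` helper are assumptions for illustration, not the paper's exact fusion method:

```python
import numpy as np

def fuse_horizons(chunks: dict[int, np.ndarray]) -> np.ndarray:
    """Fuse action predictions made at several horizons (illustrative only).

    `chunks` maps a horizon length h to an array of shape (h, action_dim):
    the first h actions predicted with that horizon. Each timestep's final
    action averages every horizon that covers it, so short horizons sharpen
    near-term steps while long horizons still contribute foresight.
    """
    max_h = max(chunks)
    action_dim = next(iter(chunks.values())).shape[1]
    total = np.zeros((max_h, action_dim))
    count = np.zeros((max_h, 1))
    for h, actions in chunks.items():
        total[:h] += actions       # accumulate predictions covering steps 0..h-1
        count[:h] += 1             # track how many horizons cover each step
    return total / count

# Toy example: 1-D actions predicted with horizons 2 and 4.
fused = fuse_horizons({
    2: np.array([[1.0], [1.0]]),
    4: np.array([[3.0], [3.0], [3.0], [3.0]]),
})
print(fused.ravel())  # → [2. 2. 3. 3.]: early steps average both horizons
```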

## Quick Start

### 1. Environment Setup

Clone the repository and set up the conda environment:

```bash
git clone git@github.com:Timsty1/MixtureOfHorizons.git
conda create -n moh -y python=3.10
conda activate moh
pip install uv
cd MixtureOfHorizons
uv pip install -r requirements.txt
pip install packages/libero
pip install packages/openpi-client
```

### 2. Modify Transformers Library

This implementation requires modifying the `transformers` library to support the PyTorch versions of the $\pi$-series models, which rely on *gemma*, *paligemma*, and *siglip*.

First, locate your conda environment path:
```bash
conda info --base
```
Then, copy the provided files to the transformers library directory (replace `YOUR_CONDA_DIR` with the path found above):
```bash
cp -r ./src/openpi/models_pytorch/transformers_replace/* YOUR_CONDA_DIR/envs/moh/lib/python3.10/site-packages/transformers/
```
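If you prefer not to hard-code the conda path, the `transformers` install directory can also be located programmatically. A small sketch; the `package_dir` helper is hypothetical, and the `transformers_replace` source path is the one from the command above:

```python
import importlib.util
import pathlib

def package_dir(name: str) -> pathlib.Path:
    """Return the directory where an importable package is installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate package {name!r}")
    return pathlib.Path(spec.origin).parent

# Mirror `cp -r ./src/openpi/models_pytorch/transformers_replace/* <transformers>/`
# (run from the MixtureOfHorizons repo root, inside the `moh` environment):
# import shutil
# shutil.copytree("./src/openpi/models_pytorch/transformers_replace",
#                 package_dir("transformers"), dirs_exist_ok=True)
```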

### 3. Inference with Code
You can use the provided `eagenerate` method for accelerated generation, just as you would use `generate` from Hugging Face. Here is an example.

```python
import torch
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template

# Replace with paths to your base model and EAGLE model checkpoints
# Example: base_model_path = "lmsys/vicuna-13b-v1.3", EAGLE_model_path = "Timsty/mixture_of_horizons"
base_model_path = "path/to/your/base_model"
EAGLE_model_path = "path/to/your/eagle_model"

model = EaModel.from_pretrained(
    base_model_path=base_model_path,
    ea_model_path=EAGLE_model_path,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    total_token=-1
)
model.eval()

# Build the prompt with the chat template matching the base model
your_message = "Hello"
conv = get_conversation_template("vicuna")  # use the correct template for your base model
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).cuda()
output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512)
output = model.tokenizer.decode(output_ids[0])
print(output)
```
**Note:** Vicuna, LLaMA2-Chat, and LLaMA3-Instruct are all chat models. Use the chat template that matches your base model; a mismatched template causes abnormal output and degrades EAGLE's performance.

## ❤️ Acknowledgment

We express our gratitude to [OpenPi](https://github.com/Physical-Intelligence/openpi/tree/main), [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO), and [RoboTwin](https://robotwin-platform.github.io/) for their open-source contributions.

## 📝 Citation
If you find the paper, models, or code helpful, please cite our paper. Thanks for your support!

```bibtex
@article{jing2025mixture_of_horizons,
  title={Mixture of Horizons in Action Chunking},
  author={Jing, Dong and Wang, Gang and Liu, Jiaqi and Tang, Weiliang and Sun, Zelong and Yao, Yunchao and Wei, Zhenyu and Liu, Yunhui and Lu, Zhiwu and Ding, Mingyu},
  journal={arXiv preprint arXiv:2511.19433},
  year={2025}
}
```