---
license: apache-2.0
pipeline_tag: robotics
library_name: transformers
---

# Mixture of Horizons in Action Chunking

This repository hosts the official models and code for the paper
[Mixture of Horizons in Action Chunking](https://huggingface.co/papers/2511.19433).

Project Page: https://timsty1.github.io/moh/

Code Repository: https://github.com/Timsty1/MixtureOfHorizons/tree/main

## Introduction

Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the action chunk length used during training, termed the *horizon*. This paper proposes a mixture of horizons (MoH) strategy to mitigate the inherent trade-off between long-term foresight and short-term precision that arises with a fixed horizon. MoH rearranges action chunks into segments with different horizons, processes them in parallel with a shared action transformer, and fuses their outputs. This lets a single model exploit long-term foresight and short-term precision jointly, improving performance and generalizability with minimal overhead. MoH also enables dynamic inference with adaptive horizons, achieving higher throughput while preserving superior performance.
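
To make the segment-and-fuse idea concrete, here is a toy, framework-free Python sketch. It is not the authors' implementation and makes several assumptions: each horizon-h segment is taken to be the first h steps of the chunk, `shared_head` is a hypothetical stand-in for the shared action transformer, and step t is fused with a softmax over hypothetical per-horizon gate logits, restricted to the horizons long enough to cover that step.

```python
import math


def segment_by_horizons(chunk, horizons):
    """Assumption: the horizon-h segment is the first h steps of the chunk."""
    for h in horizons:
        if not 1 <= h <= len(chunk):
            raise ValueError(f"horizon {h} must lie in [1, {len(chunk)}]")
    return {h: chunk[:h] for h in horizons}


def shared_head(segment):
    """Hypothetical stand-in for the shared action transformer.

    The same function processes every segment. Each output step adds the
    segment mean (a crude proxy for attention over the whole segment), so
    segments of different lengths yield different predictions.
    """
    dim = len(segment[0])
    means = [sum(step[d] for step in segment) / len(segment) for d in range(dim)]
    return [[a + m for a, m in zip(step, means)] for step in segment]


def fuse_horizons(preds, horizons, gate_logits):
    """Fuse per-horizon predictions into one chunk of length max(horizons).

    preds[h] holds h action vectors; gate_logits[t][i] scores horizons[i] at
    step t. Only horizons with h > t predicted step t, so the softmax weights
    are computed over that subset (our assumed fusion rule).
    """
    fused = []
    for t in range(max(horizons)):
        valid = [i for i, h in enumerate(horizons) if h > t]
        top = max(gate_logits[t][i] for i in valid)  # subtract max for stability
        weights = {i: math.exp(gate_logits[t][i] - top) for i in valid}
        total = sum(weights.values())
        step = [0.0] * len(preds[horizons[valid[0]]][t])
        for i in valid:
            for d, value in enumerate(preds[horizons[i]][t]):
                step[d] += weights[i] / total * value
        fused.append(step)
    return fused


# Usage: a 4-step, 1-D action chunk mixed over horizons 2 and 4.
chunk = [[0.5], [1.0], [1.5], [2.0]]
horizons = [2, 4]
segments = segment_by_horizons(chunk, horizons)
# Conceptually the segments form one parallel batch; this loop is sequential.
preds = {h: shared_head(seg) for h, seg in segments.items()}
gate_logits = [[0.0, 0.0] for _ in range(4)]  # uniform gate
print(fuse_horizons(preds, horizons, gate_logits))  # -> [[1.5], [2.0], [2.75], [3.25]]
```

Steps 0 and 1 blend both horizons, while steps 2 and 3 come only from the length-4 segment, which is the sense in which short horizons refine the near term and long horizons supply foresight in this toy setup.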

<div align="center">
<table border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="center" width="50%">
<img src="https://huggingface.co/Timsty/mixture_of_horizons/resolve/main/figure/study_of_horizons_pi0.png" alt="Trade-off Effect" width="100%">
</td>
<td align="center" width="50%">
<img src="https://huggingface.co/Timsty/mixture_of_horizons/resolve/main/figure/intro_motivation_v2.png" alt="Mixture of Horizons" width="100%">
</td>
</tr>
<tr>
<td align="center" valign="top">
Figure 1: Trade-off between long-term foresight and short-term precision induced by a single horizon
</td>
<td align="center" valign="top">
Figure 2: Overview of the proposed mixture-of-horizons strategy
</td>
</tr>
</table>
</div>

## Quick Start

### 1. Environment Setup

Clone the repository and set up the conda environment:

```bash
git clone git@github.com:Timsty1/MixtureOfHorizons.git
conda create -n moh -y python=3.10
conda activate moh
pip install uv
cd MixtureOfHorizons
uv pip install -r requirements.txt
pip install packages/libero
pip install packages/openpi-client
```
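
After installation, a quick way to confirm that the environment resolves the expected packages is a small import check. This is a sketch, not part of the repository: the module names `libero` and `openpi_client` are our assumption about what `packages/libero` and `packages/openpi-client` install, so adjust `EXPECTED_MODULES` if your checkout differs.

```python
import importlib.util
import sys

# Assumed import names; adjust to match your checkout.
EXPECTED_MODULES = ["torch", "transformers", "libero", "openpi_client"]


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The conda environment above is created with Python 3.10.
    if sys.version_info[:2] != (3, 10):
        print(f"warning: expected Python 3.10, found {sys.version.split()[0]}")
    missing = missing_modules(EXPECTED_MODULES)
    print("missing modules:", ", ".join(missing) if missing else "none")
```

`importlib.util.find_spec` locates a top-level module without importing it, so the check is fast and avoids side effects from heavy imports such as `torch`.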

### 2. Modify Transformers Library

This implementation requires modifying the `transformers` library to support PyTorch versions of the $\pi$-series models, which rely on the Gemma, PaliGemma, and SigLIP implementations.

First, locate your conda base directory:
```bash
conda info --base
```
Then, from the repository root, copy the provided files into the installed `transformers` package (replace `YOUR_CONDA_DIR` with the path printed above):
```bash
cp -r ./src/openpi/models_pytorch/transformers_replace/* YOUR_CONDA_DIR/envs/moh/lib/python3.10/site-packages/transformers/
```
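
If you prefer not to hard-code the `site-packages` path, the same copy can be sketched in Python by asking the active interpreter where `transformers` is installed. This is an illustrative alternative, not a script shipped with the repository; `package_dir`, `copy_overrides`, and `install_overrides` are hypothetical helper names, and the code assumes it runs from the repository root inside the activated `moh` environment.

```python
import importlib.util
import shutil
from pathlib import Path

# Source path from the `cp` command above, relative to the repository root.
REPLACE_DIR = Path("src/openpi/models_pytorch/transformers_replace")


def package_dir(name):
    """Return the installation directory of package `name` without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{name} is not installed as a package")
    return Path(list(spec.submodule_search_locations)[0])


def copy_overrides(src_dir, dst_dir):
    """Merge src_dir into dst_dir, overwriting files that already exist.

    This mirrors `cp -r src/* dst/`, except that hidden files are copied too.
    """
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)


def install_overrides(src_dir=REPLACE_DIR):
    """Copy the replacement files into the active environment's transformers."""
    target = package_dir("transformers")
    copy_overrides(src_dir, target)
    return target
```

Calling `install_overrides()` performs the copy and returns the directory it wrote to. Resolving the path through the running interpreter means the files land in whichever environment is active, rather than in a path that assumes a particular conda layout or Python version. `dirs_exist_ok=True` requires Python 3.8 or later.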

### 3. Inference with Code
You can use the provided `eagenerate` method for accelerated generation, just as you would use `generate` in Hugging Face Transformers. For example:

```python
import torch
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template

# Replace with paths to your base model and EAGLE model checkpoints
# Example: base_model_path = "lmsys/vicuna-13b-v1.3", EAGLE_model_path = "Timsty/mixture_of_horizons"
base_model_path = "path/to/your/base_model"
EAGLE_model_path = "path/to/your/eagle_model"

model = EaModel.from_pretrained(
    base_model_path=base_model_path,
    ea_model_path=EAGLE_model_path,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    total_token=-1,
)
model.eval()

# Build the prompt with the chat template that matches the base model
your_message = "Hello"
conv = get_conversation_template("vicuna")  # Use the correct template for your base model
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# Tokenize, generate with eagenerate, and decode
input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).cuda()
output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512)
output = model.tokenizer.decode(output_ids[0])
print(output)
```
Note: Vicuna, LLaMA2-Chat, and LLaMA3-Instruct are all chat models. You must use the correct chat template; otherwise, the model may produce abnormal output, which also degrades EAGLE's performance.

## ❤️ Acknowledgment

We express our gratitude to [OpenPi](https://github.com/Physical-Intelligence/openpi/tree/main), [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO), and [RoboTwin](https://robotwin-platform.github.io/) for their open-source contributions.

## 📝 Citation
If you find our paper, models, or code helpful, please cite our paper. Thank you for your support!

```bibtex
@article{jing2025mixture_of_horizons,
  title={Mixture of Horizons in Action Chunking},
  author={Jing, Dong and Wang, Gang and Liu, Jiaqi and Tang, Weiliang and Sun, Zelong and Yao, Yunchao and Wei, Zhenyu and Liu, Yunhui and Lu, Zhiwu and Ding, Mingyu},
  journal={arXiv preprint arXiv:2511.19433},
  year={2025}
}
```