Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding

GTO (Group Tree Optimization) is a framework designed to address draft policy misalignment in speculative decoding. This repository contains the draft model weights optimized for Vicuna-13b.

Overview

Speculative decoding accelerates large language model (LLM) inference by letting a lightweight draft model propose multiple tokens that the target model verifies in parallel. GTO aligns training with the decoding-time tree policy through two components:

  1. Draft Tree Reward: A sampling-free objective equal to the expected acceptance length of the draft tree under the target model.
  2. Group-based Draft Policy Training: A stable optimization scheme that contrasts trees from the current and a frozen reference draft model.
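The draft tree reward can be illustrated with a minimal sketch. Assuming each drafted token carries an acceptance probability under the target model, conditioned on its ancestors being accepted, the expected acceptance length of the tree is the sum over all nodes of the product of acceptance probabilities along the path from the root. The names `DraftNode` and `expected_acceptance_length` below are illustrative, not taken from the GTO codebase, and the per-node probabilities are placeholders for quantities the target model would supply.

```python
from dataclasses import dataclass, field

@dataclass
class DraftNode:
    # Probability that the target model accepts this drafted token,
    # given that every token on the path to the root was accepted.
    accept_prob: float
    children: list = field(default_factory=list)

def expected_acceptance_length(root: DraftNode) -> float:
    """Expected number of accepted draft tokens in the tree.

    Each node contributes the probability that its entire root-to-node
    path is accepted, i.e. the product of accept_prob values along it.
    """
    total = 0.0
    stack = [(root, 1.0)]  # (node, probability its parent path is accepted)
    while stack:
        node, path_prob = stack.pop()
        p = path_prob * node.accept_prob
        total += p
        for child in node.children:
            stack.append((child, p))
    return total

# A chain of two tokens, each accepted with probability 0.8:
# E[length] = 0.8 + 0.8 * 0.8 = 1.44
chain = DraftNode(0.8, [DraftNode(0.8)])
print(expected_acceptance_length(chain))
```

Because this quantity is a deterministic function of the tree and the acceptance probabilities, it can serve as a sampling-free training objective, in contrast to reward estimates that require rolling out the verifier.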

GTO achieves significant speedups—up to 5.6x faster than vanilla decoding and a 7.7% improvement over prior state-of-the-art methods like EAGLE-3.

Inference

The official GTO implementation provides a web interface for inference. It automatically handles weight allocation across multiple GPUs.

To run the model, follow the setup instructions in the official repository and use the following command:

python -m application.webui --ea-model-path [path of GTO weight] \
        --base-model-path [path of the original model] \
        --model-type vicuna \
        --total-token [int]

The --total-token parameter sets the number of draft tokens in the tree. Tuning it to your hardware and the specific model can improve throughput.

Citation

If you find GTO useful in your research, please cite:

@article{hu2025bridging,
  title={Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding},
  author={Hu, Shijing and Li, Jingyang and Lu, Zhihui and Zhou, Pan},
  journal={arXiv preprint arXiv:2509.22134},
  year={2025}
}

Acknowledgements

The GTO implementation is based on the open-source repository of EAGLE and influenced by projects like HASS and GRIFFIN.
