GTO: Group Tree Optimization for Speculative Decoding
This repository contains the draft model weights for GTO (Group Tree Optimization), as introduced in the paper Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding.
GTO is a novel framework designed to address draft policy misalignment in speculative decoding. It aligns training with the decoding-time tree policy through two main components:
- Draft Tree Reward: A sampling-free objective equal to the expected acceptance length of the draft tree under the target model.
- Group-based Draft Policy Training: A stable optimization scheme that contrasts trees from the current and a frozen reference draft model.
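To make the Draft Tree Reward concrete, here is a minimal sketch (not code from the GTO repository; `DraftNode`, `accept_prob`, and the best-child simplification are illustrative assumptions) of computing the expected acceptance length of a draft tree, given per-node probabilities that the target model accepts each draft token:

```python
# Illustrative sketch only: each node carries an assumed probability that
# the target model accepts its draft token. The expected acceptance length
# accumulates, along the accepted path, the product of acceptance
# probabilities from the root; at a branch we keep the best child as a
# simplification (the actual GTO objective is defined in the paper).
from dataclasses import dataclass, field
from typing import List


@dataclass
class DraftNode:
    accept_prob: float                 # P(target accepts this draft token)
    children: List["DraftNode"] = field(default_factory=list)


def expected_acceptance_length(root: DraftNode) -> float:
    """Expected number of accepted draft tokens along the best path."""
    def walk(node: DraftNode, path_prob: float) -> float:
        p = path_prob * node.accept_prob      # prob. this node is reached and accepted
        best_child = max((walk(c, p) for c in node.children), default=0.0)
        return p + best_child
    return walk(root, 1.0)
```

For a simple chain with acceptance probabilities 0.9 and 0.8, the expected length is 0.9 + 0.9 x 0.8 = 1.62, matching the intuition that deeper draft tokens contribute only if their whole prefix is accepted.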
Resources
- Paper: Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
- GitHub Repository: https://github.com/hsj576/GTO
Performance
GTO achieves significant speedups across dialogue (MT-Bench), code (HumanEval), and math (GSM8K) tasks:
- 5.6x faster than vanilla autoregressive decoding.
- 7.7% additional speedup over prior state-of-the-art methods like EAGLE-3.
Inference
To use these weights, run the inference code provided in the official repository; the implementation supports multi-GPU weight allocation.
You can launch the suggested web interface with:
```shell
python -m application.webui --ea-model-path [path of GTO weight] \
    --base-model-path [path of the original model] \
    --model-type [vicuna|llama3|qwen] \
    --total-token [int]
```
The `--total-token` parameter sets the number of draft tokens. Tuning this value for the specific device and model can yield better speedups.
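The reason tuning helps can be seen with a toy model (an assumption for illustration, not from the paper or repository): accepted tokens typically grow sublinearly with the draft-tree size, while per-step verification cost grows roughly linearly, so throughput peaks at some intermediate `total_token`:

```python
# Toy trade-off model (all curves and constants are illustrative
# assumptions): throughput = accepted tokens per step / step time.
import math


def tokens_per_second(total_token: int,
                      base_latency: float = 1.0,
                      per_token_cost: float = 0.01,
                      scale: float = 2.0) -> float:
    accepted = scale * math.log1p(total_token)        # toy acceptance curve
    step_time = base_latency + per_token_cost * total_token
    return accepted / step_time


# Sweep a range of draft-tree sizes and keep the best-performing one.
best = max(range(8, 129), key=tokens_per_second)
```

On real hardware the sweep would measure actual tokens/sec for each setting rather than use a closed-form curve, but the shape of the trade-off is the same.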
Citation
```bibtex
@article{hu2025bridging,
  title={Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding},
  author={Hu, Shijing and Li, Jingyang and Lu, Zhihui and Zhou, Pan},
  journal={arXiv preprint arXiv:2509.22134},
  year={2025}
}
```
Acknowledgements
This implementation is based on the open-source repository of EAGLE. This project has also been influenced by HASS, GRIFFIN, and other projects in the LLM community.