Add model card for GTO draft model (#1)
- Add model card for GTO draft model (e4173e2681bb750b3766b20d59c9360069b4150a)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- speculative-decoding
- gto
---

# GTO: Group Tree Optimization for Speculative Decoding

This repository contains a draft model for speculative decoding trained using **Group Tree Optimization (GTO)**.

GTO is a framework designed to bridge the "draft policy misalignment" between training (which often focuses on single-token greedy paths) and inference (which uses tree-based re-ranking and verification). It introduces a **Draft Tree Reward** objective and a **Group-based Draft Policy Training** scheme to optimize acceptance lengths and inference speed.
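
As a toy illustration of the quantity GTO optimizes, the sketch below (our own example, not code from the GTO repository) shows how speculative decoding accepts the longest prefix of a drafted chain that the target model agrees with; the length of that prefix is the acceptance length.

```python
# Toy sketch of speculative decoding verification (illustrative only, not the
# GTO implementation). A draft model proposes a chain of tokens, and the
# target model accepts the longest prefix it agrees with -- the acceptance
# length that GTO's Draft Tree Reward is designed to increase.

def greedy_next(model, token):
    """Next token under a toy 'model' given as a dict: token -> next token."""
    return model.get(token, "<eos>")

def draft_chain(draft_model, prompt_token, num_draft_tokens):
    """The draft model speculates a chain of tokens autoregressively."""
    chain, tok = [], prompt_token
    for _ in range(num_draft_tokens):
        tok = greedy_next(draft_model, tok)
        chain.append(tok)
    return chain

def verify(target_model, prompt_token, chain):
    """The target model accepts the longest prefix matching its own greedy output."""
    accepted, tok = 0, prompt_token
    for drafted in chain:
        if greedy_next(target_model, tok) != drafted:
            break
        accepted += 1
        tok = drafted
    return accepted

# Draft and target agree on the first two continuations, then diverge.
target = {"the": "cat", "cat": "sat", "sat": "down"}
draft = {"the": "cat", "cat": "sat", "sat": "up"}

chain = draft_chain(draft, "the", 3)  # ["cat", "sat", "up"]
print(verify(target, "the", chain))   # acceptance length: 2
```

A longer accepted prefix means more tokens emitted per expensive target-model forward pass, which is why aligning draft training with the decoding-time tree policy pays off in wall-clock speed.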

## Paper

[Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding](https://arxiv.org/abs/2509.22134)

## GitHub Repository

For implementation details, training scripts, and inference code, please visit the official repository:
[https://github.com/hsj576/GTO](https://github.com/hsj576/GTO)

## Overview

GTO achieves significant performance improvements:
- **5.6x** faster than vanilla autoregressive decoding.
- **7%** faster than the prior state-of-the-art EAGLE-3.
- Improves acceptance length by aligning training with the decoding-time tree policy.

## Inference

The official implementation provides a web interface for inference. To use this draft model with a base model, you can run the following command from the GTO repository:

```bash
python -m application.webui --ea-model-path [path of GTO weight] \
    --base-model-path [path of the original model] \
    --model-type [vicuna|llama3|qwen] \
    --total-token [int]
```

The `total-token` parameter specifies the number of draft tokens. Adjust this value based on your specific hardware and model size for optimal results.
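
To build intuition for tuning `total-token`, here is a back-of-the-envelope model (our own illustration, not an analysis from the GTO paper): if each drafted token is accepted with probability p given the previous one was, drafting more tokens yields diminishing returns in expected accepted length.

```python
# Illustrative only -- a simple geometric model of draft-token acceptance,
# not taken from the GTO paper. With k drafted tokens and per-step acceptance
# probability p, the expected accepted length is p + p^2 + ... + p^k,
# so increasing k beyond a point adds little while costing more draft compute.

def expected_accepted(p: float, k: int) -> float:
    """Expected number of accepted draft tokens for a length-k draft chain."""
    return sum(p ** i for i in range(1, k + 1))

for k in (2, 4, 8, 16):
    # Each target-model verification step emits (accepted + 1) tokens on average.
    print(f"total-token={k}: ~{expected_accepted(0.8, k) + 1:.2f} tokens/step")
```

Under this simple model the marginal gain shrinks geometrically with k, which is why the sweet spot for `total-token` depends on how expensive the extra draft and verification work is on your hardware.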

## Citation

If you find GTO useful in your research, please cite the following paper:

```bibtex
@article{hu2025bridging,
  title={Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding},
  author={Hu, Shijing and Li, Jingyang and Lu, Zhihui and Zhou, Pan},
  journal={arXiv preprint arXiv:2509.22134},
  year={2025}
}
```

## Acknowledgements

The implementation is based on the open-source repository of [EAGLE](https://github.com/SafeAILab/EAGLE/tree/main) and has been influenced by LLM-community projects such as [HASS](https://github.com/HArmonizedSS/HASS) and [GRIFFIN](https://github.com/hsj576/GRIFFIN).