husj576 and nielsr (HF Staff) committed
Commit cde614e · 1 Parent(s): 29d7bc1

Add model card for GTO (#1)


- Add model card for GTO (197173e787237f4c9f39679925804f748a2e80ed)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +49 -0
README.md ADDED
---
license: apache-2.0
pipeline_tag: text-generation
tags:
- speculative-decoding
- gto
---

# Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding

Group Tree Optimization (GTO) is a framework designed to address draft policy misalignment in speculative decoding. While standard methods optimize for a single greedy path, GTO aligns training with the actual tree-based decoding policy used during inference. This is achieved through a Draft Tree Reward objective and a stable Group-based Draft Policy Training scheme.

- **Paper:** [Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding](https://huggingface.co/papers/2509.22134)
- **Repository:** [https://github.com/hsj576/GTO](https://github.com/hsj576/GTO)
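The misalignment can be illustrated with a toy simulation. Everything below is hypothetical (made-up vocabularies, probabilities, and helper names, not code or data from the paper): a draft trained only for its single greedy chain can lose acceptances that a multi-branch draft tree recovers, because verification walks the tree and accepts whichever branch matches the target model's choice.

```python
# Toy deterministic "models": each maps a context tuple to a next-token
# distribution. All tokens and probabilities are illustrative only.
TARGET = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.7, "dog": 0.3},
    ("the", "cat"): {"sat": 0.9, "ran": 0.1},
}

def greedy(dist):
    # Highest-probability token under a distribution.
    return max(dist, key=dist.get)

def verify_tree(ctx, tree):
    """Walk a draft tree (token -> subtree), accepting each step whose
    token matches the target model's greedy choice; return the accepted
    token sequence."""
    accepted = []
    node = tree
    while node:
        want = greedy(TARGET[tuple(ctx + accepted)])
        if want in node:       # some branch of the tree matches
            accepted.append(want)
            node = node[want]
        else:                  # no branch matches: stop verification here
            break
    return accepted

# A single greedy draft chain (one branch per level) misses "sat" at depth 3,
# so only two tokens are accepted:
chain = {"the": {"cat": {"ran": {}}}}
print(verify_tree([], chain))  # ['the', 'cat']

# A draft tree with multiple branches per level keeps "sat" alive,
# so all three tokens are accepted:
tree = {"the": {"cat": {"ran": {}, "sat": {}}, "dog": {}}, "a": {}}
print(verify_tree([], tree))   # ['the', 'cat', 'sat']
```

GTO's point is that the draft should be trained against this tree-shaped acceptance behavior rather than against the single greedy chain alone.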
## Performance

GTO achieves state-of-the-art acceleration for LLM inference:

- **5.6x faster** than vanilla autoregressive decoding.
- **7% faster** than previous state-of-the-art methods like EAGLE-3.
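The two figures above are mutually consistent: dividing out the 7% margin gives the baseline's implied speedup over vanilla decoding (an inference from the listed numbers, not a figure reported separately in the paper):

```python
# Illustrative arithmetic only: implied EAGLE-3 speedup over vanilla decoding,
# given GTO's 5.6x speedup and its 7% advantage over EAGLE-3.
gto_speedup = 5.6
gto_over_eagle3 = 1.07
eagle3_speedup = gto_speedup / gto_over_eagle3
print(round(eagle3_speedup, 2))  # ~5.23
```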
## Usage

To use this model for accelerated inference, please follow the setup instructions in the [official GTO repository](https://github.com/hsj576/GTO).

### Inference via Web UI

The codebase provides a web interface for testing the acceleration. After setting up the environment and cloning the repo, you can run:

```bash
python -m application.webui --ea-model-path [path of GTO weight] \
    --base-model-path [path of the original model] \
    --model-type [vicuna|llama3|qwen] \
    --total-token [int]
```

The `total-token` parameter sets the number of draft tokens. Tuning it for your specific device and model can yield better results.
## Citation

If you find this work useful, please cite:

```bibtex
@article{hu2025bridging,
  title={Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding},
  author={Hu, Shijing and Li, Jingyang and Lu, Zhihui and Zhou, Pan},
  journal={arXiv preprint arXiv:2509.22134},
  year={2025}
}
```

## Acknowledgements

The implementation is based on the open-source [EAGLE](https://github.com/SafeAILab/EAGLE/tree/main) repository. This project has also been influenced by other projects in the LLM community, such as [HASS](https://github.com/HArmonizedSS/HASS) and [GRIFFIN](https://github.com/hsj576/GRIFFIN).