husj576 nielsr HF Staff committed on
Commit ad82d12 · 1 Parent(s): 0752ae4

Add model card for GTO draft model (#1)


- Add model card for GTO draft model (e4173e2681bb750b3766b20d59c9360069b4150a)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +60 -3
README.md CHANGED
@@ -1,3 +1,60 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - speculative-decoding
+ - gto
+ ---
+
+ # GTO: Group Tree Optimization for Speculative Decoding
+
+ This repository contains a draft model for speculative decoding trained using **Group Tree Optimization (GTO)**.
+
+ GTO is a framework designed to bridge the "draft policy misalignment" between training (which often focuses on single-token greedy paths) and inference (which uses tree-based re-ranking and verification). It introduces a **Draft Tree Reward** objective and a **Group-based Draft Policy Training** scheme to optimize acceptance lengths and inference speed.
+
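For intuition about the "acceptance length" that GTO optimizes, here is a minimal sketch of plain draft-and-verify speculative decoding under greedy verification. This is an illustration only, not the paper's tree-based GTO algorithm; the `acceptance_length` helper and all token ids are made up for the example:

```python
# Toy draft-and-verify loop (NOT the GTO implementation): a draft model
# proposes a few tokens, the target model verifies them greedily, and the
# "acceptance length" is how many consecutive draft tokens the target
# agrees with. GTO trains the draft policy to make this prefix longer.

def acceptance_length(draft_tokens, target_tokens):
    """Count the leading draft tokens that match the target's own choices."""
    n = 0
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        n += 1
    return n

# Hypothetical example: the draft guesses 5 tokens and the target agrees
# with the first 3, so one target forward pass commits 3 + 1 tokens
# (the accepted prefix plus the target's own correction token).
draft = [12, 7, 42, 9, 3]
target = [12, 7, 42, 8, 3]
print(acceptance_length(draft, target))  # 3
```

A longer accepted prefix means more tokens committed per (expensive) target-model forward pass, which is the quantity the Draft Tree Reward is built around.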
+ ## Paper
+
+ [Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding](https://arxiv.org/abs/2509.22134)
+
+ ## GitHub Repository
+
+ For implementation details, training scripts, and inference code, please visit the official repository:
+ [https://github.com/hsj576/GTO](https://github.com/hsj576/GTO)
+
+ ## Overview
+
+ GTO achieves significant performance improvements:
+ - **5.6x** faster than vanilla autoregressive decoding.
+ - **7%** faster than the prior state-of-the-art EAGLE-3.
+ - Improves acceptance length by aligning training with the decoding-time tree policy.
+
+ ## Inference
+
+ The official implementation provides a web interface for inference. To use this draft model with a base model, you can run the following command from the GTO repository:
+
+ ```bash
+ python -m application.webui --ea-model-path [path of GTO weight] \
+     --base-model-path [path of the original model] \
+     --model-type [vicuna|llama3|qwen] \
+     --total-token [int]
+ ```
+
+ The `total-token` parameter specifies the number of draft tokens. Adjust this value based on your specific hardware and model size for optimal results.
+
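As a rough tuning aid, the standard speculative-decoding analysis gives the expected number of tokens committed per target forward pass for a *linear* draft chain. GTO drafts trees rather than chains, so treat this as an approximation I am assuming carries over in spirit; `alpha` (the per-token acceptance probability) and the sample values are illustrative, not measured:

```python
# Back-of-envelope guide for choosing a draft budget (an assumption based
# on the standard speculative-decoding analysis, not a GTO-specific
# formula): with per-token acceptance probability alpha and a chain of k
# draft tokens, each target forward pass commits a geometric-series
# expected number of tokens.

def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens committed per target forward pass, chain of length k."""
    assert 0.0 <= alpha < 1.0
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)

# Diminishing returns: doubling the draft length helps only modestly
# once alpha**k is small.
print(round(expected_tokens_per_pass(0.8, 4), 2))  # 3.36
print(round(expected_tokens_per_pass(0.8, 8), 2))  # 4.33
```

Because the returns diminish quickly in `k`, enlarging the draft budget mainly adds draft-model compute past a point, which is why `total-token` is worth tuning per hardware rather than simply maximized.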
+ ## Citation
+
+ If you find GTO useful in your research, please cite the following paper:
+
+ ```bibtex
+ @article{hu2025bridging,
+   title={Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding},
+   author={Hu, Shijing and Li, Jingyang and Lu, Zhihui and Zhou, Pan},
+   journal={arXiv preprint arXiv:2509.22134},
+   year={2025}
+ }
+ ```
+
+ ## Acknowledgements
+
+ The implementation is based on the open-source repository of [EAGLE](https://github.com/SafeAILab/EAGLE/tree/main) and has been influenced by projects in the LLM community such as [HASS](https://github.com/HArmonizedSS/HASS) and [GRIFFIN](https://github.com/hsj576/GRIFFIN).