Add pipeline tag, library name and link paper (#1)
- Add pipeline tag, library name and link paper (ca0363607f6c0e095b051de788d93b54a33a044a)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

```diff
@@ -1,14 +1,21 @@
 ---
-license: apache-2.0
 base_model:
 - Qwen/Qwen2.5-14B-Instruct
 datasets:
 - HuggingFaceH4/ultrachat_200k
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: vllm
 ---
+
 # Qwen2.5-14B-Instruct_EAGLE3_UltraChat
 
+This repository contains the EAGLE-3 draft model presented in the paper [Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs](https://huggingface.co/papers/2512.20573).
+
+Code: [GitHub - FailFast](https://github.com/ruipeterpan/failfast)
+
 ### Introduction
-**Qwen2.5-14B-Instruct_EAGLE3_UltraChat** is trained based on the open-source Qwen2.5-
+**Qwen2.5-14B-Instruct_EAGLE3_UltraChat** is trained based on the open-source Qwen2.5-14B-Instruct model using the [SpecForge](https://github.com/sgl-project/SpecForge) framework,
 and can be used for the Eagle-3 speculative decoding algorithm to speed up the inference of large language models during the decoding stage.
 
 
@@ -48,10 +55,8 @@ We run our evaluations on two NVIDIA A6000-48GB GPUs connected via PCIe 4.0 x16.
 | **Qwen2.5-7B-Instruct** | 2.19x | 2.05x | 2.02x | 1.78x | 2.25x | **2.06x** |
 
 
-### Relevant
-
-Qwen2.5-14B-Instruct Open-source Weights: https://huggingface.co/Qwen/Qwen2.5-14B-Instruct
-
-"Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs" [arXiv '25]: https://arxiv.org/pdf/2512.20573
+### Relevant Links
 
-
+- Qwen2.5-14B-Instruct Open-source Weights: https://huggingface.co/Qwen/Qwen2.5-14B-Instruct
+- "Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs" [arXiv '25]: https://arxiv.org/pdf/2512.20573
+- Artifact of FailFast: https://github.com/ruipeterpan/failfast
```