Add pipeline tag and link to paper (#1)
Commit: dac7bbec412591f6a683f5bd9e0c71c7fa5bf7db
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md

---
base_model:
- Qwen/Qwen2.5-7B-Instruct
datasets:
- HuggingFaceH4/ultrachat_200k
license: apache-2.0
pipeline_tag: text-generation
---

# Qwen2.5-7B-Instruct_EAGLE3_UltraChat

### Introduction
**Qwen2.5-7B-Instruct_EAGLE3_UltraChat** is an EAGLE-3 draft model trained on top of the open-source Qwen2.5-7B-Instruct model using the [SpecForge](https://github.com/sgl-project/SpecForge) framework. It can be used with the EAGLE-3 speculative decoding algorithm to speed up large language model inference during the decoding stage.

This model is an artifact of the paper [Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs](https://huggingface.co/papers/2512.20573).
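
For context, speculative decoding lets a cheap draft model propose several tokens that the target model then verifies in parallel. The sketch below is a toy greedy-decoding illustration of that draft-then-verify loop, not SpecForge or SGLang code: `draft_step` and `target_step` are hypothetical stand-ins, and EAGLE-3 itself drafts from the target model's hidden features and verifies a token tree rather than a single chain.

```python
# Toy draft-then-verify loop (greedy decoding). `draft_step` and `target_step`
# are hypothetical stand-ins that map a token sequence to the next token.
def speculative_decode(target_step, draft_step, prompt_tokens, gamma=4, max_new=64):
    out = list(prompt_tokens)
    produced = 0
    while produced < max_new:
        # 1) The cheap draft model proposes gamma tokens autoregressively.
        proposal = []
        for _ in range(gamma):
            proposal.append(draft_step(out + proposal))
        # 2) The target model checks the proposals (in practice one parallel
        #    forward pass; shown sequentially here for clarity).
        accepted = 0
        for i, tok in enumerate(proposal):
            if target_step(out + proposal[:i]) == tok:
                accepted += 1
            else:
                break
        out.extend(proposal[:accepted])
        # 3) The target model always contributes one token (the correction on
        #    a mismatch, or a bonus token on full acceptance), so every
        #    iteration makes progress.
        out.append(target_step(out))
        produced += accepted + 1
    return out
```

Each loop iteration costs one sequential target-model step but emits `accepted + 1` tokens, which is where the speedups reported below come from.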

### Training Configuration
We adopted the default training hyperparameters in SpecForge and trained the EAGLE-3 draft model to match the target model's output until convergence.

This checkpoint was obtained after five epochs of training (~260k training steps at batch size 4). We find that although further training continues to improve training-time accuracy, it has a negligible impact on the end-to-end speedup of EAGLE-3.

- **Dataset**: the [UltraChat-200K](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset.
- **Training environment**: 4 NVIDIA H100 GPUs (80 GB VRAM each) with the DeepSpeed framework; each training epoch took approximately 3.5 hours.
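
As a rough consistency check on the step count above (back-of-envelope only, assuming one training sample per conversation):

```python
# 260k steps at batch size 4 over five epochs implies roughly 208k samples
# per epoch, in line with the ~200k conversations in UltraChat-200K
# (assuming one training sample per conversation).
steps, batch_size, epochs = 260_000, 4, 5
samples_per_epoch = steps * batch_size / epochs
print(f"{samples_per_epoch:,.0f} samples/epoch")  # 208,000
```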

### Model Inference Launch Command
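
The exact launch command sits in a part of the README that the hunks of this diff do not show, so it is left elided here. Purely for orientation, a typical SGLang launch for an EAGLE-3 draft model looks roughly like the sketch below; the flag names follow recent SGLang releases, the draft-model path is a placeholder, and the speculative-decoding knob values are illustrative, not this repository's settings.

```python
# Hedged sketch only: launch an SGLang server with EAGLE-3 speculative
# decoding. Flag names follow recent SGLang releases; values are illustrative.
import subprocess

subprocess.run([
    "python", "-m", "sglang.launch_server",
    "--model-path", "Qwen/Qwen2.5-7B-Instruct",             # target model
    "--speculative-algorithm", "EAGLE3",
    "--speculative-draft-model-path", "<this-checkpoint>",  # placeholder path/Hub ID
    "--speculative-num-steps", "3",         # draft depth (illustrative)
    "--speculative-eagle-topk", "1",        # tree branching factor (illustrative)
    "--speculative-num-draft-tokens", "4",  # tokens verified per step (illustrative)
    "--port", "30000",
])
```

Once the server is up, requests go to the usual OpenAI-compatible endpoint; the draft model is configured entirely server-side and is invisible to clients.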

[...]

We run our evaluations on two NVIDIA A6000-48GB GPUs connected via PCIe 4.0 x16.

| **Qwen2.5-7B-Instruct** | 2.19x | 2.05x | 2.02x | 1.78x | 2.25x | **2.06x** |
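
For reference, the boldfaced value in the last column is the arithmetic mean of the five per-task speedups (the task names are in the table header, which falls outside the hunks shown in this diff):

```python
# Verify the reported average speedup for Qwen2.5-7B-Instruct.
speedups = [2.19, 2.05, 2.02, 1.78, 2.25]
print(f"{sum(speedups) / len(speedups):.2f}x")  # 2.06x
```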

### Relevant Links

- **Paper**: [Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs](https://huggingface.co/papers/2512.20573)
- **GitHub Repository**: [ruipeterpan/failfast](https://github.com/ruipeterpan/failfast)
- **Base Model**: [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)

### Citation

```bibtex
@article{pan2025failfast,
  title={Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs},
  author={Pan, Rui and Chen, Zhuofu and Liu, Hongyi and Krishnamurthy, Arvind and Netravali, Ravi},
  journal={arXiv preprint arXiv:2512.20573},
  year={2025}
}
```