Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,30 @@
---
library_name: transformers
tags:
- speculative-decoding
- early-exit
- dynamic-inference
base_model: meta-llama/Meta-Llama-3-8B
---

# DSSD Auxiliary Heads for Llama 3 8B

This repository contains the trained **auxiliary early-exit heads** for **meta-llama/Meta-Llama-3-8B**.
These heads are used for **Dynamic Self-Speculative Decoding (DSSD)**, which speeds up inference while producing output guaranteed to be identical to the full model's.

## 🚀 Usage

These heads are designed to be used with the **DSSD Demo**.

- **GitHub Repository**: [FlorianVal/DSSD_demo](https://github.com/FlorianVal/DSSD_demo)
- **Live Demo**: [HuggingFace Space](https://huggingface.co/spaces/valcore/Dssd_Demo)

## 📦 Contents

- `aux_heads.pt`: State dictionary of the trained auxiliary heads (Layers 8, 16, 24).
- `config.json`: Configuration file specifying head layers and parameters.
- `calibration.json`: Calibrated thresholds for different accuracy levels.

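As a rough illustration of how the three files listed above might be inspected after downloading them to a local folder, here is a minimal sketch. The helper name `load_dssd_metadata` and the returned dictionary layout are hypothetical, and the JSON schemas are not documented here, so the code only parses the files and makes no assumption about their keys.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_dssd_metadata(repo_dir: str) -> Dict[str, Any]:
    """Read the JSON files from a local copy of this repository.

    Hypothetical helper: it assumes only the three file names listed in
    this README, not any particular keys inside the JSON files.
    """
    root = Path(repo_dir)
    with open(root / "config.json", encoding="utf-8") as f:
        config = json.load(f)
    with open(root / "calibration.json", encoding="utf-8") as f:
        calibration = json.load(f)
    heads_path = root / "aux_heads.pt"
    return {
        "config": config,
        "calibration": calibration,
        "heads_path": heads_path,
        # The state dict itself is not loaded here, to keep the sketch
        # free of a PyTorch dependency.
        "heads_present": heads_path.is_file(),
    }
```

With PyTorch installed, the state dictionary could then presumably be read with `torch.load(heads_path, map_location="cpu")`; how the DSSD Demo attaches the heads to the base model is defined in that repository, not here.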
## 🔧 How it Works

The auxiliary heads let the model exit early at an intermediate layer when that head's confidence threshold is met, which produces draft tokens cheaply. The full model then verifies the drafted tokens. Any token it does not confirm is replaced with the full model's own prediction, so the final output matches the full model's output exactly.
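The draft-then-verify idea described above can be sketched in plain Python with toy stand-ins for the models. This is an illustrative sketch under stated assumptions, not the DSSD Demo's implementation: the function names, the threshold values, the draft length, and the toy `full_next` / `draft_next` functions are all hypothetical, and greedy decoding is assumed.

```python
from typing import Callable, Dict, List, Sequence

NextToken = Callable[[List[int]], int]


def pick_exit_layer(confidences: Dict[int, float],
                    thresholds: Dict[int, float],
                    final_layer: int) -> int:
    """Return the earliest head layer whose confidence meets its threshold.

    Falls back to the final layer (the full model) if no head is confident.
    """
    for layer in sorted(confidences):
        if confidences[layer] >= thresholds[layer]:
            return layer
    return final_layer


def verify_draft(draft: Sequence[int], full: Sequence[int]) -> List[int]:
    """Keep drafted tokens up to the first disagreement with the full model,
    then substitute the full model's token at that position and stop."""
    accepted: List[int] = []
    for d, f in zip(draft, full):
        if d == f:
            accepted.append(d)
        else:
            accepted.append(f)
            break
    return accepted


def generate(prompt: Sequence[int], draft_next: NextToken, full_next: NextToken,
             max_new_tokens: int, draft_len: int = 4) -> List[int]:
    """Greedy generation that drafts cheaply and verifies with the full model."""
    tokens = list(prompt)
    target = len(tokens) + max_new_tokens
    while len(tokens) < target:
        k = min(draft_len, target - len(tokens))
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # Full-model prediction at each drafted position. A real system
        # would obtain these from a single batched forward pass.
        full = [full_next(tokens + draft[:i]) for i in range(k)]
        tokens.extend(verify_draft(draft, full))
    return tokens


# Toy models (hypothetical): the draft disagrees whenever the context
# length is a multiple of 3.
def full_next(ctx: List[int]) -> int:
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[int]) -> int:
    t = full_next(ctx)
    return (t + 1) % 50 if len(ctx) % 3 == 0 else t


def greedy(prompt: Sequence[int], n: int) -> List[int]:
    """Reference: plain greedy decoding with the full model only."""
    out = list(prompt)
    for _ in range(n):
        out.append(full_next(out))
    return out
```

Because every accepted token equals the full model's prediction for the same prefix, `generate([1, 2], draft_next, full_next, 10)` returns the same sequence as `greedy([1, 2], 10)`. Each verification round also adds at least one token, so the loop always terminates.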