Zip Ye committed
Commit · 2df16de
Parent(s): fa1aa1c
Update README.md

README.md CHANGED
@@ -2,7 +2,7 @@
 <img src="assets/logo.png" alt="SEAGLE Logo" width="250"/>
 </div>

-# SEAGLE:
+# SEAGLE: Safety-Aware EAGLE

 **SEAGLE** is a safety-aware speculative decoding policy based on [SGLang](https://github.com/sgl-project/sglang). It embeds a lightweight probe model into the draft loop of [EAGLE-3](https://github.com/SafeAILab/EAGLE) speculative decoding, performs real-time safety monitoring on each decoding step, dynamically adjusts draft tokens, and triggers a fallback mechanism when unsafe content is continuously detected.

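For readers skimming the diff, the draft-loop behavior described in the paragraph above (per-step probe monitoring, draft-token adjustment, fallback on sustained unsafe content) can be sketched in a few lines. This is a minimal illustrative sketch, not SEAGLE's actual implementation; `safe_draft_loop`, `probe_score`, and the threshold/patience parameters are hypothetical names standing in for the real probe and fallback logic:

```python
def safe_draft_loop(draft_tokens, probe_score, unsafe_threshold=0.5, patience=2):
    """Hypothetical sketch of a safety-aware draft loop.

    draft_tokens: candidate tokens proposed by the draft model this round
    probe_score:  callable token -> float in [0, 1]; higher means less safe
    Returns (accepted_tokens, fallback_triggered).
    """
    accepted = []
    unsafe_streak = 0
    for tok in draft_tokens:
        if probe_score(tok) >= unsafe_threshold:
            # Unsafe draft token: drop it and count the streak.
            unsafe_streak += 1
            if unsafe_streak >= patience:
                # Continuously unsafe: trigger the fallback mechanism and
                # hand generation back to the target model / safety policy.
                return accepted, True
        else:
            unsafe_streak = 0
            accepted.append(tok)
    return accepted, False
```

A safe prefix is kept while the unsafe tail is discarded, which is why a false positive on a single token does not derail the whole response.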
@@ -46,7 +46,7 @@ The designed **safety mechanism** is embedded within each round of speculative d

 ## 🚀 2. Quick Start

-We have open-sourced the draft model and probe for [Qwen3-235B-A22B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507). You can download it along with our compatible [draft](https://
+We have open-sourced the draft model and probe for [Qwen3-235B-A22B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507). You can download them along with our compatible [draft](https://huggingface.co/Alibaba-AAIG/SEAGLE/tree/main/draft_probe_suite/draft_model) and [probe](https://huggingface.co/Alibaba-AAIG/SEAGLE/tree/main/draft_probe_suite/probe) models to experience safe inference.

 ### 📦 2.1 Install Dependencies

@@ -75,7 +75,7 @@ from sglang.srt.server_args import ServerArgs

 from sglang.srt.entrypoints.http_server import launch_server as _launch_server

 # =========================================================
-# Launch SGlang Server with
+# Launch SGlang Server with Safety-Aware Eagle3 Decoding
 # =========================================================
 MODEL_PATH = "your_qwen3_235b_a22b_instruct_2507_path"
 DRAFT_MODEL_PATH = "draft_probe_suite/draft_model"
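The launcher excerpt above is cut off before the server is actually started. Assuming it forwards stock SGLang speculative-decoding options (the flag names below mirror standard SGLang CLI options; the probe-specific wiring is not shown, and `build_server_argv` is a hypothetical helper for illustration), the argument assembly might look like:

```python
# Hypothetical sketch: assemble CLI-style arguments for an SGLang server
# with EAGLE3 speculative decoding. Paths are the placeholders from the
# README snippet above; flag names mirror stock SGLang options.

MODEL_PATH = "your_qwen3_235b_a22b_instruct_2507_path"
DRAFT_MODEL_PATH = "draft_probe_suite/draft_model"

def build_server_argv(model_path, draft_model_path,
                      num_steps=3, topk=1, num_draft_tokens=4):
    """Return an argv list for `python -m sglang.launch_server`.

    The probe model used by SEAGLE would be configured separately.
    """
    return [
        "--model-path", model_path,
        "--speculative-algorithm", "EAGLE3",
        "--speculative-draft-model-path", draft_model_path,
        "--speculative-num-steps", str(num_steps),
        "--speculative-eagle-topk", str(topk),
        "--speculative-num-draft-tokens", str(num_draft_tokens),
    ]
```

Check SGLang's own server-args documentation for the exact spelling and defaults of these flags in your installed version.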
@@ -243,9 +243,9 @@ We begin by evaluating the acceleration performance of our draft models, encompa

 | **Ours (Pre-trained)** | **2.7 / 734 (1.51x)** | **3.3 / 848 (1.73x)** | **3.1 / 617 (1.41x)** | **2.8 / 706 (1.51x)** | **4.4 / 1083 (2.23x)** | 3.2 / 637 (1.46x) | **4.4 / 1084 (2.32x)** | **2.9 / 691 (1.47x)** | **2.7 / 702 (1.45x)** | **4.1 / 1093 (2.23x)** |
 | Ours (After Joint-train) | 2.42 / 665 (1.37x) | 3.33 / 870 (1.77x) | 2.8 / 564 (1.29x) | 2.5 / 638 (1.36x) | 4.15 / 1016 (2.09x) | 2.85 / 591 (1.36x) | 4.30 / 1070 (2.29x) | 2.7 / 630 (1.34x) | 2.5 / 650 (1.34x) | 3.85 / 1030 (2.1x) |

-> **Note:** Our pre-trained draft model can be found [here](https://
+> **Note:** Our pre-trained draft model can be found [here](https://huggingface.co/Alibaba-AAIG/SEAGLE/tree/main/draft_probe_suite/pretrained_draft_model). Compared to the [Meituan](https://modelscope.cn/models/lmsys/SGLang-EAGLE3-Qwen3-235B-A22B-Instruct-2507-SpecForge-Meituan) version, our Eagle Head has undergone accelerated training specifically for Chinese. The pre-trained version can be used standalone as an Eagle Head for Qwen3-235B-A22B-Instruct-2507, delivering outstanding acceleration performance in both Chinese and English.

-**Launch with Standard SGLang
+**Launch with Standard SGLang CLI:**

 ```bash
 export SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1
@@ -281,9 +281,11 @@ Evaluate the probe's impact on normal chatting data (query safety & response saf
 | :--- | :---: | :---: | :---: |
 | FuseChat-Mixture | 50,000 | 0.99506 | 0.00494 |

-
+> **Note:** Even if the probe occasionally produces false positives, the safety-aware speculative decoding mechanism still ensures that the generated responses are meaningful and valuable.

-
+### ⚙️ 3.3 End-to-End Utility and Safety
+
+The trained probe is integrated into the Eagle3 decoding pipeline. We evaluate the end-to-end utility and safety of the SafeAware decoding strategy using an SGLang single-request configuration.

 #### (1) Utility Performance

@@ -312,8 +314,8 @@ Safety scores are evaluated based on the discriminative reward model (DRM), gene
 | 📄 [Chinese: 100 High-Risk](assets/valuesTest_zh_hard_100.jsonl) | DRM Score | 0.43 | 0.49 | **0.83** |
 | | QwQ Score | 0.70 | 0.70 | **0.92** |
 | | GRM Score | 0.23 | 0.31 | **0.81** |
-| 📄 [English Log](assets/GRM_judge_log_en.xlsx) |
-| 📄 [Chinese Log](assets/GRM_judge_log_zh.xlsx) |
+| 📄 [English Log](assets/GRM_judge_log_en.xlsx) | Logs | ✅ | - | ✅ |
+| 📄 [Chinese Log](assets/GRM_judge_log_zh.xlsx) | Logs | ✅ | - | ✅ |

 ---