Zip Ye committed
Commit · 2df16de
Parent(s): fa1aa1c
Update README.md

README.md CHANGED
@@ -2,7 +2,7 @@
 <img src="assets/logo.png" alt="SEAGLE Logo" width="250"/>
 </div>

-# SEAGLE:
+# SEAGLE: Safety-Aware EAGLE

 **SEAGLE** is a safety-aware speculative decoding policy based on [SGLang](https://github.com/sgl-project/sglang). It embeds a lightweight probe model into the draft loop of [EAGLE-3](https://github.com/SafeAILab/EAGLE) speculative decoding, performs real-time safety monitoring on each decoding step, dynamically adjusts draft tokens, and triggers a fallback mechanism when unsafe content is continuously detected.

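For readers skimming the diff, the draft-loop behavior described in the paragraph above (per-step probe monitoring, draft-token adjustment, fallback on sustained unsafe content) can be sketched in a few lines. This is a minimal illustrative sketch, not SEAGLE's actual implementation; `safe_draft_loop`, `probe_score`, and the threshold/patience parameters are hypothetical names standing in for the real probe and fallback logic:

```python
def safe_draft_loop(draft_tokens, probe_score, unsafe_threshold=0.5, patience=2):
    """Hypothetical sketch of a safety-aware draft loop.

    draft_tokens: candidate tokens proposed by the draft model this round
    probe_score:  callable token -> float in [0, 1]; higher means less safe
    Returns (accepted_tokens, fallback_triggered).
    """
    accepted = []
    unsafe_streak = 0
    for tok in draft_tokens:
        if probe_score(tok) >= unsafe_threshold:
            # Unsafe draft token: drop it and count the streak.
            unsafe_streak += 1
            if unsafe_streak >= patience:
                # Continuously unsafe: trigger the fallback mechanism and
                # hand generation back to the target model / safety policy.
                return accepted, True
        else:
            unsafe_streak = 0
            accepted.append(tok)
    return accepted, False
```

A safe prefix is kept while the unsafe tail is discarded, which is why a false positive on a single token does not derail the whole response.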
@@ -46,7 +46,7 @@ The designed **safety mechanism** is embedded within each round of speculative d

 ## 🚀 2. Quick Start

-We have open-sourced the draft model and probe for [Qwen3-235B-A22B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507). You can download it along with our compatible [draft](https://
+We have open-sourced the draft model and probe for [Qwen3-235B-A22B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507). You can download them along with our compatible [draft](https://huggingface.co/Alibaba-AAIG/SEAGLE/tree/main/draft_probe_suite/draft_model) and [probe](https://huggingface.co/Alibaba-AAIG/SEAGLE/tree/main/draft_probe_suite/probe) models to experience safe inference.

 ### 📦 2.1 Install Dependencies

@@ -75,7 +75,7 @@ from sglang.srt.server_args import ServerArgs

 from sglang.srt.entrypoints.http_server import launch_server as _launch_server

 # =========================================================
-# Launch SGlang Server with
+# Launch SGlang Server with Safety-Aware Eagle3 Decoding
 # =========================================================
 MODEL_PATH = "your_qwen3_235b_a22b_instruct_2507_path"
 DRAFT_MODEL_PATH = "draft_probe_suite/draft_model"
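The launcher excerpt above is cut off before the server is actually started. Assuming it forwards stock SGLang speculative-decoding options (the flag names below mirror standard SGLang CLI options; the probe-specific wiring is not shown, and `build_server_argv` is a hypothetical helper for illustration), the argument assembly might look like:

```python
# Hypothetical sketch: assemble CLI-style arguments for an SGLang server
# with EAGLE3 speculative decoding. Paths are the placeholders from the
# README snippet above; flag names mirror stock SGLang options.

MODEL_PATH = "your_qwen3_235b_a22b_instruct_2507_path"
DRAFT_MODEL_PATH = "draft_probe_suite/draft_model"

def build_server_argv(model_path, draft_model_path,
                      num_steps=3, topk=1, num_draft_tokens=4):
    """Return an argv list for `python -m sglang.launch_server`.

    The probe model used by SEAGLE would be configured separately.
    """
    return [
        "--model-path", model_path,
        "--speculative-algorithm", "EAGLE3",
        "--speculative-draft-model-path", draft_model_path,
        "--speculative-num-steps", str(num_steps),
        "--speculative-eagle-topk", str(topk),
        "--speculative-num-draft-tokens", str(num_draft_tokens),
    ]
```

Check SGLang's own server-args documentation for the exact spelling and defaults of these flags in your installed version.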
@@ -243,9 +243,9 @@ We begin by evaluating the acceleration performance of our draft models, encompa

 | **Ours (Pre-trained)** | **2.7 / 734 (1.51x)** | **3.3 / 848 (1.73x)** | **3.1 / 617 (1.41x)** | **2.8 / 706 (1.51x)** | **4.4 / 1083 (2.23x)** | 3.2 / 637 (1.46x) | **4.4 / 1084 (2.32x)** | **2.9 / 691 (1.47x)** | **2.7 / 702 (1.45x)** | **4.1 / 1093 (2.23x)** |
 | Ours (After Joint-train) | 2.42 / 665 (1.37x) | 3.33 / 870 (1.77x) | 2.8 / 564 (1.29x) | 2.5 / 638 (1.36x) | 4.15 / 1016 (2.09x) | 2.85 / 591 (1.36x) | 4.30 / 1070 (2.29x) | 2.7 / 630 (1.34x) | 2.5 / 650 (1.34x) | 3.85 / 1030 (2.1x) |

-> **Note:** Our pre-trained draft model can be found [here](https://
+> **Note:** Our pre-trained draft model can be found [here](https://huggingface.co/Alibaba-AAIG/SEAGLE/tree/main/draft_probe_suite/pretrained_draft_model). Compared to the [Meituan](https://modelscope.cn/models/lmsys/SGLang-EAGLE3-Qwen3-235B-A22B-Instruct-2507-SpecForge-Meituan) version, our Eagle Head has undergone accelerated training specifically for Chinese. The pre-trained version can be used standalone as an Eagle Head for Qwen3-235B-A22B-Instruct-2507, delivering outstanding acceleration performance in both Chinese and English.

-**Launch with Standard SGLang
+**Launch with Standard SGLang CLI:**

 ```bash
 export SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1
@@ -281,9 +281,11 @@ Evaluate the probe's impact on normal chatting data (query safety & response saf
 | :--- | :---: | :---: | :---: |
 | FuseChat-Mixture | 50,000 | 0.99506 | 0.00494 |

-
+> **Note:** Even if the probe occasionally produces false positives, the safety-aware speculative decoding mechanism still ensures that the generated responses are meaningful and valuable.

-
+### ⚙️ 3.3 End-to-End Utility and Safety
+
+The trained probe is integrated into the Eagle3 decoding pipeline. We evaluate the end-to-end utility and safety of the SafeAware decoding strategy using an SGLang single-request configuration.

 #### (1) Utility Performance

@@ -312,8 +314,8 @@ Safety scores are evaluated based on the discriminative reward model (DRM), gene
 | 📄 [Chinese: 100 High-Risk](assets/valuesTest_zh_hard_100.jsonl) | DRM Score | 0.43 | 0.49 | **0.83** |
 | | QwQ Score | 0.70 | 0.70 | **0.92** |
 | | GRM Score | 0.23 | 0.31 | **0.81** |
-| 📄 [English Log](assets/GRM_judge_log_en.xlsx) |
-| 📄 [Chinese Log](assets/GRM_judge_log_zh.xlsx) |
+| 📄 [English Log](assets/GRM_judge_log_en.xlsx) | Logs | ✅ | - | ✅ |
+| 📄 [Chinese Log](assets/GRM_judge_log_zh.xlsx) | Logs | ✅ | - | ✅ |

 ---