refresh README + figs/
- README.md +66 -41
- figs/welcome.png +0 -0
README.md
CHANGED
|
@@ -15,7 +15,7 @@ pipeline_tag: text-classification
|
|
| 15 |
<a href="https://github.com/AI45Lab/AgentDoG"><b>GitHub</b></a>   |   
|
| 16 |
<a href="https://huggingface.co/collections/AI45Research/agentdog15"><b>Hugging Face</b></a>   |   
|
| 17 |
<a href="https://www.modelscope.cn/collections/Shanghai_AI_Laboratory/AgentDoG15"><b>ModelScope</b></a>   |   
|
| 18 |
-
<a href="
|
| 19 |
<a href="https://arxiv.org/abs/2604.02022"><b>ATBench</b></a>   |   
|
| 20 |
<a href="https://ai45lab.github.io/AgentDoG/"><b>Demo</b></a>   |   
|
| 21 |
<a href="https://example.com/AgentDoG-docs"><b>Documentation</b></a>
|
|
@@ -34,15 +34,15 @@ AgentDoG 1.5 is a lightweight and scalable agent safety alignment framework, bui
|
|
| 34 |
|
| 35 |
| Model | Task | Parameters | Base model | Download |
|
| 36 |
| --- | --- | ---: | --- | --- |
|
| 37 |
-
| AgentDoG1.5-Unified-Qwen3.5-4B | Unified safety
|
| 38 |
-
| AgentDoG1.5-Qwen3.5-0.8B | Coarse-grained moderation | 0.8B | Qwen3.5-0.8B | [
|
| 39 |
-
| AgentDoG1.5-Qwen3.5-2B | Coarse-grained moderation | 2B | Qwen3.5-2B | [
|
| 40 |
-
| AgentDoG1.5-Qwen3.5-4B | Coarse-grained moderation | 4B | Qwen3.5-4B | [
|
| 41 |
-
| AgentDoG1.5-Llama3.1-8B | Coarse-grained moderation | 8B | Llama3.1-8B | [
|
| 42 |
-
| AgentDoG1.5-FG-Qwen3.5-0.8B | Fine-grained diagnosis | 0.8B | Qwen3.5-0.8B | [
|
| 43 |
-
| AgentDoG1.5-FG-Qwen3.5-2B | Fine-grained diagnosis | 2B | Qwen3.5-2B | [
|
| 44 |
-
| AgentDoG1.5-FG-Qwen3.5-4B | Fine-grained diagnosis | 4B | Qwen3.5-4B | [
|
| 45 |
-
| AgentDoG1.5-FG-Llama3.1-8B | Fine-grained diagnosis | 8B | Llama3.1-8B | [
|
| 46 |
|
| 47 |
## Performance
|
| 48 |
|
|
@@ -139,18 +139,18 @@ trajectory = {
|
|
| 139 |
Use this prompt with any coarse-grained model:
|
| 140 |
|
| 141 |
```text
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
```
|
| 147 |
|
| 148 |
```python
|
| 149 |
import torch
|
| 150 |
-
from
|
| 151 |
|
| 152 |
|
| 153 |
-
model_name = "
|
| 154 |
|
| 155 |
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 156 |
model = AutoModelForCausalLM.from_pretrained(
|
|
@@ -240,20 +240,20 @@ Expected output:
|
|
| 240 |
Use this prompt with any fine-grained model:
|
| 241 |
|
| 242 |
```text
|
| 243 |
-
|
| 244 |
-
|
| 245 |
-
|
| 246 |
-
|
| 247 |
```
|
| 248 |
|
| 249 |
Fine-grained models are intended to diagnose unsafe trajectories along the three taxonomy dimensions. The prompt template is below.
|
| 250 |
|
| 251 |
```python
|
| 252 |
import torch
|
| 253 |
-
from
|
| 254 |
|
| 255 |
|
| 256 |
-
model_name = "
|
| 257 |
|
| 258 |
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 259 |
model = AutoModelForCausalLM.from_pretrained(
|
|
@@ -376,15 +376,15 @@ Risk Source: Inherent Agent/LLM Failures
|
|
| 376 |
Use this prompt with:
|
| 377 |
|
| 378 |
```text
|
| 379 |
-
|
| 380 |
```
|
| 381 |
|
| 382 |
```python
|
| 383 |
import torch
|
| 384 |
-
from
|
| 385 |
|
| 386 |
|
| 387 |
-
model_name = "
|
| 388 |
|
| 389 |
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 390 |
model = AutoModelForCausalLM.from_pretrained(
|
|
@@ -514,30 +514,22 @@ Risk Source: Inherent Agent/LLM Failures
|
|
| 514 |
|
| 515 |
Use a recent SGLang or vLLM build that supports the selected backbone. For Qwen3.5 checkpoints, use a version that supports `Qwen3_5ForConditionalGeneration`.
|
| 516 |
|
| 517 |
-
To make SGLang / vLLM resolve `Shanghai_AI_Laboratory/...` IDs from ModelScope (instead of Hugging Face), export the following before launching:
|
| 518 |
-
|
| 519 |
-
```shell
|
| 520 |
-
export VLLM_USE_MODELSCOPE=True
|
| 521 |
-
export SGLANG_USE_MODELSCOPE=True
|
| 522 |
-
```
|
| 523 |
-
|
| 524 |
-
Alternatively, pre-download the weights with `modelscope download` and pass the local snapshot path to `--model-path` / `vllm serve`.
|
| 525 |
|
| 526 |
### SGLang
|
| 527 |
|
| 528 |
```shell
|
| 529 |
python -m sglang.launch_server \
|
| 530 |
-
--model-path
|
| 531 |
--port 30000 \
|
| 532 |
--context-length 16384
|
| 533 |
|
| 534 |
python -m sglang.launch_server \
|
| 535 |
-
--model-path
|
| 536 |
--port 30000 \
|
| 537 |
--context-length 16384
|
| 538 |
|
| 539 |
python -m sglang.launch_server \
|
| 540 |
-
--model-path
|
| 541 |
--port 30000 \
|
| 542 |
--context-length 16384
|
| 543 |
```
|
|
@@ -545,15 +537,15 @@ python -m sglang.launch_server \
|
|
| 545 |
### vLLM
|
| 546 |
|
| 547 |
```shell
|
| 548 |
-
vllm serve
|
| 549 |
--port 8000 \
|
| 550 |
--max-model-len 16384
|
| 551 |
|
| 552 |
-
vllm serve
|
| 553 |
--port 8000 \
|
| 554 |
--max-model-len 16384
|
| 555 |
|
| 556 |
-
vllm serve
|
| 557 |
--port 8000 \
|
| 558 |
--max-model-len 16384
|
| 559 |
```
|
|
@@ -569,7 +561,7 @@ client = OpenAI(
|
|
| 569 |
base_url="http://localhost:8000/v1",
|
| 570 |
)
|
| 571 |
|
| 572 |
-
model_name = "
|
| 573 |
|
| 574 |
# Use the coarse-grained, fine-grained, or unified prompt from above.
|
| 575 |
chat_completion = client.chat.completions.create(
|
|
@@ -592,7 +584,40 @@ This project is released under the **Apache 2.0 License**.
|
|
| 592 |
|
| 593 |
## Citation
|
| 594 |
|
| 595 |
-
|
| 596 |
|
| 597 |
---
|
| 598 |
|
|
|
|
| 15 |
<a href="https://github.com/AI45Lab/AgentDoG"><b>GitHub</b></a>   |   
|
| 16 |
<a href="https://huggingface.co/collections/AI45Research/agentdog15"><b>Hugging Face</b></a>   |   
|
| 17 |
<a href="https://www.modelscope.cn/collections/Shanghai_AI_Laboratory/AgentDoG15"><b>ModelScope</b></a>   |   
|
| 18 |
+
<a href="https://arxiv.org/pdf/2605.29801"><b>Technical Report</b></a>   |   
|
| 19 |
<a href="https://arxiv.org/abs/2604.02022"><b>ATBench</b></a>   |   
|
| 20 |
<a href="https://ai45lab.github.io/AgentDoG/"><b>Demo</b></a>   |   
|
| 21 |
<a href="https://example.com/AgentDoG-docs"><b>Documentation</b></a>
|
|
|
|
| 34 |
|
| 35 |
| Model | Task | Parameters | Base model | Download |
|
| 36 |
| --- | --- | ---: | --- | --- |
|
| 37 |
+
| AgentDoG1.5-Unified-Qwen3.5-4B | Unified safety diagnosis | 4B | Qwen3.5-4B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B) |
|
| 38 |
+
| AgentDoG1.5-Qwen3.5-0.8B | Coarse-grained moderation | 0.8B | Qwen3.5-0.8B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-Qwen3.5-0.8B) |
|
| 39 |
+
| AgentDoG1.5-Qwen3.5-2B | Coarse-grained moderation | 2B | Qwen3.5-2B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-Qwen3.5-2B) |
|
| 40 |
+
| AgentDoG1.5-Qwen3.5-4B | Coarse-grained moderation | 4B | Qwen3.5-4B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-Qwen3.5-4B) |
|
| 41 |
+
| AgentDoG1.5-Llama3.1-8B | Coarse-grained moderation | 8B | Llama3.1-8B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-Llama3.1-8B) |
|
| 42 |
+
| AgentDoG1.5-FG-Qwen3.5-0.8B | Fine-grained diagnosis | 0.8B | Qwen3.5-0.8B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-FG-Qwen3.5-0.8B) |
|
| 43 |
+
| AgentDoG1.5-FG-Qwen3.5-2B | Fine-grained diagnosis | 2B | Qwen3.5-2B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-FG-Qwen3.5-2B) |
|
| 44 |
+
| AgentDoG1.5-FG-Qwen3.5-4B | Fine-grained diagnosis | 4B | Qwen3.5-4B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-FG-Qwen3.5-4B) |
|
| 45 |
+
| AgentDoG1.5-FG-Llama3.1-8B | Fine-grained diagnosis | 8B | Llama3.1-8B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-FG-Llama3.1-8B) |
|
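The model table can be read as a lookup from task and parameter count to a repository ID. The following is a minimal sketch of that reading, assuming the `AI45Research/` repository IDs shown in the Download column. The `MODELS` dictionary, the short task keys (`"unified"`, `"coarse"`, `"fine"`), and the `pick_model` helper are hypothetical names introduced for illustration and are not part of AgentDoG.

```python
# Hypothetical lookup built from the model table; not an AgentDoG API.
# Keys are (task, parameter size); values are the repository IDs from the table.
MODELS = {
    ("unified", "4B"): "AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B",
    ("coarse", "0.8B"): "AI45Research/AgentDoG1.5-Qwen3.5-0.8B",
    ("coarse", "2B"): "AI45Research/AgentDoG1.5-Qwen3.5-2B",
    ("coarse", "4B"): "AI45Research/AgentDoG1.5-Qwen3.5-4B",
    ("coarse", "8B"): "AI45Research/AgentDoG1.5-Llama3.1-8B",
    ("fine", "0.8B"): "AI45Research/AgentDoG1.5-FG-Qwen3.5-0.8B",
    ("fine", "2B"): "AI45Research/AgentDoG1.5-FG-Qwen3.5-2B",
    ("fine", "4B"): "AI45Research/AgentDoG1.5-FG-Qwen3.5-4B",
    ("fine", "8B"): "AI45Research/AgentDoG1.5-FG-Llama3.1-8B",
}


def pick_model(task: str, size: str) -> str:
    """Return the repository ID for a task/size pair listed in the table."""
    try:
        return MODELS[(task, size)]
    except KeyError:
        # Report which sizes exist for the requested task, if any.
        available = sorted(s for (t, s) in MODELS if t == task)
        raise ValueError(
            f"no {task!r} model of size {size!r}; available sizes: {available}"
        ) from None
```

The returned string can be assigned to `model_name` in the loading snippets further down. Only the unified task is limited to a single size (4B) in the table, so asking for any other unified size raises `ValueError`.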
| 46 |
|
| 47 |
## Performance
|
| 48 |
|
|
|
|
| 139 |
Use this prompt with any coarse-grained model:
|
| 140 |
|
| 141 |
```text
|
| 142 |
+
AI45Research/AgentDoG1.5-Qwen3.5-0.8B
|
| 143 |
+
AI45Research/AgentDoG1.5-Qwen3.5-2B
|
| 144 |
+
AI45Research/AgentDoG1.5-Qwen3.5-4B
|
| 145 |
+
AI45Research/AgentDoG1.5-Llama3.1-8B
|
| 146 |
```
|
| 147 |
|
| 148 |
```python
|
| 149 |
import torch
|
| 150 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 151 |
|
| 152 |
|
| 153 |
+
model_name = "AI45Research/AgentDoG1.5-Qwen3.5-4B"
|
| 154 |
|
| 155 |
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 156 |
model = AutoModelForCausalLM.from_pretrained(
|
|
|
|
| 240 |
Use this prompt with any fine-grained model:
|
| 241 |
|
| 242 |
```text
|
| 243 |
+
AI45Research/AgentDoG1.5-FG-Qwen3.5-0.8B
|
| 244 |
+
AI45Research/AgentDoG1.5-FG-Qwen3.5-2B
|
| 245 |
+
AI45Research/AgentDoG1.5-FG-Qwen3.5-4B
|
| 246 |
+
AI45Research/AgentDoG1.5-FG-Llama3.1-8B
|
| 247 |
```
|
| 248 |
|
| 249 |
Fine-grained models are intended to diagnose unsafe trajectories along the three taxonomy dimensions. The prompt template is below.
|
| 250 |
|
| 251 |
```python
|
| 252 |
import torch
|
| 253 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 254 |
|
| 255 |
|
| 256 |
+
model_name = "AI45Research/AgentDoG1.5-FG-Qwen3.5-4B"
|
| 257 |
|
| 258 |
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 259 |
model = AutoModelForCausalLM.from_pretrained(
|
|
|
|
| 376 |
Use this prompt with:
|
| 377 |
|
| 378 |
```text
|
| 379 |
+
AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B
|
| 380 |
```
|
| 381 |
|
| 382 |
```python
|
| 383 |
import torch
|
| 384 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 385 |
|
| 386 |
|
| 387 |
+
model_name = "AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B"
|
| 388 |
|
| 389 |
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 390 |
model = AutoModelForCausalLM.from_pretrained(
|
|
|
|
| 514 |
|
| 515 |
Use a recent SGLang or vLLM build that supports the selected backbone. For Qwen3.5 checkpoints, use a version that supports `Qwen3_5ForConditionalGeneration`.
|
| 516 |
|
|
| 517 |
|
| 518 |
### SGLang
|
| 519 |
|
| 520 |
```shell
|
| 521 |
python -m sglang.launch_server \
|
| 522 |
+
--model-path AI45Research/AgentDoG1.5-Qwen3.5-4B \
|
| 523 |
--port 30000 \
|
| 524 |
--context-length 16384
|
| 525 |
|
| 526 |
python -m sglang.launch_server \
|
| 527 |
+
--model-path AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B \
|
| 528 |
--port 30000 \
|
| 529 |
--context-length 16384
|
| 530 |
|
| 531 |
python -m sglang.launch_server \
|
| 532 |
+
--model-path AI45Research/AgentDoG1.5-FG-Qwen3.5-4B \
|
| 533 |
--port 30000 \
|
| 534 |
--context-length 16384
|
| 535 |
```
|
|
|
|
| 537 |
### vLLM
|
| 538 |
|
| 539 |
```shell
|
| 540 |
+
vllm serve AI45Research/AgentDoG1.5-Qwen3.5-4B \
|
| 541 |
--port 8000 \
|
| 542 |
--max-model-len 16384
|
| 543 |
|
| 544 |
+
vllm serve AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B \
|
| 545 |
--port 8000 \
|
| 546 |
--max-model-len 16384
|
| 547 |
|
| 548 |
+
vllm serve AI45Research/AgentDoG1.5-FG-Qwen3.5-4B \
|
| 549 |
--port 8000 \
|
| 550 |
--max-model-len 16384
|
| 551 |
```
|
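The three commands in each block above bind the same port, so they appear to be alternatives rather than a set to run at once. A minimal sketch of generating the launch commands with one port per model is shown below. It uses only the flags that appear in the README (`--model-path`, `--port`, `--context-length`, `--max-model-len`); the helper names `vllm_serve_command` and `sglang_command` and the port-offset scheme are assumptions made for illustration.

```python
import shlex


def vllm_serve_command(model_id: str, port: int = 8000,
                       max_model_len: int = 16384) -> list[str]:
    """Build the argv for `vllm serve` with the flags used in the README."""
    return ["vllm", "serve", model_id,
            "--port", str(port),
            "--max-model-len", str(max_model_len)]


def sglang_command(model_id: str, port: int = 30000,
                   context_length: int = 16384) -> list[str]:
    """Build the argv for the SGLang launcher with the flags used in the README."""
    return ["python", "-m", "sglang.launch_server",
            "--model-path", model_id,
            "--port", str(port),
            "--context-length", str(context_length)]


if __name__ == "__main__":
    models = [
        "AI45Research/AgentDoG1.5-Qwen3.5-4B",
        "AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B",
        "AI45Research/AgentDoG1.5-FG-Qwen3.5-4B",
    ]
    # Assumption: give each model its own port so the servers could coexist.
    for offset, model_id in enumerate(models):
        print(shlex.join(vllm_serve_command(model_id, port=8000 + offset)))
```

Each returned list can be passed to `subprocess.run` as-is, or printed with `shlex.join` and pasted into a shell. If a different port is used, the `base_url` of the OpenAI-compatible client has to point at that port as well.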
|
|
|
| 561 |
base_url="http://localhost:8000/v1",
|
| 562 |
)
|
| 563 |
|
| 564 |
+
model_name = "AI45Research/AgentDoG1.5-Qwen3.5-4B"
|
| 565 |
|
| 566 |
# Use the coarse-grained, fine-grained, or unified prompt from above.
|
| 567 |
chat_completion = client.chat.completions.create(
|
|
|
|
| 584 |
|
| 585 |
## Citation
|
| 586 |
|
| 587 |
+
If you use AgentDoG or ATBench in your research, please cite:
|
| 588 |
+
|
| 589 |
+
```bibtex
|
| 590 |
+
@misc{liu2026agentdog15lightweightscalable,
|
| 591 |
+
title={AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security},
|
| 592 |
+
author={Dongrui Liu and Yu Li and Zhonghao Yang and Peng Wang and Guanxu Chen and Yuejin Xie and Qinghua Mao and Wanying Qu and Yanxu Zhu and Tianyi Zhou and Leitao Yuan and Zhijie Zheng and Qihao Lin and Yimin Wang and Haoyu Luo and Shuai Shao and Chen Qian and Qingyu Liu and Ling Tang and Ruiyang Qin and Qihan Ren and Junxiao Yang and Kun Wang and Zhiheng Xi and Linfeng Zhang and Ranjie Duan and Bo Zhang and Wenjie Wang and Wen Shen and Qiaosheng Zhang and Yan Teng and Chaochao Lu and Rui Mei and Man Li and Jialing Tao and Xi Lin and Tianhang Zheng and Yong Liu and Quanshi Zhang and Lei Zhu and Xingjun Ma and Junhua Liu and Hui Xue and Xiaoxiang Zuo and Xiangnan He and Chao Shen and Xianglong Liu and Minlie Huang and Jing Shao and Xia Hu},
|
| 593 |
+
year={2026},
|
| 594 |
+
eprint={2605.29801},
|
| 595 |
+
archivePrefix={arXiv},
|
| 596 |
+
primaryClass={cs.AI},
|
| 597 |
+
url={https://arxiv.org/abs/2605.29801},
|
| 598 |
+
}
|
| 599 |
+
|
| 600 |
+
@article{liu2026agentdog,
|
| 601 |
+
title={AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security},
|
| 602 |
+
author={Liu, Dongrui and Ren, Qihan and Qian, Chen and Shao, Shuai and Xie, Yuejin and Li, Yu and Yang, Zhonghao and Luo, Haoyu and Wang, Peng and Liu, Qingyu and others},
|
| 603 |
+
journal={arXiv preprint arXiv:2601.18491},
|
| 604 |
+
year={2026}
|
| 605 |
+
}
|
| 606 |
+
|
| 607 |
+
@article{li2026atbench,
|
| 608 |
+
title={ATBench: A Diverse and Realistic Trajectory Benchmark for Long-Horizon Agent Safety},
|
| 609 |
+
author={Li, Yu and Luo, Haoyu and Xie, Yuejin and Fu, Yuqian and Yang, Zhonghao and Shao, Shuai and Ren, Qihan and Qu, Wanying and Fu, Yanwei and Yang, Yujiu and others},
|
| 610 |
+
journal={arXiv preprint arXiv:2604.02022},
|
| 611 |
+
year={2026}
|
| 612 |
+
}
|
| 613 |
+
|
| 614 |
+
@misc{qian2026behind,
|
| 615 |
+
title={The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution},
|
| 616 |
+
author={Chen Qian and Peng Wang and Dongrui Liu and Junyao Yang and Dadi Guo and Ling Tang and Jilin Mei and Qihan Ren and Shuai Shao and Yong Liu and Jie Fu and Jing Shao and Xia Hu},
|
| 617 |
+
year={2026},
|
| 618 |
+
journal={arXiv preprint arXiv:2601.15075}
|
| 619 |
+
}
|
| 620 |
+
```
|
| 621 |
|
| 622 |
---
|
| 623 |
|
figs/welcome.png
CHANGED
|
|