RainbowQTT committed on
Commit 98da3da · verified · 1 parent: dfa0daa

refresh README + figs/

Files changed (2)
  1. README.md +66 -41
  2. figs/welcome.png +0 -0
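Most of the README.md hunks below apply one mechanical substitution: model IDs under `Shanghai_AI_Laboratory/` (ModelScope) become `AI45Research/` (Hugging Face), and `from modelscope import` becomes `from transformers import`. The other hunks fill in the Technical Report link, drop the ModelScope environment-variable instructions for SGLang/vLLM, and add the citation block. For scripts copied from the previous README, the substitution could be automated roughly as follows. This is a minimal, hypothetical sketch: `to_hf_id` and `migrate_snippet` are illustrative names, not part of AgentDoG, and it assumes each old checkpoint has a same-named Hugging Face repo, which the model table in this commit implies but does not state.

```python
import re

# Assumption: each old ModelScope repo under Shanghai_AI_Laboratory/ has a
# same-named Hugging Face repo under AI45Research/, as the README table implies.
OLD_ORG = "Shanghai_AI_Laboratory/"
NEW_ORG = "AI45Research/"
OLD_IMPORT = "from modelscope import AutoModelForCausalLM, AutoTokenizer"
NEW_IMPORT = "from transformers import AutoModelForCausalLM, AutoTokenizer"

# Checkpoint names listed in the README model table.
MODEL_NAMES = [
    "AgentDoG1.5-Unified-Qwen3.5-4B",
    "AgentDoG1.5-Qwen3.5-0.8B",
    "AgentDoG1.5-Qwen3.5-2B",
    "AgentDoG1.5-Qwen3.5-4B",
    "AgentDoG1.5-Llama3.1-8B",
    "AgentDoG1.5-FG-Qwen3.5-0.8B",
    "AgentDoG1.5-FG-Qwen3.5-2B",
    "AgentDoG1.5-FG-Qwen3.5-4B",
    "AgentDoG1.5-FG-Llama3.1-8B",
]

# Match only the known checkpoint IDs (longest names first), so unrelated
# strings such as the ModelScope collection slug are left alone.
_ID_PATTERN = re.compile(
    re.escape(OLD_ORG)
    + "("
    + "|".join(re.escape(n) for n in sorted(MODEL_NAMES, key=len, reverse=True))
    + ")"
)


def to_hf_id(model_id: str) -> str:
    """Map one old ModelScope model ID to its Hugging Face model ID."""
    m = _ID_PATTERN.fullmatch(model_id)
    if m is None:
        raise ValueError(f"not a known AgentDoG 1.5 ModelScope ID: {model_id!r}")
    return NEW_ORG + m.group(1)


def migrate_snippet(src: str) -> str:
    """Rewrite old README-style Python/shell snippets to the Hugging Face form.

    Intended for bare model IDs (model_name = ..., --model-path ..., vllm serve ...);
    full modelscope.cn URLs would be rewritten too and should be checked by hand.
    """
    src = src.replace(OLD_IMPORT, NEW_IMPORT)
    return _ID_PATTERN.sub(lambda m: NEW_ORG + m.group(1), src)


if __name__ == "__main__":
    old = (
        f"{OLD_IMPORT}\n"
        'model_name = "Shanghai_AI_Laboratory/AgentDoG1.5-FG-Qwen3.5-4B"\n'
    )
    print(migrate_snippet(old))
```

Only the nine checkpoint names from the model table are rewritten; any other `Shanghai_AI_Laboratory/...` string (such as the ModelScope collection link) is left unchanged, and `to_hf_id` raises `ValueError` for IDs outside that list.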
README.md CHANGED
@@ -15,7 +15,7 @@ pipeline_tag: text-classification
15
  🐙 <a href="https://github.com/AI45Lab/AgentDoG"><b>GitHub</b></a>&nbsp&nbsp | &nbsp&nbsp
16
  🤗 <a href="https://huggingface.co/collections/AI45Research/agentdog15"><b>Hugging Face</b></a>&nbsp&nbsp | &nbsp&nbsp
17
  🤖 <a href="https://www.modelscope.cn/collections/Shanghai_AI_Laboratory/AgentDoG15"><b>ModelScope</b></a>&nbsp&nbsp | &nbsp&nbsp
18
- 📄 <a href="TODO_TECHNICAL_REPORT_URL"><b>Technical Report</b></a>&nbsp&nbsp | &nbsp&nbsp
19
  📄 <a href="https://arxiv.org/abs/2604.02022"><b>ATBench</b></a>&nbsp&nbsp | &nbsp&nbsp
20
  🌐 <a href="https://ai45lab.github.io/AgentDoG/"><b>Demo</b></a>&nbsp&nbsp | &nbsp&nbsp
21
  📘 <a href="https://example.com/AgentDoG-docs"><b>Documentation</b></a>
@@ -34,15 +34,15 @@ AgentDoG 1.5 is a lightweight and scalable agent safety alignment framework, bui
34
 
35
  | Model | Task | Parameters | Base model | Download |
36
  | --- | --- | ---: | --- | --- |
37
- | AgentDoG1.5-Unified-Qwen3.5-4B | Unified safety + diagnosis | 4B | Qwen3.5-4B | [ModelScope](https://modelscope.cn/models/Shanghai_AI_Laboratory/AgentDoG1.5-Unified-Qwen3.5-4B) |
38
- | AgentDoG1.5-Qwen3.5-0.8B | Coarse-grained moderation | 0.8B | Qwen3.5-0.8B | [ModelScope](https://modelscope.cn/models/Shanghai_AI_Laboratory/AgentDoG1.5-Qwen3.5-0.8B) |
39
- | AgentDoG1.5-Qwen3.5-2B | Coarse-grained moderation | 2B | Qwen3.5-2B | [ModelScope](https://modelscope.cn/models/Shanghai_AI_Laboratory/AgentDoG1.5-Qwen3.5-2B) |
40
- | AgentDoG1.5-Qwen3.5-4B | Coarse-grained moderation | 4B | Qwen3.5-4B | [ModelScope](https://modelscope.cn/models/Shanghai_AI_Laboratory/AgentDoG1.5-Qwen3.5-4B) |
41
- | AgentDoG1.5-Llama3.1-8B | Coarse-grained moderation | 8B | Llama3.1-8B | [ModelScope](https://modelscope.cn/models/Shanghai_AI_Laboratory/AgentDoG1.5-Llama3.1-8B) |
42
- | AgentDoG1.5-FG-Qwen3.5-0.8B | Fine-grained diagnosis | 0.8B | Qwen3.5-0.8B | [ModelScope](https://modelscope.cn/models/Shanghai_AI_Laboratory/AgentDoG1.5-FG-Qwen3.5-0.8B) |
43
- | AgentDoG1.5-FG-Qwen3.5-2B | Fine-grained diagnosis | 2B | Qwen3.5-2B | [ModelScope](https://modelscope.cn/models/Shanghai_AI_Laboratory/AgentDoG1.5-FG-Qwen3.5-2B) |
44
- | AgentDoG1.5-FG-Qwen3.5-4B | Fine-grained diagnosis | 4B | Qwen3.5-4B | [ModelScope](https://modelscope.cn/models/Shanghai_AI_Laboratory/AgentDoG1.5-FG-Qwen3.5-4B) |
45
- | AgentDoG1.5-FG-Llama3.1-8B | Fine-grained diagnosis | 8B | Llama3.1-8B | [ModelScope](https://modelscope.cn/models/Shanghai_AI_Laboratory/AgentDoG1.5-FG-Llama3.1-8B) |
46
 
47
  ## 📊 Performance
48
 
@@ -139,18 +139,18 @@ trajectory = {
139
  Use this prompt with any coarse-grained model:
140
 
141
  ```text
142
- Shanghai_AI_Laboratory/AgentDoG1.5-Qwen3.5-0.8B
143
- Shanghai_AI_Laboratory/AgentDoG1.5-Qwen3.5-2B
144
- Shanghai_AI_Laboratory/AgentDoG1.5-Qwen3.5-4B
145
- Shanghai_AI_Laboratory/AgentDoG1.5-Llama3.1-8B
146
  ```
147
 
148
  ```python
149
  import torch
150
- from modelscope import AutoModelForCausalLM, AutoTokenizer
151
 
152
 
153
- model_name = "Shanghai_AI_Laboratory/AgentDoG1.5-Qwen3.5-4B"
154
 
155
  tokenizer = AutoTokenizer.from_pretrained(model_name)
156
  model = AutoModelForCausalLM.from_pretrained(
@@ -240,20 +240,20 @@ Expected output:
240
  Use this prompt with any fine-grained model:
241
 
242
  ```text
243
- Shanghai_AI_Laboratory/AgentDoG1.5-FG-Qwen3.5-0.8B
244
- Shanghai_AI_Laboratory/AgentDoG1.5-FG-Qwen3.5-2B
245
- Shanghai_AI_Laboratory/AgentDoG1.5-FG-Qwen3.5-4B
246
- Shanghai_AI_Laboratory/AgentDoG1.5-FG-Llama3.1-8B
247
  ```
248
 
249
  Fine-grained models are intended to diagnose unsafe trajectories along the three taxonomy dimensions. The prompt template is below.
250
 
251
  ```python
252
  import torch
253
- from modelscope import AutoModelForCausalLM, AutoTokenizer
254
 
255
 
256
- model_name = "Shanghai_AI_Laboratory/AgentDoG1.5-FG-Qwen3.5-4B"
257
 
258
  tokenizer = AutoTokenizer.from_pretrained(model_name)
259
  model = AutoModelForCausalLM.from_pretrained(
@@ -376,15 +376,15 @@ Risk Source: Inherent Agent/LLM Failures
376
  Use this prompt with:
377
 
378
  ```text
379
- Shanghai_AI_Laboratory/AgentDoG1.5-Unified-Qwen3.5-4B
380
  ```
381
 
382
  ```python
383
  import torch
384
- from modelscope import AutoModelForCausalLM, AutoTokenizer
385
 
386
 
387
- model_name = "Shanghai_AI_Laboratory/AgentDoG1.5-Unified-Qwen3.5-4B"
388
 
389
  tokenizer = AutoTokenizer.from_pretrained(model_name)
390
  model = AutoModelForCausalLM.from_pretrained(
@@ -514,30 +514,22 @@ Risk Source: Inherent Agent/LLM Failures
514
 
515
  Use a recent SGLang or vLLM build that supports the selected backbone. For Qwen3.5 checkpoints, use a version that supports `Qwen3_5ForConditionalGeneration`.
516
 
517
- To make SGLang / vLLM resolve `Shanghai_AI_Laboratory/...` IDs from ModelScope (instead of Hugging Face), export the following before launching:
518
-
519
- ```shell
520
- export VLLM_USE_MODELSCOPE=True
521
- export SGLANG_USE_MODELSCOPE=True
522
- ```
523
-
524
- Alternatively, pre-download the weights with `modelscope download` and pass the local snapshot path to `--model-path` / `vllm serve`.
525
 
526
  ### ⚙️ SGLang
527
 
528
  ```shell
529
  python -m sglang.launch_server \
530
- --model-path Shanghai_AI_Laboratory/AgentDoG1.5-Qwen3.5-4B \
531
  --port 30000 \
532
  --context-length 16384
533
 
534
  python -m sglang.launch_server \
535
- --model-path Shanghai_AI_Laboratory/AgentDoG1.5-Unified-Qwen3.5-4B \
536
  --port 30000 \
537
  --context-length 16384
538
 
539
  python -m sglang.launch_server \
540
- --model-path Shanghai_AI_Laboratory/AgentDoG1.5-FG-Qwen3.5-4B \
541
  --port 30000 \
542
  --context-length 16384
543
  ```
@@ -545,15 +537,15 @@ python -m sglang.launch_server \
545
  ### 🖥️ vLLM
546
 
547
  ```shell
548
- vllm serve Shanghai_AI_Laboratory/AgentDoG1.5-Qwen3.5-4B \
549
  --port 8000 \
550
  --max-model-len 16384
551
 
552
- vllm serve Shanghai_AI_Laboratory/AgentDoG1.5-Unified-Qwen3.5-4B \
553
  --port 8000 \
554
  --max-model-len 16384
555
 
556
- vllm serve Shanghai_AI_Laboratory/AgentDoG1.5-FG-Qwen3.5-4B \
557
  --port 8000 \
558
  --max-model-len 16384
559
  ```
@@ -569,7 +561,7 @@ client = OpenAI(
569
  base_url="http://localhost:8000/v1",
570
  )
571
 
572
- model_name = "Shanghai_AI_Laboratory/AgentDoG1.5-Qwen3.5-4B"
573
 
574
  # Use the coarse-grained, fine-grained, or unified prompt from above.
575
  chat_completion = client.chat.completions.create(
@@ -592,7 +584,40 @@ This project is released under the **Apache 2.0 License**.
592
 
593
  ## 📖 Citation
594
 
595
- Citation information will be added later.
596
 
597
  ---
598
 
 
15
  🐙 <a href="https://github.com/AI45Lab/AgentDoG"><b>GitHub</b></a>&nbsp&nbsp | &nbsp&nbsp
16
  🤗 <a href="https://huggingface.co/collections/AI45Research/agentdog15"><b>Hugging Face</b></a>&nbsp&nbsp | &nbsp&nbsp
17
  🤖 <a href="https://www.modelscope.cn/collections/Shanghai_AI_Laboratory/AgentDoG15"><b>ModelScope</b></a>&nbsp&nbsp | &nbsp&nbsp
18
+ 📄 <a href="https://arxiv.org/pdf/2605.29801"><b>Technical Report</b></a>&nbsp&nbsp | &nbsp&nbsp
19
  📄 <a href="https://arxiv.org/abs/2604.02022"><b>ATBench</b></a>&nbsp&nbsp | &nbsp&nbsp
20
  🌐 <a href="https://ai45lab.github.io/AgentDoG/"><b>Demo</b></a>&nbsp&nbsp | &nbsp&nbsp
21
  📘 <a href="https://example.com/AgentDoG-docs"><b>Documentation</b></a>
 
34
 
35
  | Model | Task | Parameters | Base model | Download |
36
  | --- | --- | ---: | --- | --- |
37
+ | AgentDoG1.5-Unified-Qwen3.5-4B | Unified safety diagnosis | 4B | Qwen3.5-4B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B) |
38
+ | AgentDoG1.5-Qwen3.5-0.8B | Coarse-grained moderation | 0.8B | Qwen3.5-0.8B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-Qwen3.5-0.8B) |
39
+ | AgentDoG1.5-Qwen3.5-2B | Coarse-grained moderation | 2B | Qwen3.5-2B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-Qwen3.5-2B) |
40
+ | AgentDoG1.5-Qwen3.5-4B | Coarse-grained moderation | 4B | Qwen3.5-4B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-Qwen3.5-4B) |
41
+ | AgentDoG1.5-Llama3.1-8B | Coarse-grained moderation | 8B | Llama3.1-8B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-Llama3.1-8B) |
42
+ | AgentDoG1.5-FG-Qwen3.5-0.8B | Fine-grained diagnosis | 0.8B | Qwen3.5-0.8B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-FG-Qwen3.5-0.8B) |
43
+ | AgentDoG1.5-FG-Qwen3.5-2B | Fine-grained diagnosis | 2B | Qwen3.5-2B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-FG-Qwen3.5-2B) |
44
+ | AgentDoG1.5-FG-Qwen3.5-4B | Fine-grained diagnosis | 4B | Qwen3.5-4B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-FG-Qwen3.5-4B) |
45
+ | AgentDoG1.5-FG-Llama3.1-8B | Fine-grained diagnosis | 8B | Llama3.1-8B | [Hugging Face](https://huggingface.co/AI45Research/AgentDoG1.5-FG-Llama3.1-8B) |
46
 
47
  ## 📊 Performance
48
 
 
139
  Use this prompt with any coarse-grained model:
140
 
141
  ```text
142
+ AI45Research/AgentDoG1.5-Qwen3.5-0.8B
143
+ AI45Research/AgentDoG1.5-Qwen3.5-2B
144
+ AI45Research/AgentDoG1.5-Qwen3.5-4B
145
+ AI45Research/AgentDoG1.5-Llama3.1-8B
146
  ```
147
 
148
  ```python
149
  import torch
150
+ from transformers import AutoModelForCausalLM, AutoTokenizer
151
 
152
 
153
+ model_name = "AI45Research/AgentDoG1.5-Qwen3.5-4B"
154
 
155
  tokenizer = AutoTokenizer.from_pretrained(model_name)
156
  model = AutoModelForCausalLM.from_pretrained(
 
240
  Use this prompt with any fine-grained model:
241
 
242
  ```text
243
+ AI45Research/AgentDoG1.5-FG-Qwen3.5-0.8B
244
+ AI45Research/AgentDoG1.5-FG-Qwen3.5-2B
245
+ AI45Research/AgentDoG1.5-FG-Qwen3.5-4B
246
+ AI45Research/AgentDoG1.5-FG-Llama3.1-8B
247
  ```
248
 
249
  Fine-grained models are intended to diagnose unsafe trajectories along the three taxonomy dimensions. The prompt template is below.
250
 
251
  ```python
252
  import torch
253
+ from transformers import AutoModelForCausalLM, AutoTokenizer
254
 
255
 
256
+ model_name = "AI45Research/AgentDoG1.5-FG-Qwen3.5-4B"
257
 
258
  tokenizer = AutoTokenizer.from_pretrained(model_name)
259
  model = AutoModelForCausalLM.from_pretrained(
 
376
  Use this prompt with:
377
 
378
  ```text
379
+ AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B
380
  ```
381
 
382
  ```python
383
  import torch
384
+ from transformers import AutoModelForCausalLM, AutoTokenizer
385
 
386
 
387
+ model_name = "AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B"
388
 
389
  tokenizer = AutoTokenizer.from_pretrained(model_name)
390
  model = AutoModelForCausalLM.from_pretrained(
 
514
 
515
  Use a recent SGLang or vLLM build that supports the selected backbone. For Qwen3.5 checkpoints, use a version that supports `Qwen3_5ForConditionalGeneration`.
516
 
517
 
518
  ### ⚙️ SGLang
519
 
520
  ```shell
521
  python -m sglang.launch_server \
522
+ --model-path AI45Research/AgentDoG1.5-Qwen3.5-4B \
523
  --port 30000 \
524
  --context-length 16384
525
 
526
  python -m sglang.launch_server \
527
+ --model-path AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B \
528
  --port 30000 \
529
  --context-length 16384
530
 
531
  python -m sglang.launch_server \
532
+ --model-path AI45Research/AgentDoG1.5-FG-Qwen3.5-4B \
533
  --port 30000 \
534
  --context-length 16384
535
  ```
 
537
  ### 🖥️ vLLM
538
 
539
  ```shell
540
+ vllm serve AI45Research/AgentDoG1.5-Qwen3.5-4B \
541
  --port 8000 \
542
  --max-model-len 16384
543
 
544
+ vllm serve AI45Research/AgentDoG1.5-Unified-Qwen3.5-4B \
545
  --port 8000 \
546
  --max-model-len 16384
547
 
548
+ vllm serve AI45Research/AgentDoG1.5-FG-Qwen3.5-4B \
549
  --port 8000 \
550
  --max-model-len 16384
551
  ```
 
561
  base_url="http://localhost:8000/v1",
562
  )
563
 
564
+ model_name = "AI45Research/AgentDoG1.5-Qwen3.5-4B"
565
 
566
  # Use the coarse-grained, fine-grained, or unified prompt from above.
567
  chat_completion = client.chat.completions.create(
 
584
 
585
  ## 📖 Citation
586
 
587
+ If you use AgentDoG or ATBench in your research, please cite:
588
+
589
+ ```bibtex
590
+ @misc{liu2026agentdog15lightweightscalable,
591
+ title={AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security},
592
+ author={Dongrui Liu and Yu Li and Zhonghao Yang and Peng Wang and Guanxu Chen and Yuejin Xie and Qinghua Mao and Wanying Qu and Yanxu Zhu and Tianyi Zhou and Leitao Yuan and Zhijie Zheng and Qihao Lin and Yimin Wang and Haoyu Luo and Shuai Shao and Chen Qian and Qingyu Liu and Ling Tang and Ruiyang Qin and Qihan Ren and Junxiao Yang and Kun Wang and Zhiheng Xi and Linfeng Zhang and Ranjie Duan and Bo Zhang and Wenjie Wang and Wen Shen and Qiaosheng Zhang and Yan Teng and Chaochao Lu and Rui Mei and Man Li and Jialing Tao and Xi Lin and Tianhang Zheng and Yong Liu and Quanshi Zhang and Lei Zhu and Xingjun Ma and Junhua Liu and Hui Xue and Xiaoxiang Zuo and Xiangnan He and Chao Shen and Xianglong Liu and Minlie Huang and Jing Shao and Xia Hu},
593
+ year={2026},
594
+ eprint={2605.29801},
595
+ archivePrefix={arXiv},
596
+ primaryClass={cs.AI},
597
+ url={https://arxiv.org/abs/2605.29801},
598
+ }
599
+
600
+ @article{liu2026agentdog,
601
+ title={AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security},
602
+ author={Liu, Dongrui and Ren, Qihan and Qian, Chen and Shao, Shuai and Xie, Yuejin and Li, Yu and Yang, Zhonghao and Luo, Haoyu and Wang, Peng and Liu, Qingyu and others},
603
+ journal={arXiv preprint arXiv:2601.18491},
604
+ year={2026}
605
+ }
606
+
607
+ @article{li2026atbench,
608
+ title={ATBench: A Diverse and Realistic Trajectory Benchmark for Long-Horizon Agent Safety},
609
+ author={Li, Yu and Luo, Haoyu and Xie, Yuejin and Fu, Yuqian and Yang, Zhonghao and Shao, Shuai and Ren, Qihan and Qu, Wanying and Fu, Yanwei and Yang, Yujiu and others},
610
+ journal={arXiv preprint arXiv:2604.02022},
611
+ year={2026}
612
+ }
613
+
614
+ @misc{qian2026behind,
615
+ title={The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution},
616
+ author={Chen Qian and Peng Wang and Dongrui Liu and Junyao Yang and Dadi Guo and Ling Tang and Jilin Mei and Qihan Ren and Shuai Shao and Yong Liu and Jie Fu and Jing Shao and Xia Hu},
617
+ year={2026},
618
+ journal={arXiv preprint arXiv:2601.15075}
619
+ }
620
+ ```
621
 
622
  ---
623
 
figs/welcome.png CHANGED