Safetensors
qwen2
nonstopfor committed on
Commit a62f841 (verified) · 1 Parent(s): 0e2ce4b

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -3,7 +3,7 @@ license: mit
  ---
  # Model Information
 
- This repository provides the ShieldAgent model, a fine-tuned safety judgment model for assessing behavioral safety of LLM agents and generating detailed explanations ([paper link](https://arxiv.org/pdf/2412.14470)). ShieldAgent is initialized from [Qwen-2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) and trained on 4000 agent interaction records with manual labels and analyses generated by GPT-4o. This model achieves an accuracy of 91.5% on a test set of 200 interaction records from Gemini-1.5-Flash, significantly surpassing GPT-4o, which attains an accuracy of 75.5% on the same test set. This demonstrates ShieldAgent's strong performance on agent behavioral safety judgment. Please refer to our [Github Repository](https://github.com/thu-coai/Agent-SafetyBench) for more detailed information.
+ This repository provides the ShieldAgent model, a fine-tuned safety judgment model for assessing behavioral safety of LLM agents and generating detailed explanations, applied in [Agent-SafetyBench](https://arxiv.org/pdf/2412.14470). ShieldAgent is initialized from [Qwen-2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) and trained on 4000 agent interaction records (including tool calling requests and results) with manual labels and analyses generated by GPT-4o. This model achieves an accuracy of 91.5% on a test set of 200 interaction records from Gemini-1.5-Flash, significantly surpassing GPT-4o, which attains an accuracy of 75.5% on the same test set. This demonstrates ShieldAgent's strong performance on agent behavioral safety judgment. Please refer to our [Github Repository](https://github.com/thu-coai/Agent-SafetyBench) for more detailed information.
 
  # Uses
 