Update README.md
README.md
CHANGED
@@ -3,7 +3,7 @@ license: mit
 ---
 # Model Information
 
-This repository provides the ShieldAgent model, a fine-tuned safety judgment model for assessing behavioral safety of LLM agents and generating detailed explanations
+This repository provides the ShieldAgent model, a fine-tuned safety judgment model for assessing behavioral safety of LLM agents and generating detailed explanations, applied in [Agent-SafetyBench](https://arxiv.org/pdf/2412.14470). ShieldAgent is initialized from [Qwen-2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) and trained on 4000 agent interaction records (including tool calling requests and results) with manual labels and analyses generated by GPT-4o. This model achieves an accuracy of 91.5% on a test set of 200 interaction records from Gemini-1.5-Flash, significantly surpassing GPT-4o, which attains an accuracy of 75.5% on the same test set. This demonstrates ShieldAgent's strong performance on agent behavioral safety judgment. Please refer to our [Github Repository](https://github.com/thu-coai/Agent-SafetyBench) for more detailed information.
 
 # Uses
 
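The updated README reports accuracies as percentages of a 200-record test set. The minimal Python sketch below converts them into record counts. It assumes accuracy is simply the share of records judged correctly, which the README does not state explicitly. The helper name `correct_judgments` is hypothetical and used only for illustration.

```python
from fractions import Fraction

# Test set size stated in the README: 200 interaction records from Gemini-1.5-Flash.
TEST_SET_SIZE = 200

# Reported accuracies (percent) on the same test set.
# Fraction avoids floating-point rounding when converting to counts.
REPORTED_ACCURACY = {
    "ShieldAgent": Fraction("91.5"),
    "GPT-4o": Fraction("75.5"),
}


def correct_judgments(accuracy_percent: Fraction, n: int = TEST_SET_SIZE) -> int:
    """Hypothetical helper: convert an accuracy percentage into a count of
    correctly judged records, assuming accuracy = correct / n."""
    count = accuracy_percent / 100 * n
    # An accuracy measured on n records should map to a whole number of records.
    if count.denominator != 1:
        raise ValueError(f"{accuracy_percent}% of {n} is not a whole number of records")
    return int(count)


if __name__ == "__main__":
    counts = {name: correct_judgments(acc) for name, acc in REPORTED_ACCURACY.items()}
    for name, count in counts.items():
        print(f"{name}: {count}/{TEST_SET_SIZE} correct")
    print(f"Gap: {counts['ShieldAgent'] - counts['GPT-4o']} records")
```

Under that assumption, 91.5% and 75.5% of 200 records correspond to 183 and 151 correct judgments, a gap of 32 records.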