---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- safety
- tool-use
- guardrail
- agents
---

# TS-Guard

TS-Guard is a guardrail model that detects unsafe tool invocations at the step level. It was introduced in the paper [ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback](https://huggingface.co/papers/2601.10156).

TS-Guard is trained via reinforcement learning with a multi-task reward scheme tailored for agent security. This training enables it to identify harmful user requests and attack vectors in agent-environment interaction logs, detect unsafe tool invocations before they are executed, and provide an interpretable analysis along with its reasoning process.

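To illustrate what detecting an unsafe tool invocation before execution means in practice, here is a minimal sketch of a step-level gate. It is not the authors' implementation: `ToolCall`, `Verdict`, `guarded_execute`, and `keyword_judge` are hypothetical names introduced for this example, and `keyword_judge` is a toy stand-in for the guard model. In a real setup the judge would wrap a call to TS-Guard, using the prompt and output format documented in the repository.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """One proposed tool invocation (hypothetical structure for this sketch)."""
    name: str
    arguments: Dict[str, Any]


@dataclass
class Verdict:
    """Guard decision for one step; `reason` carries the guard's explanation."""
    safe: bool
    reason: str


# A judge receives the user request, the interaction history so far, and the
# proposed call, and returns a Verdict. Assumption: a real judge would query
# TS-Guard here; its prompt and output format are not reproduced in this sketch.
Judge = Callable[[str, List[str], ToolCall], Verdict]


def guarded_execute(
    request: str,
    history: List[str],
    call: ToolCall,
    judge: Judge,
    tools: Dict[str, Callable[..., Any]],
) -> str:
    """Run `call` only if the judge approves it; otherwise return feedback."""
    verdict = judge(request, history, call)
    if not verdict.safe:
        # The tool is never executed; the reason is fed back to the agent.
        feedback = f"BLOCKED {call.name}: {verdict.reason}"
        history.append(feedback)
        return feedback
    result = str(tools[call.name](**call.arguments))
    history.append(f"{call.name} -> {result}")
    return result


def keyword_judge(request: str, history: List[str], call: ToolCall) -> Verdict:
    """Toy stand-in for the guard model: flags arguments containing 'rm -rf'."""
    text = " ".join(str(value) for value in call.arguments.values())
    if "rm -rf" in text:
        return Verdict(False, "destructive shell command")
    return Verdict(True, "no issue found")


if __name__ == "__main__":
    tools = {"shell": lambda cmd: f"ran {cmd}"}
    history: List[str] = []
    print(guarded_execute("list files", history,
                          ToolCall("shell", {"cmd": "ls"}), keyword_judge, tools))
    print(guarded_execute("clean up", history,
                          ToolCall("shell", {"cmd": "rm -rf /"}), keyword_judge, tools))
```

The check runs before the tool is called, so a rejected step has no side effects, and the verdict's reason is appended to the history so the agent can use it as feedback on the next step.
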
## Resources

- **Paper:** [ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback](https://huggingface.co/papers/2601.10156)
- **Repository:** [GitHub - MurrayTom/ToolSafe](https://github.com/MurrayTom/ToolSafe)

## Citation

If you find our work helpful, please consider citing it:

```bibtex
@article{mou2026toolsafe,
  title={ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback},
  author={Mou, Yutao and Xue, Zhangchi and Li, Lijun and Liu, Peiyang and Zhang, Shikun and Ye, Wei and Shao, Jing},
  journal={arXiv preprint arXiv:2601.10156},
  year={2026}
}
```