metadata

license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
  - safety
  - tool-use
  - guardrail
  - agents

TS-Guard

TS-Guard is a guardrail model for step-level tool invocation safety detection, introduced in the paper ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback.

TS-Guard is trained via reinforcement learning with a multi-task reward scheme tailored for agent security. This training enables it to identify harmful user requests and attack vectors in agent-environment interaction logs, detect unsafe tool invocations before they are executed, and provide an interpretable analysis of its reasoning.
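The following is a minimal sketch of how a step-level guardrail of this kind might gate tool calls in an agent loop, using the standard `transformers` text-generation pipeline named in the metadata. Several parts are assumptions made for illustration and are not taken from the paper or repository: the `MODEL_ID` placeholder, the prompt layout in `build_guard_prompt`, and the rule that any occurrence of "unsafe" in the output blocks the call. Check the repository for the actual prompt template and output format.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical placeholder: substitute the actual Hub repository id of TS-Guard.
MODEL_ID = "your-org/TS-Guard"


def build_guard_prompt(user_request: str, history: List[Dict[str, Any]],
                       tool_call: Dict[str, Any]) -> str:
    """Serialize one agent step into plain text for the guardrail model.

    The layout is an assumption for illustration, not TS-Guard's documented
    prompt template.
    """
    return (
        "User request:\n" + user_request + "\n\n"
        "Interaction log:\n" + json.dumps(history, indent=2) + "\n\n"
        "Proposed tool call:\n" + json.dumps(tool_call, indent=2) + "\n\n"
        "Is this tool call safe to execute? Explain, then answer 'safe' or 'unsafe'."
    )


def load_generator(model_id: str = MODEL_ID) -> Callable[[str], str]:
    """Wrap the transformers text-generation pipeline as a prompt -> text function."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_id)

    def generate(prompt: str) -> str:
        # return_full_text=False drops the prompt from the returned string.
        out = pipe(prompt, max_new_tokens=256, return_full_text=False)
        return out[0]["generated_text"]

    return generate


def guarded_call(generate: Callable[[str], str], user_request: str,
                 history: List[Dict[str, Any]], tool_call: Dict[str, Any],
                 execute_tool: Callable[[Dict[str, Any]], Any]) -> Dict[str, Any]:
    """Query the guardrail before the tool runs; skip execution if flagged."""
    verdict = generate(build_guard_prompt(user_request, history, tool_call))
    # Assumed verdict parsing: any mention of 'unsafe' blocks the call.
    if "unsafe" in verdict.lower():
        return {"executed": False, "feedback": verdict}
    return {"executed": True, "result": execute_tool(tool_call), "feedback": verdict}
```

In an agent loop, `guarded_call(load_generator(), request, log, call, run_tool)` would be invoked once per proposed step, with the returned `feedback` string passed back to the agent when a call is blocked. Passing the generator in as a plain function keeps the gating logic testable with a stub, without downloading the model.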

Resources

Paper: ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
Repository: MurrayTom/ToolSafe (GitHub)

Citation

If you find our work helpful, please consider citing it:

@article{mou2026toolsafe,
  title={ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback},
  author={Mou, Yutao and Xue, Zhangchi and Li, Lijun and Liu, Peiyang and Zhang, Shikun and Ye, Wei and Shao, Jing},
  journal={arXiv preprint arXiv:2601.10156},
  year={2026}
}