---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
datasets:
- yahma/alpaca-cleaned
library_name: transformers
license: llama3.1
pipeline_tag: text-generation
---

# DataFilter

[![arXiv](https://img.shields.io/badge/arXiv-2510.19207-b31b1b.svg)](https://arxiv.org/abs/2510.19207) [![HuggingFace](https://img.shields.io/badge/🤗-Model-yellow)](https://huggingface.co/JoyYizhu/DataFilter)

DataFilter is a test-time, model-agnostic defense that protects Large Language Model (LLM) agents against prompt injection attacks. As described in the paper [Defending Against Prompt Injection with DataFilter](https://huggingface.co/papers/2510.19207), it removes malicious instructions from untrusted data before the data reaches the backend LLM, preserving high utility while reducing attack success rates to near zero.

- **Paper:** [Defending Against Prompt Injection with DataFilter](https://huggingface.co/papers/2510.19207)
- **Repository:** [GitHub - yizhu-joy/DataFilter](https://github.com/yizhu-joy/DataFilter)

## Quick Start

### Installation

```bash
conda create -n py312vllm python=3.12
conda activate py312vllm
pip install vllm pandas 'accelerate>=0.26.0' deepspeed datasets==2.20.0
git clone https://github.com/yizhu-joy/DataFilter.git
cd DataFilter
```

### Run DataFilter Inference Demo

To try the DataFilter model, run the provided inference script:

```bash
python filter_inference.py
```

## Citation

If you use DataFilter in your research, please cite the following paper:

```bibtex
@misc{wang2025datafilter,
      title={Defending Against Prompt Injection with DataFilter},
      author={Yizhu Wang and Sizhe Chen and Raghad Alkhudair and Basel Alomair and David Wagner},
      year={2025},
      eprint={2510.19207},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2510.19207},
}
```

## License

This model is licensed under the Llama 3.1 Community License. Please refer to the LICENSE file for details.
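## Example: Calling the Filter from Python

Beyond the inference script, the filter can be loaded like any `transformers` causal LM. The sketch below is an assumption-laden illustration, not the canonical interface: the chat-template prompt format, the `build_filter_messages` helper, and the generation settings are all placeholders — `filter_inference.py` in the repository is the authoritative reference for how the model is actually prompted.

```python
# Hedged sketch: sanitizing untrusted data with the DataFilter model before
# it is passed to a backend LLM. Prompt format and helper names are
# assumptions; see filter_inference.py in the repository for the real format.

def build_filter_messages(data: str) -> list:
    """Wrap untrusted data in a single user turn for the filter model.
    (Placeholder format -- the actual filter prompt may differ.)"""
    return [{"role": "user", "content": data}]

def filter_data(data: str, model_id: str = "JoyYizhu/DataFilter") -> str:
    """Run the filter model over untrusted data and return the sanitized text."""
    # Imported lazily so the helper above stays importable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = tokenizer.apply_chat_template(
        build_filter_messages(data), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens: the filtered data.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The sanitized string returned by `filter_data` would then be handed to the backend LLM in place of the raw data, so any injected instructions never reach it.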