metadata
base_model:
- meta-llama/Llama-3.1-8B-Instruct
datasets:
- yahma/alpaca-cleaned
library_name: transformers
license: llama3.1
pipeline_tag: text-generation
DataFilter
DataFilter is a test-time model-agnostic defense system designed to protect Large Language Model (LLM) agents against prompt injection attacks. As described in the paper Defending Against Prompt Injection with DataFilter, it removes malicious instructions from data before it reaches the backend LLM, maintaining high utility while reducing attack success rates to near zero.
- Paper: Defending Against Prompt Injection with DataFilter
- Repository: GitHub - yizhu-joy/DataFilter
Quick Start
Installation
conda create -n py312vllm python=3.12
conda activate py312vllm
pip install vllm pandas 'accelerate>=0.26.0' deepspeed datasets==2.20.0
git clone https://github.com/yizhu-joy/DataFilter.git
cd DataFilter
Run DataFilter Inference Demo
To test the DataFilter model, run the provided inference script:
python filter_inference.py
Citation
If you use DataFilter in your research, please cite the following paper:
@misc{wang2025datafilter,
title={Defending Against Prompt Injection with DataFilter},
author={Yizhu Wang and Sizhe Chen and Raghad Alkhudair and Basel Alomair and David Wagner},
year={2025},
eprint={2510.19207},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2510.19207},
}
License
This model is licensed under the Llama 3.1 Community License. Please refer to the LICENSE file for details.