Safetensors
llama
DataFilter / README.md
nielsr's picture
nielsr HF Staff
Add pipeline_tag and library_name to metadata
fe20070 verified
|
raw
history blame
1.94 kB
metadata
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
datasets:
  - yahma/alpaca-cleaned
library_name: transformers
license: llama3.1
pipeline_tag: text-generation

DataFilter

arXiv HuggingFace

DataFilter is a test-time model-agnostic defense system designed to protect Large Language Model (LLM) agents against prompt injection attacks. As described in the paper Defending Against Prompt Injection with DataFilter, it removes malicious instructions from data before it reaches the backend LLM, maintaining high utility while reducing attack success rates to near zero.

Quick Start

Installation

conda create -n py312vllm python=3.12
conda activate py312vllm
pip install vllm pandas 'accelerate>=0.26.0' deepspeed datasets==2.20.0
git clone https://github.com/yizhu-joy/DataFilter.git
cd DataFilter

Run DataFilter Inference Demo

To test the DataFilter model, run the provided inference script:

python filter_inference.py

Citation

If you use DataFilter in your research, please cite the following paper:

@misc{wang2025datafilter,
  title={Defending Against Prompt Injection with DataFilter}, 
  author={Yizhu Wang and Sizhe Chen and Raghad Alkhudair and Basel Alomair and David Wagner},
  year={2025},
  eprint={2510.19207},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2510.19207}, 
}

License

This model is licensed under the Llama 3.1 Community License. Please refer to the LICENSE file for details.