DataFilter / README.md

nielsr HF Staff

Add pipeline_tag and library_name to metadata

fe20070 verified 17 days ago

preview code

raw

history blame

1.94 kB

metadata

base_model:
  - meta-llama/Llama-3.1-8B-Instruct
datasets:
  - yahma/alpaca-cleaned
library_name: transformers
license: llama3.1
pipeline_tag: text-generation

DataFilter

DataFilter is a test-time model-agnostic defense system designed to protect Large Language Model (LLM) agents against prompt injection attacks. As described in the paper Defending Against Prompt Injection with DataFilter, it removes malicious instructions from data before it reaches the backend LLM, maintaining high utility while reducing attack success rates to near zero.

Paper: Defending Against Prompt Injection with DataFilter
Repository: GitHub - yizhu-joy/DataFilter

Quick Start

Installation

conda create -n py312vllm python=3.12
conda activate py312vllm
pip install vllm pandas 'accelerate>=0.26.0' deepspeed datasets==2.20.0
git clone https://github.com/yizhu-joy/DataFilter.git
cd DataFilter

Run DataFilter Inference Demo

To test the DataFilter model, run the provided inference script:

python filter_inference.py

Citation

If you use DataFilter in your research, please cite the following paper:

@misc{wang2025datafilter,
  title={Defending Against Prompt Injection with DataFilter}, 
  author={Yizhu Wang and Sizhe Chen and Raghad Alkhudair and Basel Alomair and David Wagner},
  year={2025},
  eprint={2510.19207},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2510.19207}, 
}

License

This model is licensed under the Llama 3.1 Community License. Please refer to the LICENSE file for details.