|
|
--- |
|
|
base_model: |
|
|
- meta-llama/Llama-3.1-8B-Instruct |
|
|
datasets: |
|
|
- yahma/alpaca-cleaned |
|
|
library_name: transformers |
|
|
license: llama3.1 |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# DataFilter |
|
|
|
|
|
[](https://arxiv.org/abs/2510.19207) |
|
|
[](https://huggingface.co/JoyYizhu/DataFilter) |
|
|
|
|
|
DataFilter is a test-time model-agnostic defense system designed to protect Large Language Model (LLM) agents against prompt injection attacks. As described in the paper [Defending Against Prompt Injection with DataFilter](https://huggingface.co/papers/2510.19207), it removes malicious instructions from data before it reaches the backend LLM, maintaining high utility while reducing attack success rates to near zero. |
|
|
|
|
|
- **Paper:** [Defending Against Prompt Injection with DataFilter](https://huggingface.co/papers/2510.19207) |
|
|
- **Repository:** [GitHub - yizhu-joy/DataFilter](https://github.com/yizhu-joy/DataFilter) |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
conda create -n py312vllm python=3.12 |
|
|
conda activate py312vllm |
|
|
pip install vllm pandas 'accelerate>=0.26.0' deepspeed datasets==2.20.0 |
|
|
git clone https://github.com/yizhu-joy/DataFilter.git |
|
|
cd DataFilter |
|
|
``` |
|
|
|
|
|
### Run DataFilter Inference Demo |
|
|
To test the DataFilter model, run the provided inference script: |
|
|
```bash |
|
|
python filter_inference.py |
|
|
``` |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use DataFilter in your research, please cite the following paper: |
|
|
|
|
|
```bibtex |
|
|
@misc{wang2025datafilter, |
|
|
title={Defending Against Prompt Injection with DataFilter}, |
|
|
author={Yizhu Wang and Sizhe Chen and Raghad Alkhudair and Basel Alomair and David Wagner}, |
|
|
year={2025}, |
|
|
eprint={2510.19207}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CR}, |
|
|
url={https://arxiv.org/abs/2510.19207}, |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
This model is licensed under the Llama 3.1 Community License. Please refer to the LICENSE file for details. |