---
license: apache-2.0
library_name: transformers
language:
- en
tags:
- promptinjection
- distilbert
---

# Model Card for DistilBERT-PromptInjectionDetectorForCVs

## Model Description
This DistilBERT-based model was developed as part of a research project aiming to mitigate prompt injection attacks in applications processing CVs. It specifically targets the nuanced domain of CV submissions, demonstrating a strategy to distinguish between legitimate CVs and those containing prompt injection attempts.
## Research and Application Context
The model was created for a synthetic application that handles CVs, showcasing a domain-specific approach to mitigating prompt injection attacks. This work, including the model and its underlying strategy, is detailed in our research blog, and the synthetic application can be accessed here.
## Training Data
The model was fine-tuned on a custom dataset that combines domain-specific examples (legitimate CVs) with prompt injection examples to create a more tailored dataset. This dataset includes legitimate CVs, pure prompt injection texts, and CVs with embedded prompt injection attempts. The original datasets used are available on Hugging Face: Resume Dataset for CVs and Prompt Injections for injection examples.
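The three example types described above can be combined into one labeled dataset. The sketch below is a hypothetical mixing scheme (the function name, label convention of 0 = legitimate / 1 = injection, and the paragraph-splicing strategy are illustrative assumptions, not the exact recipe used for this model):

```python
import random

def build_training_examples(legit_cvs, injection_texts, seed=0):
    """Combine legitimate CVs and prompt injection texts into a labeled
    dataset, including CVs with an injection embedded mid-document.
    Hypothetical mixing scheme; the actual training recipe may differ."""
    rng = random.Random(seed)
    examples = []
    # Legitimate CVs -> label 0
    examples += [{"text": cv, "label": 0} for cv in legit_cvs]
    # Pure prompt injection texts -> label 1
    examples += [{"text": inj, "label": 1} for inj in injection_texts]
    # CVs with an injection spliced between paragraphs -> label 1
    for cv in legit_cvs:
        inj = rng.choice(injection_texts)
        paragraphs = cv.split("\n\n")
        pos = rng.randrange(len(paragraphs) + 1)
        mixed = "\n\n".join(paragraphs[:pos] + [inj] + paragraphs[pos:])
        examples.append({"text": mixed, "label": 1})
    rng.shuffle(examples)
    return examples
```

Embedding injections inside otherwise legitimate CVs is what makes the dataset domain-specific: the classifier learns to flag attacks hidden in realistic carrier text, not just standalone injection strings.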
## Intended Use
This model is not intended for production use but serves as a demonstration of a domain-specific strategy to mitigate prompt injection attacks. It should be employed as part of a broader security strategy, including securing the model's output, as described in our article. This approach is meant to showcase how to address prompt injection risks in a targeted application scenario.
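For demonstration purposes, the detector can be used as a gate in front of the CV-processing LLM. A minimal sketch with the `transformers` text-classification pipeline follows; the repo id, label names (`INJECTION`), and threshold are assumptions to substitute with this model's actual Hub path and `id2label` mapping:

```python
from transformers import pipeline

# Hypothetical repo id; replace with the model's actual Hugging Face path.
MODEL_ID = "your-org/DistilBERT-PromptInjectionDetectorForCVs"

def screen_cv(cv_text, model_id=MODEL_ID, threshold=0.5):
    """Classify a CV before it reaches the LLM pipeline; return True when
    the text looks like a prompt injection attempt. The label name
    "INJECTION" is an assumption; check the model config's id2label."""
    classifier = pipeline("text-classification", model=model_id)
    result = classifier(cv_text, truncation=True)[0]
    return result["label"] == "INJECTION" and result["score"] >= threshold
```

A positive result should reject or quarantine the submission, but as noted above, this check is only one layer; the model's output should still be secured downstream.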
## Limitations and Ethical Considerations
Prompt injection in Large Language Models (LLMs) remains an open problem with no deterministic solution. While this model offers a mitigation strategy, it's important to understand that new ways to perform injection attacks may still be possible. Users should consider this model as an example of how to approach mitigation in a specific domain, rather than a definitive solution.
## License and Usage
The model and datasets are shared for research purposes to encourage further exploration and development of mitigation strategies against prompt injection attacks. Users should refer to the specific licenses of the datasets and the model for details on permissible use cases.