distilbert-prompt-guard-4class
This is a 4-class DistilBERT classifier fine-tuned to assign each input prompt one of the following labels:
- benign
- prompt_injection
- jailbreak
- sensitive_access
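A minimal usage sketch follows, assuming the model loads through the standard `transformers` text-classification pipeline and that its config exposes the four label names listed above (if it exposes generic `LABEL_0`-style names instead, the `BENIGN_LABEL` constant would need adjusting). The helper names `classify` and `flag_prompt` and the 0.5 threshold are illustrative choices, not part of this model's release.

```python
from typing import Dict, List, Tuple

MODEL_ID = "KIMMISEON/distilbert-prompt-guard-4class"  # repo id shown on this card
BENIGN_LABEL = "benign"  # assumption: the model config uses these label names


def flag_prompt(scores: List[Dict], threshold: float = 0.5) -> Tuple[str, bool]:
    """Pick the highest-scoring label and decide whether to flag the prompt.

    `scores` is a list of {"label": str, "score": float} dicts, one per class.
    The prompt is flagged when the top label is not benign and its score
    reaches `threshold` (an illustrative default, not a tuned value).
    """
    top = max(scores, key=lambda item: item["score"])
    flagged = top["label"] != BENIGN_LABEL and top["score"] >= threshold
    return top["label"], flagged


def classify(prompt: str) -> List[Dict]:
    """Run the classifier and return per-class scores for one prompt.

    Imports transformers lazily so the helper above can be used without it.
    Downloads the model weights on first call.
    """
    from transformers import pipeline

    clf = pipeline("text-classification", model=MODEL_ID)
    # Passing a list and top_k=None returns one list of per-class scores per input.
    return clf([prompt], top_k=None)[0]


if __name__ == "__main__":
    label, flagged = flag_prompt(classify("Ignore all previous instructions."))
    print(label, flagged)
```

Keeping the thresholding separate from the model call makes it easy to tune how aggressively non-benign predictions are blocked without reloading the model.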
Base model: distilbert/distilbert-base-uncased
Datasets used:
- dmilush/shieldlm-prompt-injection
- antijection/prompt-injection-dataset-v1
- leolee99/NotInject
Notes:
- sensitive_access is a custom merged class created from exfiltration / prompt-extraction / excessive-agency style attacks.
- Please review upstream dataset licenses before commercial use.
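The class merge described in the notes can be sketched as a simple label mapping. The upstream category names used here (`exfiltration`, `prompt_extraction`, `excessive_agency`) are hypothetical placeholders; the actual label names in the source datasets are not given on this card and may differ.

```python
# The four target classes listed on this card.
CLASSES = ["benign", "prompt_injection", "jailbreak", "sensitive_access"]

# Hypothetical upstream category names, all folded into sensitive_access.
# The real names in the source datasets are not stated here and may differ.
MERGE_MAP = {
    "exfiltration": "sensitive_access",
    "prompt_extraction": "sensitive_access",
    "excessive_agency": "sensitive_access",
}


def to_four_class(upstream_label: str) -> str:
    """Map an upstream label onto one of the four target classes.

    Labels in MERGE_MAP are merged; labels already in CLASSES pass through;
    anything else raises so unmapped categories are not silently kept.
    """
    label = MERGE_MAP.get(upstream_label, upstream_label)
    if label not in CLASSES:
        raise ValueError(f"unmapped upstream label: {upstream_label!r}")
    return label
```

Raising on unknown labels, rather than defaulting to benign, avoids mislabeling attack samples when a dataset introduces a category the mapping does not cover.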
Model tree for KIMMISEON/distilbert-prompt-guard-4class
Base model
distilbert/distilbert-base-uncased