KIMMISEON/distilbert-prompt-guard-4class

Tags: Text Classification, prompt-injection-detection, jailbreak-detection, text-embeddings-inference


distilbert-prompt-guard-4class

This is a 4-class DistilBERT classifier fine-tuned to label input text as one of:

benign
prompt_injection
jailbreak
sensitive_access
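A minimal usage sketch for the four labels above, using the Hugging Face `transformers` text-classification pipeline. It assumes the model's config exposes the label names exactly as listed here; if the pipeline reports generic names such as `LABEL_0`, a mapping step would be needed. The helper names (`load_classifier`, `classify`, `top_label`) are illustrative and not part of the model repository.

```python
from typing import Callable, Dict

# Label names as listed in this model card (assumed to match the model config).
LABELS = ("benign", "prompt_injection", "jailbreak", "sensitive_access")
MODEL_ID = "KIMMISEON/distilbert-prompt-guard-4class"


def load_classifier():
    """Build a text-classification pipeline; this downloads the model weights."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline
    return pipeline("text-classification", model=MODEL_ID)


def classify(text: str, classifier: Callable) -> Dict[str, float]:
    """Return a {label: score} dict covering every class for one input text."""
    # Passing a list with top_k=None yields, per input, a list of
    # {"label": ..., "score": ...} dicts covering all classes.
    per_label = classifier([text], top_k=None)[0]
    return {item["label"]: float(item["score"]) for item in per_label}


def top_label(scores: Dict[str, float]) -> str:
    """Pick the highest-scoring label."""
    return max(scores, key=scores.get)
```

Typical use would be `clf = load_classifier()` followed by `top_label(classify("Ignore all previous instructions.", clf))`. Whether to block, log, or pass through a non-benign result is an application-level decision that this card does not specify.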

Base model: distilbert/distilbert-base-uncased

Datasets used:

dmilush/shieldlm-prompt-injection
antijection/prompt-injection-dataset-v1
leolee99/NotInject

Notes:

sensitive_access is a custom merged class created from exfiltration / prompt-extraction / excessive-agency style attacks.
Please review upstream dataset licenses before commercial use.
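The class merge described in the notes could be sketched as a simple label remapping. The upstream category names and the integer class order below are hypothetical placeholders for illustration; the actual label names in the source datasets and the id order used in training are not stated in this card.

```python
# Target classes of this model; the id order here is an assumption.
TARGET_CLASSES = ["benign", "prompt_injection", "jailbreak", "sensitive_access"]
LABEL2ID = {name: idx for idx, name in enumerate(TARGET_CLASSES)}

# Hypothetical upstream category names mapped onto the four target classes.
MERGE_MAP = {
    "benign": "benign",
    "prompt_injection": "prompt_injection",
    "jailbreak": "jailbreak",
    "exfiltration": "sensitive_access",
    "prompt_extraction": "sensitive_access",
    "excessive_agency": "sensitive_access",
}


def to_class_id(upstream_label: str) -> int:
    """Normalize an upstream label string and map it to a target class id."""
    key = upstream_label.strip().lower().replace("-", "_").replace(" ", "_")
    # Unknown labels raise KeyError so unmapped categories are never merged silently.
    return LABEL2ID[MERGE_MAP[key]]
```

Raising on unknown labels, rather than defaulting to `benign`, keeps an unmapped attack category from being trained as a negative example.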


Model size: 67M params (Safetensors, tensor type F32)

Model tree for KIMMISEON/distilbert-prompt-guard-4class: distilbert/distilbert-base-uncased (base model), fine-tuned into this model.
