promptgate-classifier-v2

Fine-tuned DistilBERT sequence classifier for PromptGate prompt-injection screening.

Labels:

  • SAFE: benign input
  • ATTACK: prompt-injection-like input
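
As a rough illustration of how a two-class output could map onto these labels, the sketch below applies a softmax to a pair of logits and compares the ATTACK probability against a threshold. It assumes that the "@ 0.5" in the holdout results refers to such a threshold and that index 1 corresponds to ATTACK; neither is confirmed by this card, and `label_from_logits` is a hypothetical helper, not part of PromptGate.

```python
import math

# Assumed label order; check the model's config (id2label) before relying on it.
LABELS = ("SAFE", "ATTACK")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits, threshold=0.5):
    """Return (label, attack_probability) for a pair of class logits.

    Hypothetical helper: flags the input as ATTACK when the ATTACK
    probability is at least `threshold`.
    """
    attack_prob = softmax(logits)[LABELS.index("ATTACK")]
    label = "ATTACK" if attack_prob >= threshold else "SAFE"
    return label, attack_prob


if __name__ == "__main__":
    print(label_from_logits([2.0, -1.0]))   # ATTACK probability below 0.5
    print(label_from_logits([-1.0, 2.0]))   # ATTACK probability above 0.5
```

Raising the threshold would trade recall for specificity, so a different operating point may suit deployments where false positives are costly.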

This model is intended for use through PromptGate:

from promptgate import PromptGate

# Run the rule-based and classifier detectors together.
gate = PromptGate(detectors=["rule", "classifier"])
# Scan a single input string.
result = gate.scan("Ignore all previous instructions.")

Reference holdout results used during development:

Detector               Recall   Specificity   Precision   Accuracy
classifier v2 @ 0.5    92.5%    85.0%         86.0%       88.8%

These numbers are reference values for a fixed development holdout and are not a guarantee of production performance.
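
One reason holdout numbers may not carry over to production is that precision and accuracy depend on how common attacks are in the traffic, while recall and specificity do not. The sketch below derives precision and accuracy from the reported recall and specificity for a given attack prevalence. The prevalence values are illustrative assumptions; the card does not state the class balance of the holdout, and `precision_and_accuracy` is a hypothetical helper.

```python
def precision_and_accuracy(recall, specificity, prevalence):
    """Derive precision and accuracy from recall, specificity and prevalence.

    All arguments are fractions in [0, 1]; `prevalence` is the share of
    inputs that are truly ATTACK (an assumption, not a reported value).
    """
    true_pos = recall * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    precision = true_pos / (true_pos + false_pos)
    accuracy = true_pos + true_neg
    return precision, accuracy


if __name__ == "__main__":
    # Recall and specificity are taken from the table above.
    for prevalence in (0.5, 0.1, 0.01):
        p, a = precision_and_accuracy(0.925, 0.85, prevalence)
        print(f"prevalence={prevalence:.2f} precision={p:.3f} accuracy={a:.3f}")
```

With a prevalence of 0.5 this gives roughly 86.0% precision and 88.75% accuracy, which matches the table if the holdout was balanced. At a 1% attack rate the same recall and specificity would yield a precision below 6%, so most flagged inputs would be false positives.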

Model format: Safetensors. Model size: 0.1B params. Tensor type: F32.