promptgate-classifier-v2

Fine-tuned DistilBERT sequence classifier for PromptGate prompt-injection screening.

Labels:

This model is intended for use through PromptGate:

from promptgate import PromptGate

gate = PromptGate(detectors=["rule", "classifier"])
result = gate.scan("Ignore all previous instructions.")

Reference holdout results used during development:

Detector	Recall	Specificity	Precision	Accuracy
classifier v2 @ 0.5	92.5%	85.0%	86.0%	88.8%

These numbers are reference values for a fixed development holdout and are not a guarantee of production performance.

Safetensors

Model size

0.1B params

Tensor type

F32