# L5 Negative Selection - prompt-armor
An Isolation Forest anomaly detection model for catching zero-day prompt injection attacks. It learns what "normal" prompts look like and flags statistical deviations.
## Model Details
- Algorithm: scikit-learn IsolationForest
- Training data: 5,000 benign prompts from 5 public datasets
- Features: 11 statistical text features
- Inference: <1ms (tree traversal)
- File size: ~1.1MB
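The model card does not include the training script, but the details above imply a standard recipe: fit `IsolationForest` on benign feature vectors only, then record the range of `decision_function` scores so inference can normalize. A minimal sketch, assuming a feature matrix of the 11 features per prompt is already available (the function name `train_l5` and its defaults are hypothetical, not part of prompt-armor):

```python
import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

def train_l5(benign_features: np.ndarray, out_path: str = "l5_negative_selection.pkl") -> None:
    # Fit on benign prompts only: anything that scores unlike them is flagged.
    model = IsolationForest(n_estimators=100, contamination="auto", random_state=42)
    model.fit(benign_features)
    # decision_function: higher = more normal, lower = more anomalous.
    scores = model.decision_function(benign_features)
    # Store the observed score range so inference can map scores to [0, 1].
    joblib.dump(
        {"model": model, "score_min": float(scores.min()), "score_max": float(scores.max())},
        out_path,
    )
```

The stored `score_min`/`score_max` match the keys the usage snippet below reads from the pickle.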
## Features Extracted
- Word count
- Character count
- Sentence count
- Average word length
- Average sentence length
- Imperative verb ratio
- Question mark ratio
- Special character density
- Shannon entropy
- Uppercase ratio
- Unique word ratio (vocabulary diversity)
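All of the listed features are cheap, purely statistical text measurements, which is what keeps inference under 1 ms. A sketch of a few of them (this is an illustrative re-implementation, not the actual `_extract_l5_features`):

```python
import math
from collections import Counter

def sketch_features(text: str) -> list[float]:
    # Illustrative versions of a few of the 11 listed features.
    words = text.split()
    word_count = len(words)
    char_count = len(text)
    # Average word length
    avg_word_len = sum(len(w) for w in words) / word_count if word_count else 0.0
    # Uppercase ratio
    upper_ratio = sum(c.isupper() for c in text) / char_count if char_count else 0.0
    # Shannon entropy over characters (in bits)
    counts = Counter(text)
    entropy = (
        -sum((n / char_count) * math.log2(n / char_count) for n in counts.values())
        if char_count else 0.0
    )
    # Unique word ratio (vocabulary diversity)
    unique_ratio = len({w.lower() for w in words}) / word_count if word_count else 0.0
    return [word_count, char_count, avg_word_len, upper_ratio, entropy, unique_ratio]
```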
## Usage
```python
import joblib
from prompt_armor.layers.l5_negative_selection import _extract_l5_features

data = joblib.load("l5_negative_selection.pkl")
model = data["model"]

features = _extract_l5_features("your text here")
raw_score = model.decision_function(features.reshape(1, -1))[0]

# Normalize to [0, 1]: more negative raw scores = more anomalous
score = (data["score_max"] - raw_score) / (data["score_max"] - data["score_min"])
score = max(0.0, min(1.0, score))
```
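The normalization step above can be factored into a small helper, which also guards against a degenerate score range (the function name `normalize_score` is ours, not part of prompt-armor):

```python
def normalize_score(raw: float, score_min: float, score_max: float) -> float:
    # Map a raw IsolationForest decision_function score to [0, 1],
    # where 1.0 means maximally anomalous and 0.0 means normal.
    span = score_max - score_min
    if span == 0:
        return 0.0  # degenerate range: no useful signal
    return max(0.0, min(1.0, (score_max - raw) / span))
```

Because the output is clamped, raw scores outside the range seen during training still map to valid values at the extremes.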
## Part of prompt-armor
This model is used by prompt-armor, an open-source prompt injection detector, and is auto-downloaded on first use.
## License
Apache 2.0