L5 Negative Selection - prompt-armor

An Isolation Forest anomaly-detection model for catching zero-day prompt injection attacks. It learns what "normal" prompts look like and flags deviations from that baseline.

Model Details

  • Algorithm: scikit-learn IsolationForest
  • Training data: 5,000 benign prompts from 5 public datasets
  • Features: 11 statistical text features
  • Inference: <1ms (tree traversal)
  • File size: ~1.1MB
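The model file bundles the fitted estimator together with the score range observed on the training data. The sketch below shows how a file with that layout could be produced; the random feature matrix is a stand-in for the real extraction pipeline, and the hyperparameters are assumptions, not prompt-armor's actual training script.

```python
# Hypothetical training sketch: produces a .pkl with the same layout the
# Usage section below expects ("model", "score_min", "score_max").
# The feature matrix here is random stand-in data, NOT real prompts.
import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 11))  # stand-in for 11 features per benign prompt

model = IsolationForest(n_estimators=100, random_state=0).fit(X)

# Record the decision_function range on the training set so inference
# can rescale raw scores into [0, 1].
scores = model.decision_function(X)
joblib.dump(
    {"model": model, "score_min": scores.min(), "score_max": scores.max()},
    "l5_negative_selection.pkl",
)
```

Storing score_min and score_max alongside the model is what makes the normalization in the Usage section possible without re-reading the training data.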

Features Extracted

  1. Word count
  2. Character count
  3. Sentence count
  4. Average word length
  5. Average sentence length
  6. Imperative verb ratio
  7. Question mark ratio
  8. Special character density
  9. Shannon entropy
  10. Uppercase ratio
  11. Unique word ratio (vocabulary diversity)
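A few of these features can be approximated in plain Python. The definitions below are an illustrative sketch; the exact formulas used by _extract_l5_features may differ.

```python
# Illustrative versions of a handful of the 11 features; these are
# assumptions for the sketch, not prompt-armor's exact definitions.
import math
import re

def sketch_features(text: str) -> dict:
    words = re.findall(r"\S+", text)
    chars = len(text)
    # Shannon entropy over character frequencies, in bits per character
    freqs = {c: text.count(c) / chars for c in set(text)} if chars else {}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return {
        "word_count": len(words),
        "char_count": chars,
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "uppercase_ratio": sum(c.isupper() for c in text) / chars if chars else 0.0,
        "unique_word_ratio": len({w.lower() for w in words}) / len(words) if words else 0.0,
        "shannon_entropy": entropy,
    }
```

Features like uppercase ratio and entropy are cheap to compute, which is what keeps inference under a millisecond end to end.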

Usage

import joblib
from prompt_armor.layers.l5_negative_selection import _extract_l5_features

data = joblib.load("l5_negative_selection.pkl")
model = data["model"]

features = _extract_l5_features("your text here")
raw_score = model.decision_function(features.reshape(1, -1))[0]

# Rescale to [0, 1]: more negative raw scores are more anomalous,
# so they map toward 1.0 after normalization
score = (data["score_max"] - raw_score) / (data["score_max"] - data["score_min"])
score = max(0.0, min(1.0, score))
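The normalization above can be wrapped in a helper, together with a decision threshold. The 0.7 cutoff below is an assumption for the sketch, not a value shipped with prompt-armor.

```python
def normalize_score(raw_score: float, score_min: float, score_max: float) -> float:
    """Map IsolationForest decision_function output to [0, 1].

    Raw scores near score_min (the most anomalous seen at training
    time) map toward 1.0; scores near score_max map toward 0.0.
    """
    span = score_max - score_min
    score = (score_max - raw_score) / span if span else 0.0
    return max(0.0, min(1.0, score))

def is_anomalous(raw_score: float, score_min: float, score_max: float,
                 threshold: float = 0.7) -> bool:
    # Hypothetical cutoff: flag prompts whose normalized score exceeds it.
    return normalize_score(raw_score, score_min, score_max) >= threshold
```

In practice the threshold would be tuned on held-out benign and injected prompts to trade false positives against missed attacks.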

Part of prompt-armor

This model is used by prompt-armor, an open-source prompt injection detector, and is auto-downloaded on first use.

License

Apache 2.0
