L5 Negative Selection - prompt-armor

An Isolation Forest anomaly-detection model for catching zero-day prompt injection attacks. It learns what "normal" prompts look like and flags deviations from that baseline.

Model Details

  • Algorithm: scikit-learn IsolationForest
  • Training data: 5,000 benign prompts from 5 public datasets
  • Features: 11 statistical text features
  • Inference: <1ms (tree traversal)
  • File size: ~1.1MB
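The model file bundles the fitted estimator together with the score range observed on the training data. The sketch below shows how a file with that layout could be produced; the random feature matrix is a stand-in for the real extraction pipeline, and the hyperparameters are assumptions, not prompt-armor's actual training script.

```python
# Hypothetical training sketch: produces a .pkl with the same layout the
# Usage section below expects ("model", "score_min", "score_max").
# The feature matrix here is random stand-in data, NOT real prompts.
import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 11))  # stand-in for 11 features per benign prompt

model = IsolationForest(n_estimators=100, random_state=0).fit(X)

# Record the decision_function range on the training set so inference
# can rescale raw scores into [0, 1].
scores = model.decision_function(X)
joblib.dump(
    {"model": model, "score_min": scores.min(), "score_max": scores.max()},
    "l5_negative_selection.pkl",
)
```

Storing score_min and score_max alongside the model is what makes the normalization in the Usage section possible without re-reading the training data.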

Features Extracted

  1. Word count
  2. Character count
  3. Sentence count
  4. Average word length
  5. Average sentence length
  6. Imperative verb ratio
  7. Question mark ratio
  8. Special character density
  9. Shannon entropy
  10. Uppercase ratio
  11. Unique word ratio (vocabulary diversity)
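A few of these features can be approximated in plain Python. The definitions below are an illustrative sketch; the exact formulas used by _extract_l5_features may differ.

```python
# Illustrative versions of a handful of the 11 features; these are
# assumptions for the sketch, not prompt-armor's exact definitions.
import math
import re

def sketch_features(text: str) -> dict:
    words = re.findall(r"\S+", text)
    chars = len(text)
    # Shannon entropy over character frequencies, in bits per character
    freqs = {c: text.count(c) / chars for c in set(text)} if chars else {}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return {
        "word_count": len(words),
        "char_count": chars,
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "uppercase_ratio": sum(c.isupper() for c in text) / chars if chars else 0.0,
        "unique_word_ratio": len({w.lower() for w in words}) / len(words) if words else 0.0,
        "shannon_entropy": entropy,
    }
```

Features like uppercase ratio and entropy are cheap to compute, which is what keeps inference under a millisecond end to end.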

Usage

import joblib
from prompt_armor.layers.l5_negative_selection import _extract_l5_features

data = joblib.load("l5_negative_selection.pkl")
model = data["model"]

features = _extract_l5_features("your text here")
raw_score = model.decision_function(features.reshape(1, -1))[0]

# Rescale to [0, 1]: more negative raw scores are more anomalous,
# so they map toward 1.0 after normalization
score = (data["score_max"] - raw_score) / (data["score_max"] - data["score_min"])
score = max(0.0, min(1.0, score))
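The normalization above can be wrapped in a helper, together with a decision threshold. The 0.7 cutoff below is an assumption for the sketch, not a value shipped with prompt-armor.

```python
def normalize_score(raw_score: float, score_min: float, score_max: float) -> float:
    """Map IsolationForest decision_function output to [0, 1].

    Raw scores near score_min (the most anomalous seen at training
    time) map toward 1.0; scores near score_max map toward 0.0.
    """
    span = score_max - score_min
    score = (score_max - raw_score) / span if span else 0.0
    return max(0.0, min(1.0, score))

def is_anomalous(raw_score: float, score_min: float, score_max: float,
                 threshold: float = 0.7) -> bool:
    # Hypothetical cutoff: flag prompts whose normalized score exceeds it.
    return normalize_score(raw_score, score_min, score_max) >= threshold
```

In practice the threshold would be tuned on held-out benign and injected prompts to trade false positives against missed attacks.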

Part of prompt-armor

This model is used by prompt-armor, an open-source prompt injection detector, and is auto-downloaded on first use.

License

Apache 2.0
