Use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="GuardrailsAI/prompt-saturation-attack-detector")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("GuardrailsAI/prompt-saturation-attack-detector")
model = AutoModelForSequenceClassification.from_pretrained("GuardrailsAI/prompt-saturation-attack-detector")
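When the model is loaded directly, its forward pass returns raw logits rather than labels. Here is a minimal sketch of one way to turn a row of logits into a label and probability with a softmax, assuming the model is a standard single-label sequence classifier. The card does not document the label names, so the function reads them from an `id2label` mapping (normally `model.config.id2label`). `logits_to_prediction` is a hypothetical helper, not part of Transformers.

```python
import math
from typing import Dict, List, Tuple


def logits_to_prediction(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Hypothetical helper: map one row of logits to (label, probability).

    Assumes a single-label classifier, so a softmax gives class probabilities.
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index of the most probable class.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return id2label[best], probs[best]


# Example use with the tokenizer and model loaded above (requires torch; not run here):
#   import torch
#   inputs = tokenizer("some prompt", return_tensors="pt", truncation=True)
#   with torch.no_grad():
#       row = model(**inputs).logits[0].tolist()
#   label, prob = logits_to_prediction(row, model.config.id2label)
```

The `text-classification` pipeline performs a comparable post-processing step internally, so this is only needed when working with the model object directly.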

Model Card for GuardrailsAI/prompt-saturation-attack-detector

A small model to detect saturation jailbreak attacks. Not intended for standalone use against other kinds of jailbreaks.

Model Details

Model Description

  • Developed by: Guardrails AI, Joseph Catrambone
  • Funded by: Guardrails AI
  • Model type: Transformer, BERT
  • Language(s) (NLP): English
  • License: Restrictive
  • Finetuned from model: bert-tiny


Uses

Designed as a small prefilter for a subset of saturation attacks.
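Because the card positions the model as a prefilter, a typical pattern is to score each prompt before it is forwarded to the main model and stop it when the detector fires. This is a minimal sketch under stated assumptions: `is_flagged` and `prefilter` are hypothetical helpers, the 0.5 threshold is an arbitrary default rather than a tuned value, and the attack label name must be looked up in the model's config because the card does not document it.

```python
from typing import Any, Callable, Dict, List

# A transformers text-classification pipeline called on a single string returns a
# list such as [{"label": "...", "score": 0.93}]; this alias mirrors that shape.
Classifier = Callable[[str], List[Dict[str, Any]]]


def is_flagged(prompt: str, classify: Classifier, attack_label: str, threshold: float = 0.5) -> bool:
    """Return True if the top prediction is `attack_label` with score >= threshold."""
    top = classify(prompt)[0]
    return top["label"] == attack_label and top["score"] >= threshold


def prefilter(prompt: str, classify: Classifier, attack_label: str, threshold: float = 0.5) -> str:
    """Hypothetical gate: raise before a flagged prompt reaches the main model."""
    if is_flagged(prompt, classify, attack_label, threshold):
        raise ValueError("prompt rejected by saturation prefilter")
    return prompt


# Real use would pass the pipeline from the card as `classify`, e.g.
#   prefilter(user_prompt, pipe, attack_label="<label taken from model.config.id2label>")
```

Injecting the classifier as a callable keeps the gating logic testable with a stub and leaves the choice of threshold to the deployment.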

Out-of-Scope Use

Not designed to catch other types of jailbreaks. Saturation protection is one part of a more complete suite of defenses against improper use of ML systems.
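Since saturation protection is meant to be one layer among several, a caller might chain it with other, independent checks and reject a prompt as soon as any of them fires. This sketch assumes each check is a plain `str -> bool` predicate; the `first_failing_check` helper and the check names are hypothetical, and the other detectors would have to come from elsewhere.

```python
from typing import Callable, Iterable, Optional, Tuple

# Each entry pairs a check name with a predicate that returns True to flag a prompt.
Check = Tuple[str, Callable[[str], bool]]


def first_failing_check(prompt: str, checks: Iterable[Check]) -> Optional[str]:
    """Run checks in order and return the name of the first one that flags the prompt.

    Returns None if every check passes. Stopping early avoids running later,
    possibly more expensive, checks once a prompt is already rejected.
    """
    for name, flags in checks:
        if flags(prompt):
            return name
    return None


# Hypothetical wiring; both predicate names below are placeholders:
#   checks = [
#       ("saturation", saturation_detector_flags),
#       ("other_jailbreaks", other_jailbreak_detector_flags),
#   ]
#   rejected_by = first_failing_check(user_prompt, checks)
```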

Model size: 4.39M params (Safetensors, tensor type F32)