Model Card for GuardrailsAI/prompt-saturation-attack-detector

A small model to detect saturation jailbreak attacks. Not intended for standalone use against other kinds of jailbreaks.

Model Details

Model Description

Developed by: Guardrails AI, Joseph Catrambone
Funded by: Guardrails AI
Model type: Transformer, BERT
Language(s) (NLP): English
License: Restrictive
Finetuned from model: bert-tiny

Model Sources

Repository: https://www.github.com/guardrails-ai/detect-jailbreak

Uses

Designed as a small prefilter for a subset of saturation attacks.
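
As a rough illustration of the prefilter role, the sketch below wraps a classifier in a yes/no check that runs before a prompt reaches the main model. It rests on several assumptions that this card does not confirm: that the checkpoint loads as a Hugging Face `text-classification` pipeline, that the positive class is named `LABEL_1`, and that 0.5 is a sensible threshold. The helper names `load_classifier` and `is_saturation_attack` are hypothetical.

```python
from typing import Callable, Dict, List

# A classifier maps a prompt to a list like [{"label": "...", "score": 0.97}],
# which is the shape a transformers text-classification pipeline returns.
Classifier = Callable[[str], List[Dict[str, object]]]

MODEL_ID = "GuardrailsAI/prompt-saturation-attack-detector"


def load_classifier(model_id: str = MODEL_ID) -> Classifier:
    """Build a classifier from the hosted checkpoint.

    Assumption: the checkpoint exposes a sequence-classification head.
    """
    from transformers import pipeline  # imported lazily; only needed here

    pipe = pipeline("text-classification", model=model_id)
    # Saturation prompts tend to be long, so truncate to the model's limit.
    return lambda text: pipe(text, truncation=True)


def is_saturation_attack(
    prompt: str,
    classifier: Classifier,
    attack_label: str = "LABEL_1",  # assumed label name; check the model config
    threshold: float = 0.5,         # assumed cutoff; tune on your own data
) -> bool:
    """Return True when the classifier flags the prompt as an attack."""
    result = classifier(prompt)[0]
    return result["label"] == attack_label and float(result["score"]) >= threshold
```

A caller would build the classifier once with `load_classifier()` and then call `is_saturation_attack(prompt, classifier)` per request. Passing the classifier in as an argument keeps the check easy to test with a stub.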

Out-of-Scope Use

Not designed to catch other types of jailbreaks. Saturation protection is one part of a more complete suite of defenses against improper use of ML systems.
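
To make the "one part of a suite" point concrete, here is a minimal sketch of running several independent checks in sequence and reporting the first one that flags a prompt. Everything in it is illustrative: `first_failed_check`, the two stand-in checks, and the 20,000-character limit are hypothetical and not part of this repository. A saturation detector would be one more entry in the list.

```python
from typing import Callable, List, Optional, Tuple

# Each check is a (name, function) pair; the function returns True
# when it considers the prompt unsafe.
Check = Tuple[str, Callable[[str], bool]]


def first_failed_check(prompt: str, checks: List[Check]) -> Optional[str]:
    """Return the name of the first check that flags the prompt, or None."""
    for name, flags_prompt in checks:
        if flags_prompt(prompt):
            return name
    return None


# Hypothetical stand-in checks, for illustration only.
def too_long(prompt: str) -> bool:
    return len(prompt) > 20_000


def mentions_override(prompt: str) -> bool:
    return "ignore previous instructions" in prompt.lower()


CHECKS: List[Check] = [
    ("length_limit", too_long),
    ("override_phrase", mentions_override),
]
```

Ordering cheap checks first lets a pipeline reject obvious cases before spending time on model-based ones.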

Downloads last month: 61,920

Safetensors

Model size: 4.39M params
Tensor type: F32

Model tree for GuardrailsAI/prompt-saturation-attack-detector

Base model: google-bert/bert-base-uncased
Finetuned (6689): this model