# Jailbreak Detection Model

This model is fine-tuned to detect jailbreak prompts and other unsafe instructions.
## Training Metrics

- Training steps: 0
- Final training loss: N/A
- Final eval loss: 0.07551019638776779
## Training Curve
## How to Use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and its tokenizer
model = AutoModelForSequenceClassification.from_pretrained("hurtmongoose/Jailbreak-Detection-Models")
tokenizer = AutoTokenizer.from_pretrained("hurtmongoose/Jailbreak-Detection-Models")

# Classify a single prompt
inputs = tokenizer("This is a test jailbreak prompt", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```
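The raw logits can be converted into a predicted label with a softmax. A minimal sketch of that step, using hypothetical logit values and an assumed two-class `SAFE`/`JAILBREAK` label mapping (check `model.config.id2label` for the actual names in this checkpoint):

```python
import math

# Hypothetical logits for (safe, jailbreak); real values come from outputs.logits
logits = [-1.2, 2.3]

# Softmax: normalize exponentiated logits into probabilities
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Pick the highest-probability class
pred_id = probs.index(max(probs))
id2label = {0: "SAFE", 1: "JAILBREAK"}  # assumed mapping; verify via model.config.id2label
print(id2label[pred_id], round(probs[pred_id], 3))
```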
## Notes

- Trained on a jailbreak detection dataset
- Could be improved by training on more adversarial prompts
