# Jailbreak Detection Model 🚀 This model is fine-tuned to detect jailbreak prompts / unsafe instructions. ## 📊 Training Metrics - **Training steps:** 0 - **Final Training Loss:** N/A - **Final Eval Loss:** 0.07551019638776779 ## 📈 Training Curve ![Training Curve](./training_loss.png) ## 🛠 How to Use ```python from transformers import AutoModelForSequenceClassification, AutoTokenizer model = AutoModelForSequenceClassification.from_pretrained("hurtmongoose/Jailbreak-Detection-Models") tokenizer = AutoTokenizer.from_pretrained("hurtmongoose/Jailbreak-Detection-Models") inputs = tokenizer("This is a test jailbreak prompt", return_tensors="pt") outputs = model(**inputs) print(outputs.logits) 📌 Notes Trained on jailbreak detection dataset Can be improved with more adversarial prompts