| # Jailbreak Detection Model π | |
| This model is fine-tuned to detect jailbreak prompts / unsafe instructions. | |
| ## π Training Metrics | |
| - **Training steps:** 0 | |
| - **Final Training Loss:** N/A | |
| - **Final Eval Loss:** 0.07551019638776779 | |
| ## π Training Curve | |
|  | |
| ## π How to Use | |
| ```python | |
| from transformers import AutoModelForSequenceClassification, AutoTokenizer | |
| model = AutoModelForSequenceClassification.from_pretrained("hurtmongoose/Jailbreak-Detection-Models") | |
| tokenizer = AutoTokenizer.from_pretrained("hurtmongoose/Jailbreak-Detection-Models") | |
| inputs = tokenizer("This is a test jailbreak prompt", return_tensors="pt") | |
| outputs = model(**inputs) | |
| print(outputs.logits) | |
| π Notes | |
| Trained on jailbreak detection dataset | |
| Can be improved with more adversarial prompts | |