# Jailbreak Detection Model
This model is fine-tuned to detect jailbreak prompts / unsafe instructions.
## Training Metrics
- **Training steps:** 0
- **Final Training Loss:** N/A
- **Final Eval Loss:** 0.07551019638776779
## Training Curve

## How to Use
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and its tokenizer from the Hub
model = AutoModelForSequenceClassification.from_pretrained("hurtmongoose/Jailbreak-Detection-Models")
tokenizer = AutoTokenizer.from_pretrained("hurtmongoose/Jailbreak-Detection-Models")

# Tokenize a prompt and run inference without tracking gradients
inputs = tokenizer("This is a test jailbreak prompt", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits)  # raw, unnormalized class scores
```

## Notes
- Trained on a jailbreak detection dataset.
- Could be improved with more adversarial prompts.
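
The usage snippet prints raw logits, which are not probabilities. A minimal sketch of turning one row of logits into a label and a confidence score follows. The helper `logits_to_prediction` and the label names `benign` / `jailbreak` are hypothetical and used only for illustration; this card does not state which class index corresponds to which label, so check `model.config.id2label` on the loaded model before relying on any mapping.

```python
import math

def logits_to_prediction(logits, id2label):
    """Convert one row of raw logits into (label, probability) via softmax."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical mapping for illustration; the real one lives in model.config.id2label.
example_id2label = {0: "benign", 1: "jailbreak"}
label, prob = logits_to_prediction([-1.2, 2.3], example_id2label)
print(f"{label}: {prob:.3f}")
```

With the objects from the usage snippet, the call would presumably look like `logits_to_prediction(outputs.logits[0].tolist(), model.config.id2label)`, which applies the mapping stored in the model's configuration instead of the placeholder one.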