# Jailbreak Detection Model

This model is fine-tuned to detect jailbreak prompts and other unsafe instructions.
## Training Metrics

- Training steps: 0
- Final training loss: N/A
- Final eval loss: 0.07551019638776779
## Training Curve
## How to Use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and its tokenizer
model = AutoModelForSequenceClassification.from_pretrained("hurtmongoose/Jailbreak-Detection-Models")
tokenizer = AutoTokenizer.from_pretrained("hurtmongoose/Jailbreak-Detection-Models")

# Classify a single prompt
inputs = tokenizer("This is a test jailbreak prompt", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```
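The raw logits can be converted into a predicted label with a softmax. A minimal sketch of that step, using hypothetical logit values and an assumed two-class `SAFE`/`JAILBREAK` label mapping (check `model.config.id2label` for the actual names in this checkpoint):

```python
import math

# Hypothetical logits for (safe, jailbreak); real values come from outputs.logits
logits = [-1.2, 2.3]

# Softmax: normalize exponentiated logits into probabilities
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Pick the highest-probability class
pred_id = probs.index(max(probs))
id2label = {0: "SAFE", 1: "JAILBREAK"}  # assumed mapping; verify via model.config.id2label
print(id2label[pred_id], round(probs[pred_id], 3))
```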
## Notes

- Trained on a jailbreak detection dataset
- Could be improved by training on more adversarial prompts
