
Jailbreak Detection Model πŸš€

This model is fine-tuned to detect jailbreak prompts and other unsafe instructions.

πŸ“Š Training Metrics

  • Training steps: 0
  • Final Training Loss: N/A
  • Final Eval Loss: 0.07551019638776779

πŸ“ˆ Training Curve


πŸ›  How to Use

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("hurtmongoose/Jailbreak-Detection-Models")
tokenizer = AutoTokenizer.from_pretrained("hurtmongoose/Jailbreak-Detection-Models")

inputs = tokenizer("This is a test jailbreak prompt", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)

📌 Notes
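The logits printed above are raw, unnormalized scores. A minimal sketch of turning them into a prediction, assuming a two-label head where index 0 is benign and index 1 is jailbreak (the actual mapping is an assumption here; check `model.config.id2label` for the real one):

```python
import torch

# Example logits, shaped like model(**inputs).logits for a single prompt:
# (batch_size, num_labels). The values below are illustrative only.
logits = torch.tensor([[-1.2, 2.3]])

# Softmax converts raw logits into class probabilities.
probs = torch.softmax(logits, dim=-1)

# The highest-probability index is the predicted label id.
# We assume 0 = benign, 1 = jailbreak; verify via model.config.id2label.
pred = int(torch.argmax(probs, dim=-1))

print(probs)  # per-class probabilities, summing to 1
print(pred)
```

In practice you would replace the example tensor with the real `outputs.logits` from the snippet above.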

  • Trained on a jailbreak detection dataset.
  • Could be improved by training on more adversarial prompts.
  • Model size: 67M parameters (F32, Safetensors).