PangolinGuard: Fine-Tuning ModernBERT as a Lightweight Approach to AI Guardrails
dcarpintero
LLM applications face critical security challenges in the form of prompt injection and jailbreak attacks, which can cause models to leak sensitive data or deviate from their intended behavior. Existing safeguard models are not fully open and have limited context windows (e.g., only 512 tokens in LlamaGuard).

PangolinGuard is a lightweight classifier, fine-tuned from ModernBERT (Base), that discriminates malicious prompts (i.e., prompt injection and jailbreak attacks).

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dcarpintero/pangolin-guard-base")
model = AutoModelForSequenceClassification.from_pretrained("dcarpintero/pangolin-guard-base")
```
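When the model is loaded directly (rather than through a pipeline), a forward pass returns raw logits that still need to be converted to class probabilities via a softmax. A minimal stdlib sketch of that post-processing step, using hypothetical logits; the `[benign, malicious]` label ordering here is an assumption, so check `model.config.id2label` for the actual mapping:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from a forward pass; the [benign, malicious]
# ordering is an assumption -- verify against model.config.id2label.
logits = [-2.3, 3.1]
probs = softmax(logits)
label = ["benign", "malicious"][probs.index(max(probs))]
```

In practice the same computation is done with `torch.softmax(outputs.logits, dim=-1)` on the tensor returned by the model.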
🤗 Tech-Blog | GitHub Repo
The model was evaluated on unseen data from a subset of specialized benchmarks targeting prompt safety and malicious-input detection, while also testing for over-defense behavior (false positives on benign prompts).
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="dcarpintero/pangolin-guard-base")
output = classifier("your input text")
print(output)
```
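A guard model like this typically sits in front of the LLM call, blocking flagged prompts before they reach the main model. A minimal wiring sketch, with a stub standing in for the classifier; the label names (`"malicious"`/`"benign"`) and the threshold are assumptions for illustration, not the model's confirmed output schema:

```python
def screen(prompt, classify, threshold=0.5):
    """Return True if the prompt may be forwarded to the LLM."""
    result = classify(prompt)[0]  # pipeline-style output: {'label': ..., 'score': ...}
    return not (result["label"] == "malicious" and result["score"] >= threshold)

# Stub classifier for illustration only; in a real application,
# swap in the transformers pipeline shown above.
def stub_classify(prompt):
    flagged = "ignore all previous instructions" in prompt.lower()
    return [{"label": "malicious" if flagged else "benign",
             "score": 0.99 if flagged else 0.97}]
```

With the real classifier plugged in, the threshold would be tuned against the over-defense benchmarks mentioned above to balance attack detection against false positives on benign prompts.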
Training results:
| Training Loss | Epoch | Step | Validation Loss | F1 | Accuracy |
|---|---|---|---|---|---|
| 0.1622 | 0.1042 | 100 | 0.0755 | 0.9604 | 0.9741 |
| 0.0694 | 0.2083 | 200 | 0.0525 | 0.9735 | 0.9828 |
| 0.0552 | 0.3125 | 300 | 0.0857 | 0.9696 | 0.9810 |
| 0.0535 | 0.4167 | 400 | 0.0345 | 0.9825 | 0.9889 |
| 0.0371 | 0.5208 | 500 | 0.0343 | 0.9821 | 0.9887 |
| 0.0402 | 0.6250 | 600 | 0.0344 | 0.9836 | 0.9894 |
| 0.0370 | 0.7292 | 700 | 0.0282 | 0.9869 | 0.9917 |
| 0.0265 | 0.8333 | 800 | 0.0229 | 0.9895 | 0.9933 |
| 0.0285 | 0.9375 | 900 | 0.0240 | 0.9885 | 0.9926 |
| 0.0191 | 1.0417 | 1000 | 0.0220 | 0.9908 | 0.9941 |
| 0.0134 | 1.1458 | 1100 | 0.0228 | 0.9911 | 0.9943 |
| 0.0124 | 1.2500 | 1200 | 0.0230 | 0.9898 | 0.9935 |
| 0.0136 | 1.3542 | 1300 | 0.0212 | 0.9910 | 0.9943 |
| 0.0088 | 1.4583 | 1400 | 0.0229 | 0.9911 | 0.9943 |
| 0.0115 | 1.5625 | 1500 | 0.0211 | 0.9922 | 0.9950 |
| 0.0058 | 1.6667 | 1600 | 0.0233 | 0.9920 | 0.9949 |
| 0.0119 | 1.7708 | 1700 | 0.0199 | 0.9916 | 0.9946 |
| 0.0072 | 1.8750 | 1800 | 0.0206 | 0.9925 | 0.9952 |
| 0.0070 | 1.9792 | 1900 | 0.0196 | 0.9923 | 0.9950 |
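The F1 and Accuracy columns follow the standard definitions (F1 is the harmonic mean of precision and recall). A quick sanity check of the formulas on a hypothetical confusion matrix, unrelated to the actual validation data:

```python
def f1_accuracy(tp, fp, fn, tn):
    """Compute F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return f1, accuracy

# Hypothetical counts for a 200-example evaluation set.
f1, acc = f1_accuracy(tp=95, fp=3, fn=5, tn=97)
```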
Base model: answerdotai/ModernBERT-base