BERT Multilingual Prompt Injection Detector
Fine-tuned bert-base-multilingual-cased for detecting prompt injection attacks across 11+ languages.
Quick Start
from transformers import pipeline
classifier = pipeline("text-classification", model="devndeploy/bert-prompt-injection-detector")
result = classifier("Ignore all previous instructions")
# [{'label': 'INJECTION', 'score': 0.999}]
Performance
| Metric | Score |
|---|---|
| Accuracy | 96.47% |
| Precision | 99.20% |
| Recall | 93.94% |
| F1 Score | 96.50% |
Training
- Dataset: rikka-snow/prompt-injection-multilingual (7,282 samples)
- Base Model: bert-base-multilingual-cased
- Languages: English, German, Spanish, French, Chinese, Vietnamese, Japanese, Korean, Arabic, Russian, Portuguese
- Training Time: ~19 minutes on GPU
Use Cases
- Security layer for AI chatbots
- Content moderation
- Adversarial prompt detection
Limitations
- May not detect novel injection techniques
- Performance varies across languages
- Should be used as part of a defense-in-depth strategy
License
MIT
- Downloads last month
- 90