satyamsaf3ai/merged_content_moderation_and_prompt_injection_new
Mamba-2 SSM dual-head classifier for content moderation and prompt injection detection. Trained with per-class balancing (each class capped at 30K examples) and a WeightedRandomSampler to improve rare-class detection.
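The balancing recipe can be illustrated with a minimal, standard-library-only sketch. It is not the training code for this model: the helper names `cap_per_class` and `inverse_frequency_weights`, the toy label counts, and the cap of 100 are all hypothetical, and `random.choices` stands in for PyTorch's `WeightedRandomSampler`. Inverse-frequency weighting is also an assumption; the card does not say how the sampler weights were derived.

```python
import random
from collections import Counter, defaultdict

def cap_per_class(labels, cap, seed=0):
    """Return sorted indices keeping at most `cap` randomly chosen examples per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for idxs in by_class.values():
        rng.shuffle(idxs)        # random subset rather than the first `cap` items
        kept.extend(idxs[:cap])
    return sorted(kept)

def inverse_frequency_weights(labels):
    """One weight per example, 1 / (size of its class), so every class has total weight 1."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]

# Toy, heavily imbalanced label list (hypothetical counts, not the real dataset).
labels = ["benign"] * 1000 + ["privacy"] * 50 + ["pi_and_jailbreak"] * 10

# Step 1: cap the majority class (the card uses cap=30K; 100 keeps the toy small).
kept = cap_per_class(labels, cap=100)
kept_labels = [labels[i] for i in kept]

# Step 2: draw with replacement using per-example weights, as a weighted sampler would.
weights = inverse_frequency_weights(kept_labels)
rng = random.Random(0)
drawn = rng.choices(kept_labels, weights=weights, k=3000)
drawn_counts = Counter(drawn)
print(Counter(kept_labels))
print(drawn_counts)
```

In a PyTorch pipeline the same `weights` list would typically be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)` and handed to the `DataLoader` through its `sampler` argument. With these weights each class is drawn roughly equally often, regardless of how many examples it has left after capping.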
| Metric | Score |
|---|---|
| Safety Accuracy | 83.1% |
| Category Macro F1 | 0.482 |
| Category Micro F1 | 0.514 |

| Category | Test F1 |
|---|---|
| benign | 0.41 |
| child_sexual_exploitation | 0.35 |
| hate_and_harassment | 0.47 |
| indiscriminate_weapons | 0.32 |
| misinformation_and_specialized_advice | 0.38 |
| non_violent_crimes | 0.68 |
| pi_and_jailbreak | 0.60 |
| privacy | 0.55 |
| sexual_content | 0.44 |
| suicide_and_self_harm | 0.49 |
| violent_crimes | 0.61 |
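
As a consistency check, macro F1 is the unweighted mean of the per-category F1 scores, so the reported Category Macro F1 can be recomputed from the table above. The short sketch below only copies the table values into a dictionary (the variable names are illustrative). Micro F1 cannot be recovered the same way, since it needs per-class true positive, false positive, and false negative counts that the card does not list.

```python
# Per-category test F1 scores, copied from the table above.
per_class_f1 = {
    "benign": 0.41,
    "child_sexual_exploitation": 0.35,
    "hate_and_harassment": 0.47,
    "indiscriminate_weapons": 0.32,
    "misinformation_and_specialized_advice": 0.38,
    "non_violent_crimes": 0.68,
    "pi_and_jailbreak": 0.60,
    "privacy": 0.55,
    "sexual_content": 0.44,
    "suicide_and_self_harm": 0.49,
    "violent_crimes": 0.61,
}

# Macro F1: every category counts equally, regardless of its support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.3f}")  # prints 0.482
```

The result matches the reported 0.482, which confirms that the two tables describe the same evaluation.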