MambaShield v2 (Balanced)

A dual-head classifier built on a Mamba-2 state space model (SSM) for content moderation and prompt-injection detection. It is trained with per-class balancing (at most 30K samples per class) and a WeightedRandomSampler to improve rare-class detection.

Performance

Metric               Score
Safety Accuracy      83.1%
Category Macro F1    0.482
Category Micro F1    0.514

Architecture

  • Backbone: Mamba-2 SSD × 4 layers, d_model=256, n_heads=8
  • Parameters: 9.6M
  • Tokenizer: bert-base-uncased
  • Heads: Safety binary (sigmoid) + 11-class multi-label (sigmoid)
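
Because both heads end in a sigmoid, each output can be thresholded independently, and a single input may carry several category labels at once. The sketch below shows one way to decode raw logits from the two heads. The function name `decode_outputs`, the 0.5 threshold, the reading of a high safety score as "unsafe", and the category index order are all assumptions for illustration, not details taken from the released model.

```python
import math

# Category names copied from the model card, in the order listed there.
# ASSUMPTION: the model's output indices follow this same order.
CATEGORIES = [
    "benign",
    "child_sexual_exploitation",
    "hate_and_harassment",
    "indiscriminate_weapons",
    "misinformation_and_specialized_advice",
    "non_violent_crimes",
    "pi_and_jailbreak",
    "privacy",
    "sexual_content",
    "suicide_and_self_harm",
    "violent_crimes",
]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_outputs(safety_logit, category_logits, threshold=0.5):
    """Hypothetical helper: turn dual-head logits into labels.

    ASSUMPTION: a safety probability at or above `threshold` means "unsafe".
    """
    if len(category_logits) != len(CATEGORIES):
        raise ValueError("expected one logit per category")
    unsafe_prob = sigmoid(safety_logit)
    # Multi-label decoding: every category is thresholded on its own,
    # so zero, one, or several categories can be flagged.
    flagged = [
        name
        for name, logit in zip(CATEGORIES, category_logits)
        if sigmoid(logit) >= threshold
    ]
    return {
        "unsafe_prob": unsafe_prob,
        "is_unsafe": unsafe_prob >= threshold,
        "categories": flagged,
    }


# Example with made-up logits: only the pi_and_jailbreak logit is positive.
result = decode_outputs(2.0, [-3.0] * 6 + [1.5] + [-3.0] * 4)
print(result["is_unsafe"], result["categories"])
```

In practice the threshold would likely be tuned per category on a validation set, since the per-class F1 scores vary widely.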

Categories (11)

Category                                 Test F1
benign                                   0.41
child_sexual_exploitation                0.35
hate_and_harassment                      0.47
indiscriminate_weapons                   0.32
misinformation_and_specialized_advice    0.38
non_violent_crimes                       0.68
pi_and_jailbreak                         0.60
privacy                                  0.55
sexual_content                           0.44
suicide_and_self_harm                    0.49
violent_crimes                           0.61
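
Macro F1 is the unweighted mean of the per-class F1 scores, so the table above can be checked against the reported Category Macro F1. The short sketch below does that arithmetic with the values copied from this card; the variable names are illustrative only.

```python
# Per-class test F1 values, copied from the table in this model card.
PER_CLASS_F1 = {
    "benign": 0.41,
    "child_sexual_exploitation": 0.35,
    "hate_and_harassment": 0.47,
    "indiscriminate_weapons": 0.32,
    "misinformation_and_specialized_advice": 0.38,
    "non_violent_crimes": 0.68,
    "pi_and_jailbreak": 0.60,
    "privacy": 0.55,
    "sexual_content": 0.44,
    "suicide_and_self_harm": 0.49,
    "violent_crimes": 0.61,
}

# Macro average: every class counts equally, regardless of its support.
macro_f1 = sum(PER_CLASS_F1.values()) / len(PER_CLASS_F1)
print(f"{macro_f1:.3f}")  # 0.482
```

The result agrees with the reported 0.482. Micro F1 (0.514) cannot be recomputed this way, because it pools true positives, false positives, and false negatives across classes, and those counts are not listed here.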

Training

  • Dataset: 285K samples after 40:60 resampling + per-class cap (30K max per class)
  • WeightedRandomSampler for rare class oversampling
  • BF16 mixed precision, AdamW + cosine LR, 5 epochs
  • Hardware: NVIDIA L4 24GB
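
The per-class cap and the weighted sampling listed above can be sketched in plain Python as follows. This is a minimal illustration, not the author's training code: it assumes each sample has a single primary label (the real category head is multi-label) and uses inverse class frequency as the weighting scheme. The helper names `cap_per_class` and `inverse_frequency_weights` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def cap_per_class(labels, cap, seed=0):
    """Return sorted indices that keep at most `cap` samples per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for idxs in by_class.values():
        # Randomly downsample only the classes that exceed the cap.
        if len(idxs) > cap:
            idxs = rng.sample(idxs, cap)
        kept.extend(idxs)
    return sorted(kept)


def inverse_frequency_weights(labels):
    """One weight per sample, equal to 1 / (count of that sample's class).

    ASSUMPTION: inverse-frequency weighting; the card does not say which
    weighting scheme was actually used.
    """
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Toy data: a frequent class and a rare class, with a cap of 4.
labels = ["benign"] * 8 + ["privacy"] * 2
kept = cap_per_class(labels, cap=4)
capped = [labels[i] for i in kept]
weights = inverse_frequency_weights(capped)

print(Counter(capped))
print(weights)
```

With these weights, every class has the same total sampling mass, so rare classes are drawn as often as frequent ones. In PyTorch, a list like `weights` could be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)` and the sampler handed to a `DataLoader`.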
