Upload guard-safety-classifier model

Files changed (9)

README.md +195 -0
added_tokens.json +3 -0
config.json +83 -0
label_encoders.pkl +3 -0
model_weights.pt +3 -0
special_tokens_map.json +15 -0
spm.model +3 -0
tokenizer.json +0 -0
tokenizer_config.json +58 -0

README.md ADDED

	@@ -0,0 +1,195 @@

+---
+language: en
+license: apache-2.0
+tags:
+- safety-classifier
+- content-moderation
+- multi-task
+- deberta-v3
+- text-classification
+datasets:
+- budecosystem/guardrail-training-data
+metrics:
+- accuracy
+- f1
+---
+# 🛡️ Guard Safety Classifier
+A multi-task safety classifier based on **DeBERTa-v3-small**, fine-tuned for content moderation and safety detection on a ~3.98M-sample dataset (3.18M samples used for training).
+## 🎯 Model Tasks
+This model performs **three simultaneous predictions**:
+1. **Binary Safety Classification** (`is_safe`)
+   - ✅ Safe content
+   - ⚠️ Unsafe content
+2. **Single-Label Category Classification** (`category`)
+   - Identifies the primary safety concern category
+3. **Multi-Label Categories** (`categories`)
+   - Can detect multiple safety issues simultaneously
+## 📊 Performance Metrics (Test Set)
+| Metric | Score |
+|--------|-------|
+| **is_safe Accuracy** | 92.76% |
+| **category F1** | 0.5037 |
+| **categories F1** | 0.9068 |
+| **Test Loss** | 1.0233 |
+## 🚀 Quick Start
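+```python
+import json
+import pickle
+import torch
+from huggingface_hub import hf_hub_download
+from transformers import AutoTokenizer
+# Replace with the actual Hub repository ID
+repo_id = "YOUR_USERNAME/guard-safety-classifier"
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+# Hyperparameters and class counts (26 categories, 28 multi-label classes)
+with open(hf_hub_download(repo_id, "config.json")) as f:
+    config = json.load(f)
+# MultiTaskSafetyClassifier is not included in this repository;
+# import it from the training code that defines the architecture
+from your_model_file import MultiTaskSafetyClassifier
+model = MultiTaskSafetyClassifier(
+    model_name=config["model_name"],
+    num_categories=config["num_categories"],
+    num_multi_labels=config["num_multi_labels"],
+)
+# Load weights; map_location lets this run on CPU-only machines
+weights_path = hf_hub_download(repo_id, "model_weights.pt")
+model.load_state_dict(torch.load(weights_path, map_location="cpu"))
+model.eval()
+# Load label encoders (likely scikit-learn objects; only unpickle files you trust)
+with open(hf_hub_download(repo_id, "label_encoders.pkl"), "rb") as f:
+    encoders = pickle.load(f)
+le_category = encoders["le_category"]
+mlb = encoders["mlb"]
+# Inference (truncate to the 128-token training length)
+text = "Your text here"
+inputs = tokenizer(text, return_tensors="pt", max_length=config["max_len"],
+                   truncation=True, padding=True)
+with torch.no_grad():
+    outputs = model(**inputs)
+# Index 1 of the is_safe head is treated as the "safe" class
+is_safe = torch.softmax(outputs["is_safe"], dim=1)[0][1].item() > 0.5
+category = le_category.inverse_transform([outputs["category"].argmax(1).item()])[0]
+categories = mlb.inverse_transform((torch.sigmoid(outputs["categories"]) > 0.5).cpu().numpy())[0]
+print(f"Is Safe: {is_safe}")
+print(f"Category: {category}")
+print(f"Categories: {list(categories)}")
+```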
+## 🏗️ Model Architecture
+- **Base Model**: `microsoft/deberta-v3-small` (141M parameters)
+- **Hidden Size**: 768
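+- **Max Sequence Length**: 128 tokens (pass `max_length=128, truncation=True` when tokenizing; `tokenizer_config.json` does not enforce a limit)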
+- **Training Framework**: PyTorch + Transformers
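The Quick Start imports `MultiTaskSafetyClassifier` from a module that is not part of this repository. The sketch below shows one plausible shape for it, assuming a shared DeBERTa encoder whose first-token ([CLS]) hidden state feeds three linear heads. The pooling choice, dropout, and layer names are assumptions, so its parameter names will not necessarily match the keys stored in `model_weights.pt`; the optional `encoder` argument is a hypothetical addition that lets the module be built offline.

```python
from torch import nn
from transformers import AutoModel


class MultiTaskSafetyClassifier(nn.Module):
    """Hypothetical sketch: shared encoder + three task-specific linear heads."""

    def __init__(self, model_name, num_categories, num_multi_labels,
                 dropout=0.1, encoder=None):
        super().__init__()
        # Use an injected encoder if given, otherwise download the pretrained backbone
        self.encoder = encoder if encoder is not None else AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size  # 768 for deberta-v3-small
        self.dropout = nn.Dropout(dropout)
        self.is_safe_head = nn.Linear(hidden, 2)                    # binary safe/unsafe
        self.category_head = nn.Linear(hidden, num_categories)      # single-label
        self.categories_head = nn.Linear(hidden, num_multi_labels)  # multi-label

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask,
                           token_type_ids=token_type_ids)
        # Pool by taking the final hidden state of the first ([CLS]) token
        pooled = self.dropout(out.last_hidden_state[:, 0])
        return {
            "is_safe": self.is_safe_head(pooled),
            "category": self.category_head(pooled),
            "categories": self.categories_head(pooled),
        }
```

Returning a dict keyed by `is_safe`, `category`, and `categories` matches how the Quick Start indexes `outputs`, and accepting `token_type_ids` lets `model(**inputs)` receive every field the tokenizer returns.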
+## 📚 Training Details
+- **Dataset**: [budecosystem/guardrail-training-data](https://huggingface.co/datasets/budecosystem/guardrail-training-data)
+- **Training Samples**: 3,182,844
+- **Validation Samples**: 397,855
+- **Test Samples**: 397,856
+- **Batch Size**: 64
+- **Learning Rate**: 2e-5
+- **Epochs**: 1
+- **Optimizer**: AdamW (weight decay 0.01) with 500 linear warmup steps and gradient clipping at 1.0
+- **Hardware**: NVIDIA Tesla T4 (16GB)
+- **Training Time**: ~8 hours
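With one epoch over 3,182,844 training samples at batch size 64, training takes ceil(3,182,844 / 64) = 49,732 optimizer steps. The sketch below computes the learning rate at a given step, assuming the 500 warmup steps from `config.json` are followed by linear decay to zero (the shape used by `transformers.get_linear_schedule_with_warmup`) and that there is no gradient accumulation; the README itself only states "linear warmup".

```python
import math

TRAIN_SAMPLES = 3_182_844
BATCH_SIZE = 64
BASE_LR = 2e-5
WARMUP_STEPS = 500  # from config.json

# One epoch, one optimizer step per batch (assumes no gradient accumulation)
TOTAL_STEPS = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)


def lr_at(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup from 0 to base_lr, then (assumed) linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(TOTAL_STEPS)          # 49732
print(lr_at(WARMUP_STEPS))  # 2e-05
```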
+## 🏷️ Categories
+The `category` head predicts one of the following 26 classes (including `benign`):
+```python
+[
+  "animal_abuse",
+  "benign",
+  "child_abuse",
+  "code_vulnerabilities",
+  "controversial_topics_politics",
+  "cwe_compliance",
+  "dangerous_expert_advice",
+  "discrimination_stereotype_injustice",
+  "drug_abuse_weapons_banned_substance",
+  "financial_crime_property_crime_theft",
+  "fraud_deception_misinformation",
+  "gender_bias",
+  "hate_speech_offensive_language",
+  "jailbreak_prompt_injection",
+  "malware_hacking_cyberattack",
+  "misinformation_regarding_ethics_laws_and_safety",
+  "mitre_compliance",
+  "non_violent_unethical_behavior",
+  "orientation_bias",
+  "privacy_violation",
+  "race_bias",
+  "religious_bias",
+  "self_harm",
+  "sexually_explicit_adult_content",
+  "terrorism_organized_crime",
+  "violence_aiding_and_abetting_incitement"
+]
+```
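Assuming `le_category` is scikit-learn's `LabelEncoder`, which stores its classes in sorted order, the alphabetical list above (also stored as `category_classes` in `config.json`) should line up index-for-index with the `category` head's outputs. The sketch below decodes logits that way without unpickling `label_encoders.pkl`; the helper names are hypothetical, and the ordering assumption is worth checking once against `le_category.classes_`.

```python
import json


def load_category_classes(config_path="config.json"):
    """Read the ordered category names from the repository's config.json."""
    with open(config_path) as f:
        config = json.load(f)
    classes = config["category_classes"]
    # Sanity check: the list length should equal the head size (26 here)
    if len(classes) != config["num_categories"]:
        raise ValueError("category_classes does not match num_categories")
    return classes


def decode_category(logits, category_classes):
    """Return the class name at the argmax of the category head's logits.

    `logits` is a plain list, e.g. outputs["category"][0].tolist().
    """
    best = max(range(len(logits)), key=lambda i: logits[i])
    return category_classes[best]


# Toy logits over the first three classes
print(decode_category([0.1, 2.3, -0.7], ["animal_abuse", "benign", "child_abuse"]))  # benign
```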
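+## 🔢 Multi-Label Classes
+The 28 classes stored for the `categories` head are single characters, not category names. This is what scikit-learn's `MultiLabelBinarizer` produces when each sample's labels are passed as one string instead of a list of names, so `categories` predictions (and the categories F1 above) appear to be character-level rather than category-level: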
+```python
+[
+  " ",
+  ",",
+  "_",
+  "a",
+  "b",
+  "c",
+  "d",
+  "e",
+  "f",
+  "g",
+  "h",
+  "i",
+  "j",
+  "k",
+  "l",
+  "m",
+  "n",
+  "o",
+  "p",
+  "r",
+  "s",
+  "t",
+  "u",
+  "v",
+  "w",
+  "x",
+  "y",
+  "z"
+]
+```
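When fit without explicit classes, scikit-learn's `MultiLabelBinarizer` collects every element of every sample's label iterable and sorts them; if a sample's labels are a single string, those elements are characters. The stdlib-only sketch below mirrors that collection step on invented comma-separated targets (the sample strings are hypothetical, not taken from the dataset) and shows the split that yields category names instead.

```python
from itertools import chain

# Hypothetical targets: each sample's labels stored as one comma-separated string
raw_targets = ["hate_speech_offensive_language, race_bias", "benign"]

# Collecting labels from the raw strings iterates over characters,
# mirroring how MultiLabelBinarizer builds its classes when given strings
char_classes = sorted(set(chain.from_iterable(raw_targets)))

# Splitting each string into a list of names first gives category-level classes
split_targets = [[name.strip() for name in s.split(",")] for s in raw_targets]
name_classes = sorted(set(chain.from_iterable(split_targets)))

print(char_classes[:5])  # [' ', ',', '_', 'a', 'b']
print(name_classes)      # ['benign', 'hate_speech_offensive_language', 'race_bias']
```

Fitting `MultiLabelBinarizer` on lists like `split_targets` would give it category names as classes, matching what the `categories` task describes.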
+## ⚙️ Configuration
+The full training configuration (hyperparameters, class lists, and test metrics) is available in `config.json`.
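`config.json` also records per-task loss weights (`w_is_safe` = 1.0, `w_category` = 1.0, `w_categories` = 0.5). A common way to use such weights, and an assumption here rather than something the repository states, is a weighted sum of the three task losses. The sketch below reads the weights and applies them that way; both helper names are hypothetical.

```python
import json


def load_task_weights(config_path="config.json"):
    """Read the per-task loss weights stored in config.json."""
    with open(config_path) as f:
        config = json.load(f)
    return {task: config[f"w_{task}"] for task in ("is_safe", "category", "categories")}


def weighted_total_loss(task_losses, weights):
    """Assumed combination: sum of each task's loss scaled by its weight."""
    return sum(weights[task] * loss for task, loss in task_losses.items())


# Toy per-task loss values with the weights from config.json
weights = {"is_safe": 1.0, "category": 1.0, "categories": 0.5}
print(weighted_total_loss({"is_safe": 0.2, "category": 0.7, "categories": 0.3}, weights))
```

Under this assumption the multi-label loss contributes half as much as each single-label loss to the combined objective.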
+## 📄 License
+Apache 2.0
+## 🙏 Acknowledgments
+- Base model: [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small)
+- Training data: [budecosystem/guardrail-training-data](https://huggingface.co/datasets/budecosystem/guardrail-training-data)
+## 📮 Contact
+For questions or issues, please open an issue on the model repository.

added_tokens.json ADDED

	@@ -0,0 +1,3 @@

+{
+  "[MASK]": 128000
+}

config.json ADDED

	@@ -0,0 +1,83 @@

+{
+  "model_name": "microsoft/deberta-v3-small",
+  "max_len": 128,
+  "batch_size": 64,
+  "epochs": 1,
+  "lr": 2e-05,
+  "weight_decay": 0.01,
+  "warmup_steps": 500,
+  "grad_clip": 1.0,
+  "seed": 42,
+  "w_is_safe": 1.0,
+  "w_category": 1.0,
+  "w_categories": 0.5,
+  "save_steps": 200,
+  "eval_steps": 500,
+  "num_categories": 26,
+  "num_multi_labels": 28,
+  "category_classes": [
+    "animal_abuse",
+    "benign",
+    "child_abuse",
+    "code_vulnerabilities",
+    "controversial_topics_politics",
+    "cwe_compliance",
+    "dangerous_expert_advice",
+    "discrimination_stereotype_injustice",
+    "drug_abuse_weapons_banned_substance",
+    "financial_crime_property_crime_theft",
+    "fraud_deception_misinformation",
+    "gender_bias",
+    "hate_speech_offensive_language",
+    "jailbreak_prompt_injection",
+    "malware_hacking_cyberattack",
+    "misinformation_regarding_ethics_laws_and_safety",
+    "mitre_compliance",
+    "non_violent_unethical_behavior",
+    "orientation_bias",
+    "privacy_violation",
+    "race_bias",
+    "religious_bias",
+    "self_harm",
+    "sexually_explicit_adult_content",
+    "terrorism_organized_crime",
+    "violence_aiding_and_abetting_incitement"
+  ],
+  "multi_label_classes": [
+    " ",
+    ",",
+    "_",
+    "a",
+    "b",
+    "c",
+    "d",
+    "e",
+    "f",
+    "g",
+    "h",
+    "i",
+    "j",
+    "k",
+    "l",
+    "m",
+    "n",
+    "o",
+    "p",
+    "r",
+    "s",
+    "t",
+    "u",
+    "v",
+    "w",
+    "x",
+    "y",
+    "z"
+  ],
+  "best_val_loss": 1.0249000663187966,
+  "test_metrics": {
+    "loss": 1.0232949212993905,
+    "is_safe_acc": 0.9276446754604681,
+    "category_f1": 0.5036962280648937,
+    "categories_f1": 0.9067776039136755
+  }
+}

label_encoders.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3ebba49ff1eca26a2905f9dc7e4af61c6a68ed079e0c3c3917e8c87db8dba609
+size 5415

model_weights.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4664c6f76d143d0cdbab46aca62014a06fd0d299b911f85c598d65ef8e6d0ccc
+size 567685355

special_tokens_map.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "bos_token": "[CLS]",
+  "cls_token": "[CLS]",
+  "eos_token": "[SEP]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

spm.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+size 2464616

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,58 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "128000": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "[CLS]",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_lower_case": false,
+  "eos_token": "[SEP]",
+  "mask_token": "[MASK]",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "sp_model_kwargs": {},
+  "split_by_punct": false,
+  "tokenizer_class": "DebertaV2Tokenizer",
+  "unk_token": "[UNK]",
+  "vocab_type": "spm"
+}