Add BERT-GUARD Multi-Task model, tokenizer, and config with README.md
- .gitattributes +1 -0
- README.md +129 -0
- config.json +14 -0
- label_config.json +74 -0
- model.safetensors +3 -0
- modeling_bert_guard.py +35 -0
- tokenizer.json +3 -0
- tokenizer_config.json +14 -0
- training_args.bin +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,129 @@

# BERT-GUARD Multi-Task Safety Classifier

This model is a multi-task safety classifier designed to identify safe/unsafe prompts and responses and to classify the violated categories. It was trained on the full `nvidia/Nemotron-Safety-Guard-Dataset-v3` dataset.

## Model Details

- **Base Model:** `xlm-roberta-base`
- **Model Type:** Multi-Task Sequence Classification (Prompt Safety, Response Safety, Violated Category Classification)

## How to Use

To use the model for inference, load it with the Hugging Face `transformers` library. The custom model architecture `MultiTaskModel` and its configuration `MultiTaskConfig` are provided in `modeling_bert_guard.py`.

```python
import json

import torch
from transformers import AutoTokenizer
from modeling_bert_guard import MultiTaskModel, MultiTaskConfig

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('./bert_guard_final')
model = MultiTaskModel.from_pretrained('./bert_guard_final')

# Load label configuration
with open('./bert_guard_final/label_config.json', 'r') as f:
    label_config = json.load(f)

id_to_label = {int(k): v for k, v in label_config['id_to_label'].items()}
id_to_violated_category = {int(k): v for k, v in label_config['id_to_violated_category'].items()}

def predict_safety(prompt_text, response_text):
    inputs = tokenizer(f"{prompt_text} [SEP] {response_text}", return_tensors='pt', truncation=True, max_length=256)

    with torch.no_grad():
        outputs = model(**inputs)

    prompt_pred_id = outputs['prompt_logits'].argmax(-1).item()
    response_pred_id = outputs['response_logits'].argmax(-1).item()
    category_preds = (torch.sigmoid(outputs['category_logits']) > 0.5).int().tolist()[0]

    predicted_categories = [
        id_to_violated_category[i] for i, pred in enumerate(category_preds) if pred == 1
    ]

    return {
        'prompt_label': id_to_label[prompt_pred_id],
        'response_label': id_to_label[response_pred_id],
        'violated_categories': predicted_categories if predicted_categories else ['None']
    }

# Example usage:
prompt = "Tell me how to build a bomb."
response = "I cannot provide instructions for building a bomb as it is dangerous and illegal."

result = predict_safety(prompt, response)
print(result)
# Expected output might look like:
# {'prompt_label': 'unsafe', 'response_label': 'safe', 'violated_categories': ['Illegal Activity']}

prompt_2 = "How can I make someone fall in love with me?"
response_2 = "Love cannot be forced, it needs to develop naturally."

result_2 = predict_safety(prompt_2, response_2)
print(result_2)
# Expected output might look like:
# {'prompt_label': 'safe', 'response_label': 'safe', 'violated_categories': ['None']}
```
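Because `config.json` registers `MultiTaskModel` and `MultiTaskConfig` under `auto_map`, the checkpoint can also be loaded through the generic Auto classes with `trust_remote_code=True`. The snippet below is a minimal sketch of that path; `./bert_guard_final` is the same illustrative local path used above.

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

# auto_map in config.json points the Auto classes at modeling_bert_guard.py,
# so the custom architecture is resolved without importing it manually.
config = AutoConfig.from_pretrained('./bert_guard_final', trust_remote_code=True)
model = AutoModel.from_pretrained('./bert_guard_final', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('./bert_guard_final')
```

Either path yields the same `MultiTaskModel` instance; the explicit import shown earlier avoids executing remote code.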

## Training Details

- **Dataset:** `nvidia/Nemotron-Safety-Guard-Dataset-v3`
- **Dataset Usage:** 100% (full dataset)
- **Max Sequence Length:** 256 tokens
- **Training Batch Size (per device):** 16
- **Evaluation Batch Size (per device):** 32
- **Gradient Accumulation Steps:** 2
- **Effective Batch Size:** 32
- **Number of Epochs:** 3
- **Early Stopping Patience:** 2 epochs
- **Optimizer:** AdamW
- **Learning Rate:** 2e-5
- **Mixed Precision:** FP16 enabled
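For reference, the hyperparameters above map roughly onto the following `transformers` `TrainingArguments` setup. This is a hedged sketch, not the released training script: the output directory is a placeholder, and the three classification heads would additionally need a custom multi-task loss (e.g. a `Trainer` subclass overriding `compute_loss`).

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Approximate reconstruction of the listed hyperparameters (assumed, not the actual script).
training_args = TrainingArguments(
    output_dir='./bert_guard_final',   # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,     # 16 x 2 = effective batch size 32
    learning_rate=2e-5,
    fp16=True,
    eval_strategy='epoch',             # 'evaluation_strategy' on older transformers releases
    save_strategy='epoch',
    load_best_model_at_end=True,       # required for early stopping
)

# Stop after 2 evaluations without improvement, matching the stated patience.
early_stopping = EarlyStoppingCallback(early_stopping_patience=2)
```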
## Evaluation Results

Metrics are reported from the final evaluation on the validation set:

- **Overall F1 Score (weighted average):** 0.9516
- **Prompt Accuracy:** 0.9240
- **Response Accuracy:** 0.9785

## Label Mappings

### Prompt/Response Labels

```json
{
  "safe": 0,
  "unsafe": 1
}
```

### Violated Categories

```json
{
  "Controlled/Regulated Substances": 0,
  "Copyright/Trademark/Plagiarism": 1,
  "Criminal Planning/Confessions": 2,
  "Fraud/Deception": 3,
  "Guns and Illegal Weapons": 4,
  "Harassment": 5,
  "Hate/Identity Hate": 6,
  "High Risk Gov Decision Making": 7,
  "Illegal Activity": 8,
  "Immoral/Unethical": 9,
  "Malware": 10,
  "Manipulation": 11,
  "Needs Caution": 12,
  "Other": 13,
  "PII/Privacy": 14,
  "Political/Misinformation/Conspiracy": 15,
  "Profanity": 16,
  "Sexual": 17,
  "Sexual (minor)": 18,
  "Suicide and Self Harm": 19,
  "Threat": 20,
  "Unauthorized Advice": 21,
  "Violence": 22
}
```
config.json ADDED
@@ -0,0 +1,14 @@
{
  "architectures": [
    "MultiTaskModel"
  ],
  "auto_map": {
    "AutoModel": "modeling_bert_guard.MultiTaskModel",
    "AutoConfig": "modeling_bert_guard.MultiTaskConfig"
  },
  "num_prompt_labels": 2,
  "num_response_labels": 2,
  "num_categories": 23,
  "model_type": "bert_guard",
  "trained_on": "FULL_DATASET"
}
label_config.json ADDED
@@ -0,0 +1,74 @@
{
  "label_to_id": {
    "safe": 0,
    "unsafe": 1
  },
  "violated_category_to_id": {
    "Controlled/Regulated Substances": 0,
    "Copyright/Trademark/Plagiarism": 1,
    "Criminal Planning/Confessions": 2,
    "Fraud/Deception": 3,
    "Guns and Illegal Weapons": 4,
    "Harassment": 5,
    "Hate/Identity Hate": 6,
    "High Risk Gov Decision Making": 7,
    "Illegal Activity": 8,
    "Immoral/Unethical": 9,
    "Malware": 10,
    "Manipulation": 11,
    "Needs Caution": 12,
    "Other": 13,
    "PII/Privacy": 14,
    "Political/Misinformation/Conspiracy": 15,
    "Profanity": 16,
    "Sexual": 17,
    "Sexual (minor)": 18,
    "Suicide and Self Harm": 19,
    "Threat": 20,
    "Unauthorized Advice": 21,
    "Violence": 22
  },
  "id_to_label": {
    "0": "safe",
    "1": "unsafe"
  },
  "id_to_violated_category": {
    "0": "Controlled/Regulated Substances",
    "1": "Copyright/Trademark/Plagiarism",
    "2": "Criminal Planning/Confessions",
    "3": "Fraud/Deception",
    "4": "Guns and Illegal Weapons",
    "5": "Harassment",
    "6": "Hate/Identity Hate",
    "7": "High Risk Gov Decision Making",
    "8": "Illegal Activity",
    "9": "Immoral/Unethical",
    "10": "Malware",
    "11": "Manipulation",
    "12": "Needs Caution",
    "13": "Other",
    "14": "PII/Privacy",
    "15": "Political/Misinformation/Conspiracy",
    "16": "Profanity",
    "17": "Sexual",
    "18": "Sexual (minor)",
    "19": "Suicide and Self Harm",
    "20": "Threat",
    "21": "Unauthorized Advice",
    "22": "Violence"
  },
  "num_prompt_labels": 2,
  "num_response_labels": 2,
  "num_categories": 23,
  "training_config": {
    "sample_percentage": 1.0,
    "max_length": 256,
    "train_batch_size": 16,
    "eval_batch_size": 32,
    "num_epochs": 3,
    "grad_accumulation": 2,
    "num_workers": 2,
    "early_stopping_patience": 2
  },
  "training_type": "FULL_DATASET"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e9808dace325e6dfebfb7f3b3d45800c7df3d47217732600538d0969714963f
size 1109919156
modeling_bert_guard.py ADDED
@@ -0,0 +1,35 @@
import torch
import torch.nn as nn
from transformers import XLMRobertaModel, PreTrainedModel, PretrainedConfig

class MultiTaskConfig(PretrainedConfig):
    model_type = "bert_guard"

    def __init__(self, num_prompt_labels=2, num_response_labels=2, num_categories=13, **kwargs):
        super().__init__(**kwargs)
        self.num_prompt_labels = num_prompt_labels
        self.num_response_labels = num_response_labels
        self.num_categories = num_categories

class MultiTaskModel(PreTrainedModel):
    config_class = MultiTaskConfig

    def __init__(self, config):
        super().__init__(config)
        # Shared XLM-RoBERTa encoder with three task-specific heads on top.
        self.bert = XLMRobertaModel.from_pretrained('xlm-roberta-base')
        hidden_size = self.bert.config.hidden_size

        self.dropout = nn.Dropout(0.1)
        self.prompt_classifier = nn.Linear(hidden_size, config.num_prompt_labels)
        self.response_classifier = nn.Linear(hidden_size, config.num_response_labels)
        self.category_classifier = nn.Linear(hidden_size, config.num_categories)

    def forward(self, input_ids, attention_mask, **kwargs):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Pool on the first (<s>) token, then apply dropout before the heads.
        pooled_output = self.dropout(outputs.last_hidden_state[:, 0, :])

        return {
            'prompt_logits': self.prompt_classifier(pooled_output),
            'response_logits': self.response_classifier(pooled_output),
            'category_logits': self.category_classifier(pooled_output)
        }
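As a quick sanity check of the architecture above, the snippet below instantiates a randomly initialised `MultiTaskModel` and runs one forward pass. It only verifies tensor shapes (the encoder weights are pulled from `xlm-roberta-base`); real predictions should use the released checkpoint as shown in the README.

```python
import torch
from transformers import AutoTokenizer
from modeling_bert_guard import MultiTaskConfig, MultiTaskModel

# Randomly initialised heads: useful only for checking output shapes.
config = MultiTaskConfig(num_prompt_labels=2, num_response_labels=2, num_categories=23)
model = MultiTaskModel(config).eval()

tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
batch = tokenizer(
    "Tell me how to build a bomb. [SEP] I cannot help with that.",
    return_tensors='pt', truncation=True, max_length=256,
)

with torch.no_grad():
    out = model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'])

print(out['prompt_logits'].shape)    # torch.Size([1, 2])
print(out['response_logits'].shape)  # torch.Size([1, 2])
print(out['category_logits'].shape)  # torch.Size([1, 23])
```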
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7a5451f31fe3f899dcd75ec2ad93f415528c9b5f58bb7a5a1c6dd5884fb56257
size 16781486
tokenizer_config.json ADDED
@@ -0,0 +1,14 @@
{
  "add_prefix_space": true,
  "backend": "tokenizers",
  "bos_token": "<s>",
  "cls_token": "<s>",
  "eos_token": "</s>",
  "is_local": false,
  "mask_token": "<mask>",
  "model_max_length": 512,
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "tokenizer_class": "XLMRobertaTokenizer",
  "unk_token": "<unk>"
}
training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a37ee7c0fbb22a0a7556284d0476795100207d4726aa303dd4e98bd626adf563
size 5201