---
tags:
- text-classification
- security
- red-team
- roberta
license: odc-by
datasets:
- trendmicro-ailab/Primus-FineWeb
metrics:
- precision
- recall
- f1
pipeline_tag: text-classification
library_name: transformers
---

# RedSecureBERT 🔴🛡️

Detects **technical red-team / offensive security** text (English).  

| Split | Precision | Recall | F1 | Threshold |
|-------|-----------|--------|----|-----------|
| Validation | **0.963** | **0.991** | **0.977** | **0.515** |

> **Recommended cut-off:** `prob >= 0.515` (chosen to maximize F₂ on the validation split).  
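
A threshold like this can be found by sweeping candidate cut-offs over validation scores and keeping the one with the highest F_β, where F_β = (1 + β²)·P·R / (β²·P + R). With β = 2, recall counts more than precision, which fits the high-recall operating point in the table. The sketch below is a minimal illustration, not the author's tuning script: the helper names `fbeta` and `best_threshold_fbeta`, the candidate grid (every distinct score), and the toy data are all hypothetical.

```python
def fbeta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion counts; 0.0 when tp, fp and fn are all zero."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold_fbeta(scores, labels, beta=2.0):
    """Return (threshold, F-beta) maximizing F-beta for the rule `score >= threshold`.

    Every distinct score is tried as a candidate cut-off (O(n^2), fine for a sketch).
    """
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f = fbeta(tp, fp, fn, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Hypothetical toy validation data (1 = offensive, 0 = not offensive).
scores = [0.10, 0.30, 0.52, 0.60, 0.80, 0.95]
labels = [0, 0, 1, 0, 1, 1]
threshold, f2 = best_threshold_fbeta(scores, labels)
print(f"threshold = {threshold:.2f}, F2 = {f2:.4f}")
```

On this toy data the sweep picks 0.52 (F₂ = 0.9375): the lowest cut-off that keeps recall at 1.0 while admitting only one false positive.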

---

## Intended uses & limits

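* **Triaging** large corpora for technical red-team / offensive-security content.  
* **Input language:** English.  
* **No external test set** yet: both the metrics and the threshold come from the validation split, so treat the scores as optimistic.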

## Demo

Red = this model; Blue = a companion blue-team (defensive) classifier, shown for comparison.

| # | Phrase | Blue Score | Blue Label | Red Score | Red Label |
|---|--------|-----------|-----------|----------|----------|
| 1 | To exfiltrate sensitive data, launch a phishing campaign that tricks employees into revealing their VPN credentials. | 0.066 | Not Defensive | 0.824 | Offensive (red-team) |
| 2 | We should deploy an EDR solution, monitor all endpoints for intrusion attempts, and enforce strict password policies. | 0.557 | Defensive (blue-team) | 0.019 | Not Offensive |
| 3 | Our marketing team will unveil the new cybersecurity branding materials at next Tuesday’s antivirus product launch | 0.256 | Not Defensive | 0.021 | Not Offensive |
| 4 | I'm excited about the company picnic. There's no cybersecurity topic—just burgers and games. | 0.272 | Not Defensive | 0.103 | Not Offensive |

---

## Training data

| Label | Rows |
|-------|------|
| Offensive | 30 746 |
| Defensive | 19 550 |
| Other | 130 000 |
| **Total** | **180 296** |
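
The counts above imply a strong class imbalance for the red-team detector. The snippet below recomputes the class shares and the positive/negative ratio. It assumes, since the card does not say so explicitly, that for the Offensive-vs-rest head both the Defensive and Other rows act as negatives.

```python
# Row counts copied from the training-data table above.
counts = {"Offensive": 30_746, "Defensive": 19_550, "Other": 130_000}

total = sum(counts.values())
assert total == 180_296  # matches the table's Total row

for label, n in counts.items():
    print(f"{label:<10} {n:>7}  {n / total:6.2%}")

# Assumption: Offensive rows are positives, Defensive + Other rows are negatives.
pos = counts["Offensive"]
neg = total - pos
print(f"positives: {pos}, negatives: {neg}, ratio ~ 1:{neg / pos:.1f}")
```

Under that assumption roughly 17% of rows are positives (about 1 positive per 4.9 negatives), which is the kind of imbalance a focal-loss objective is meant to handle.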

---

## Model details

| Field | Value |
|-------|-------|
| Base encoder | `ehsanaghaei/SecureBERT` (RoBERTa-base, 125 M) |
| Objective | One-vs-rest, focal-loss (γ = 2) |
| Epochs | 3 &nbsp;·&nbsp; micro-batch 16 &nbsp;·&nbsp; LR 2e-5 |
| Hardware | 1× RTX 4090 (≈ 41 min) |
| Inference dtype | FP16-safe |
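
For readers unfamiliar with the objective: a one-vs-rest head scores each class with an independent sigmoid, and focal loss down-weights examples the model already gets right via FL(p_t) = -(1 - p_t)^γ · log(p_t), where p_t is the predicted probability of the true outcome. The card gives γ = 2 but no α-balancing term, so this pure-Python sketch omits α; that, and the function names, are assumptions rather than the actual training code. With γ = 0 it reduces to ordinary binary cross-entropy.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def binary_focal_loss(logit: float, target: int, gamma: float = 2.0) -> float:
    """Focal loss for one sigmoid output: -(1 - p_t)^gamma * log(p_t)."""
    p = math.exp(-softplus(-logit))                # sigmoid(logit), overflow-safe
    if target == 1:
        p_t, log_p_t = p, -softplus(-logit)        # log(sigmoid(z))
    else:
        p_t, log_p_t = 1.0 - p, -softplus(logit)   # log(1 - sigmoid(z))
    return -((1.0 - p_t) ** gamma) * log_p_t


def one_vs_rest_focal_loss(logits, targets, gamma=2.0):
    """Average the binary focal loss over independent per-class sigmoid outputs."""
    losses = [binary_focal_loss(z, y, gamma) for z, y in zip(logits, targets)]
    return sum(losses) / len(losses)


# A confident, correct prediction contributes far less than under plain BCE.
bce = binary_focal_loss(3.0, 1, gamma=0.0)
focal = binary_focal_loss(3.0, 1, gamma=2.0)
print(f"BCE = {bce:.5f}, focal (gamma=2) = {focal:.6f}")
```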

---

## Training data license

- **Source**: [trendmicro-ailab/Primus-FineWeb](https://huggingface.co/datasets/trendmicro-ailab/Primus-FineWeb)  
- **License**: ODC-By-1.0 (http://opendatacommons.org/licenses/by/1-0/)  
- **Requirements**:  
  - Preserve all original copyright/license notices  
  - Honor [Common Crawl ToU](https://commoncrawl.org/terms-of-use/)  

---

## Quick start

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "HagalazAI/RedSecureBERT"
tok   = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Generate a ROP chain to bypass DEP on Windows 10."
inputs = tok(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# One-vs-rest head: sigmoid per class; index 0 is the Offensive class.
# (A pipeline with top_k=None sorts labels by score, so its [0] entry is
#  the top-scoring label, not necessarily class 0.)
prob = torch.sigmoid(logits)[0, 0].item()
print(f"P(offensive) = {prob:.3f}")

is_red = prob >= 0.515            # ← recommended threshold
print("is_red:", is_red)