---
tags:
- text-classification
- security
- red-team
- roberta
license: odc-by
datasets:
- trendmicro-ailab/Primus-FineWeb
metrics:
- precision
- recall
- f1
pipeline_tag: text-classification
library_name: transformers
---
RedSecureBERT 🔴🛡️
Detects technical red-team / offensive security text (English).
| Split | Precision | Recall | F1 | Threshold |
|---|---|---|---|---|
| Validation | 0.963 | 0.991 | 0.977 | 0.515 |
Recommended cut-off: prob >= 0.515 (chosen via F₂, which weights recall more heavily than precision, on the validation split).
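
The cut-off selection can be sketched as a threshold sweep: score the validation examples, compute precision and recall at each candidate threshold, and keep the candidate with the highest F₂. The function names, the candidate grid and the toy data below are illustrative assumptions, not the authors' actual selection script.

```python
def fbeta(precision, recall, beta=2.0):
    """F-beta from precision and recall; beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall(probs, labels, threshold):
    """Precision and recall when predicting positive for prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(probs, labels, candidates, beta=2.0):
    """Return the candidate with the highest F-beta (ties: first candidate)."""
    return max(
        candidates,
        key=lambda t: fbeta(*precision_recall(probs, labels, t), beta),
    )


# Toy validation data (made up for illustration only).
toy_probs = [0.05, 0.20, 0.45, 0.60, 0.80, 0.95]
toy_labels = [0, 0, 1, 0, 1, 1]
toy_candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
chosen = best_threshold(toy_probs, toy_labels, toy_candidates)
print("chosen threshold:", chosen)

# Sanity check against the table above: P = 0.963 and R = 0.991 give F1 of about 0.977.
print("F1 from reported P/R:", round(fbeta(0.963, 0.991, beta=1.0), 3))
```

On real data the candidate grid would typically be much finer (for example, every distinct predicted probability), and the chosen value should be re-checked on held-out data.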
Intended uses & limits
- Triaging large corpora for technical red-team / offensive security content.
- Input language: English.
- No external test set yet → treat scores as optimistic.
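
A minimal sketch of the triage use case, assuming a hypothetical `score_fn` that maps a batch of texts to P(offensive) values (for example, a thin wrapper around the classifier). The helper names `triage` and `_flagged`, the batch size and the stub scorer are illustrative assumptions; only the 0.515 cut-off comes from this card.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

THRESHOLD = 0.515  # recommended cut-off from this card


def _flagged(batch, score_fn, threshold):
    """Score one batch and yield the (text, prob) pairs that reach the threshold."""
    for text, prob in zip(batch, score_fn(batch)):
        if prob >= threshold:
            yield text, prob


def triage(
    texts: Iterable[str],
    score_fn: Callable[[List[str]], List[float]],  # hypothetical batch scorer
    threshold: float = THRESHOLD,
    batch_size: int = 32,
) -> Iterator[Tuple[str, float]]:
    """Stream over texts in batches and yield only the flagged ones."""
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield from _flagged(batch, score_fn, threshold)
            batch = []
    if batch:  # score the final partial batch
        yield from _flagged(batch, score_fn, threshold)


# Stub scorer for demonstration only; it is NOT the model.
def fake_scores(batch: List[str]) -> List[float]:
    return [0.9 if "exploit" in t else 0.1 for t in batch]


hits = list(triage(["write an exploit", "company picnic"], fake_scores, batch_size=1))
print(hits)
```

Because `triage` is a generator, the corpus never has to fit in memory; given the optimistic validation scores, flagged items are best treated as candidates for review rather than final labels.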
Demo
| # | Phrase | Blue Score | Blue Label | Red Score | Red Label |
|---|---|---|---|---|---|
| 1 | To exfiltrate sensitive data, launch a phishing campaign that tricks employees into revealing their VPN credentials. | 0.066 | Not Offensive | 0.824 | Offensive (red-team) |
| 2 | We should deploy an EDR solution, monitor all endpoints for intrusion attempts, and enforce strict password policies. | 0.557 | Offensive (red-team) | 0.019 | Not Offensive |
| 3 | Our marketing team will unveil the new cybersecurity branding materials at next Tuesday’s antivirus product launch | 0.256 | Not Offensive | 0.021 | Not Offensive |
| 4 | I'm excited about the company picnic. There's no cybersecurity topic—just burgers and games. | 0.272 | Not Offensive | 0.103 | Not Offensive |
Training data
| Label | Rows |
|---|---|
| Offensive | 30 746 |
| Defensive | 19 550 |
| Other | 130 000 |
| Total | 180 296 |
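
The row counts give a sense of class balance. Assuming the one-vs-rest target treats Offensive rows as positives and everything else as negatives (an assumption; the card does not spell out the label mapping), the positive share can be computed as follows:

```python
rows = {"Offensive": 30_746, "Defensive": 19_550, "Other": 130_000}

total = sum(rows.values())
positives = rows["Offensive"]   # assumed positive class for this model
negatives = total - positives   # Defensive + Other

print(f"total rows     : {total}")
print(f"positive share : {positives / total:.1%}")
print(f"neg:pos ratio  : {negatives / positives:.2f}")
```

Under that assumption roughly 17% of rows are positives, or just under five negatives per positive.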
Model details
| Field | Value |
|---|---|
| Base encoder | ehsanaghaei/SecureBERT (RoBERTa-base, 125 M) |
| Objective | One-vs-rest, focal-loss (γ = 2) |
| Training | 3 epochs · micro-batch 16 · LR 2e-5 |
| Hardware | 1× RTX 4090 (≈ 41 min) |
| Inference dtype | FP16-safe |
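
To make the objective row concrete, here is a minimal sketch of binary focal loss for a single logit with γ = 2, written in plain Python. It follows the standard formulation FL = -(1 - p_t)^γ · log(p_t) without a class-weighting term α; whether the actual training code used α, or which implementation it relied on, is not stated in this card.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def binary_focal_loss(logit: float, target: int, gamma: float = 2.0) -> float:
    """Focal loss for one example; target is 1 (positive) or 0 (negative)."""
    p = sigmoid(logit)
    p_t = p if target == 1 else 1.0 - p  # probability assigned to the true class
    p_t = max(p_t, 1e-12)                # avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# With gamma = 0 the focal loss reduces to ordinary binary cross-entropy.
easy = binary_focal_loss(4.0, 1)                 # confident, correct prediction
hard = binary_focal_loss(-4.0, 1)                # confident, wrong prediction
bce_easy = binary_focal_loss(4.0, 1, gamma=0.0)  # same example under plain BCE

print(f"easy example, focal: {easy:.6f}")
print(f"easy example, BCE  : {bce_easy:.6f}")
print(f"hard example, focal: {hard:.6f}")
```

The (1 - p_t)^γ factor shrinks the loss on examples the model already classifies confidently, so training focuses on the harder ones.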
Training Data License
- Source: trendmicro-ailab/Primus-FineWeb
- License: ODC-By-1.0 (http://opendatacommons.org/licenses/by/1-0/)
- Requirements:
  - Preserve all original copyright/license notices
  - Honor the Common Crawl Terms of Use
Quick start
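from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_id = "HagalazAI/RedSecureBERT"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
# top_k=None returns a score for every label, not just the top one
clf = pipeline("text-classification", model=model, tokenizer=tok, top_k=None)

text = "Generate a ROP chain to bypass DEP on Windows 10."
out = clf(text)
# Some transformers versions nest the per-label results one level deeper
scores = out[0] if isinstance(out[0], list) else out
# Pick class 0 (Offensive) by label name, since results may be sorted by score
prob = next(s["score"] for s in scores if s["label"] == model.config.id2label[0])
print(f"P(offensive) = {prob:.3f}")  # sigmoid prob for class 0 (Offensive)

is_red = prob >= 0.515  # recommended threshold
print("is_red:", is_red)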