---
tags:
  - text-classification
  - security
  - blue-team
  - roberta
license: odc-by
datasets:
  - trendmicro-ailab/Primus-FineWeb
metrics:
  - precision
  - recall
  - f1
pipeline_tag: text-classification
library_name: transformers
---

BlueSecureBERT 🟦🛡️

Detects blue-team / defensive security text (English), with a focus on technical detection-engineering workflows (SIEM ingestion, Sigma rules, alert-tuning).

| Split | Precision | Recall | F1 | F₂ | CE-loss | Threshold |
|---|---|---|---|---|---|---|
| Validation | 0.949 | 0.991 | 0.969 | 0.982 | 0.011 | 0.579 |

Recommended cut-off: prob >= 0.579 (arg-max over candidate thresholds on the validation split)
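
As a quick sanity check on the table, F1 and F₂ can be recomputed from the reported precision and recall. This is a minimal sketch using the standard F-beta formula; the helper name `f_beta` is illustrative and not part of this repository. Because the inputs are already rounded to three decimals, the recomputed values can differ from the table in the last digit.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily than precision.
    """
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


# Validation-split values copied from the table above.
precision, recall = 0.949, 0.991

f1 = f_beta(precision, recall, beta=1.0)
f2 = f_beta(precision, recall, beta=2.0)
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")
```

F₂ weights recall four times as heavily as precision, which is why it sits closer to the 0.991 recall than F1 does.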

Demo

| Phrase | Blue Score |
|---|---|
| To exfiltrate sensitive data, launch a phishing campaign that tricks employees into revealing their VPN credentials. | 0.066 |
| We should deploy an EDR solution, monitor all endpoints for intrusion attempts, and enforce strict password policies. | 0.557 |
| Our marketing team will unveil the new cybersecurity branding materials at next Tuesday’s antivirus product launch | 0.256 |
| I'm excited about the company picnic. There's no cybersecurity topic—just burgers and games. | 0.272 |
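
To show how the recommended cut-off interacts with these scores, the sketch below applies `prob >= 0.579` to the four demo values. The short phrase names are shorthand introduced here for readability; the scores are copied from the table above.

```python
THRESHOLD = 0.579  # recommended cut-off from the validation table

# Blue scores from the demo table, keyed by a short description of each phrase.
demo_scores = {
    "phishing / exfiltration": 0.066,
    "EDR deployment": 0.557,
    "marketing launch": 0.256,
    "company picnic": 0.272,
}

labels = {
    name: "blue-team" if score >= THRESHOLD else "not blue-team"
    for name, score in demo_scores.items()
}

for name, score in demo_scores.items():
    margin = score - THRESHOLD
    print(f"{name:<25} {score:.3f}  {labels[name]:<14} (margin {margin:+.3f})")
```

At this cut-off none of the four phrases is flagged. The generic EDR sentence scores highest but falls 0.022 short, which fits the model's stated focus on technical detection-engineering text rather than general defensive advice.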

Intended uses & limits

- Triage large corpora for technical detection-engineering content: Sysmon, Sigma, SIEM, and indicator-of-compromise (IoC) data.
- Input language: English
- No external test set yet, so treat the reported metrics as optimistic.
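
For the corpus-triage use case, one possible shape is a generator that scores texts in batches and yields only those at or above the cut-off. This is a sketch, not code from this repository: `triage` and `score_fn` are hypothetical names, and `score_fn` stands in for any callable that maps a list of texts to a list of blue scores (for example, a wrapper around the model). The keyword scorer below is a toy placeholder so the example runs without downloading weights.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

ScoreFn = Callable[[List[str]], List[float]]


def _keep(batch: List[str], score_fn: ScoreFn, threshold: float) -> Iterator[Tuple[str, float]]:
    """Score one batch and yield the (text, score) pairs that pass the cut-off."""
    for text, score in zip(batch, score_fn(batch)):
        if score >= threshold:
            yield text, score


def triage(
    texts: Iterable[str],
    score_fn: ScoreFn,
    threshold: float = 0.579,  # recommended cut-off from this card
    batch_size: int = 32,
) -> Iterator[Tuple[str, float]]:
    """Stream texts through score_fn in batches, keeping those >= threshold."""
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield from _keep(batch, score_fn, threshold)
            batch = []
    if batch:  # flush the final partial batch
        yield from _keep(batch, score_fn, threshold)


def toy_score(batch: List[str]) -> List[float]:
    """Placeholder scorer (NOT the model): 0.9 if 'sigma' appears, else 0.1."""
    return [0.9 if "sigma" in text.lower() else 0.1 for text in batch]


corpus = ["Sigma rule for Sysmon event ID 1", "Company picnic on Friday"]
kept = list(triage(corpus, toy_score, batch_size=1))
print(kept)
```

Streaming in batches keeps memory use flat on large corpora, and swapping `toy_score` for a real model wrapper leaves the rest of the loop unchanged.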

Training data

| Label | Rows |
|---|---|
| Offensive | 30 746 |
| Defensive | 19 550 |
| Other | 130 000 |
| Total | 180 296 |
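
The row counts imply a noticeable class imbalance. The sketch below works out the share of each label and the negative-to-positive ratio, under the assumption that the one-vs-rest setup treats Defensive as the positive class and Offensive plus Other as negatives; the card does not spell this out.

```python
# Row counts copied from the training-data table.
rows = {"Offensive": 30_746, "Defensive": 19_550, "Other": 130_000}

total = sum(rows.values())
shares = {label: count / total for label, count in rows.items()}

# Assumption: Defensive is the positive class, everything else is negative.
positives = rows["Defensive"]
negatives = total - positives
neg_per_pos = negatives / positives

print(f"total rows: {total}")
for label, share in shares.items():
    print(f"{label:<10} {share:.1%}")
print(f"negatives per positive: {neg_per_pos:.2f}")
```

Under that assumption, roughly one row in nine is positive, which is the kind of skew a focal-loss objective is typically used to handle.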

Model details

| Field | Value |
|---|---|
| Base encoder | `ehsanaghaei/SecureBERT` (RoBERTa-base, 125 M) |
| Objective | One-vs-rest, focal-loss (γ = 2) |
| Training | 3 epochs · micro-batch 16 · LR 2e-5 |
| Hardware | 1× RTX 4090 (≈ 41 min) |
| Inference dtype | FP16-safe |
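
To illustrate what the focal-loss objective does with γ = 2, here is a minimal per-example sketch in plain Python. It uses the standard form FL = -(1 - p_t)^γ · log(p_t), where p_t is the probability assigned to the true class. It is not the author's training code, and it omits the optional class-weighting term (alpha), since the card does not say whether one was used.

```python
import math


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss for one example, given the probability of its true class.

    With gamma = 0 this reduces to ordinary cross-entropy, -log(p_true).
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


for p in (0.9, 0.5, 0.1):
    ce = focal_loss(p, gamma=0.0)  # plain cross-entropy
    fl = focal_loss(p, gamma=2.0)  # focal loss with the card's gamma
    print(f"p_true={p:.1f}  CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.2f}")
```

The ratio FL/CE equals (1 - p_t)², so a confidently correct example (p_t = 0.9) keeps only 1% of its cross-entropy loss while a badly misclassified one (p_t = 0.1) keeps 81%, shifting the gradient toward hard examples.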

Training Data License

Source: trendmicro-ailab/Primus-FineWeb
License: ODC-By-1.0 (http://opendatacommons.org/licenses/by/1-0/)
Requirements:
- Preserve all original copyright/license notices
- Honor Common Crawl ToU

Quick start

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def classify_texts(model_name, phrases, threshold, positive_label):
    """
    Return a list of (probability, label) tuples, one per phrase.

    The probability is the softmax score of class index 1, the model's
    positive class (blue-team for BlueSecureBERT, red-team for RedSecureBERT).
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(phrases, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch_size, 2)
        probs_positive = torch.softmax(logits, dim=1)[:, 1]  # probability of the positive class

    results = []
    for p_val in probs_positive.tolist():
        label = positive_label if p_val >= threshold else f"Not {positive_label}"
        results.append((p_val, label))
    return results


def main():
    # Example phrases: offensive (red-team), defensive (blue-team), marketing, unrelated
    phrases = [
        # 1) Cybersecurity offensive / red-team
        "To exfiltrate sensitive data, launch a phishing campaign that tricks employees into revealing their VPN credentials.",
        # 2) Cybersecurity defensive / blue-team
        "We should deploy an EDR solution, monitor all endpoints for intrusion attempts, and enforce strict password policies.",
        # 3) Cybersecurity marketing
        "Our marketing team will unveil the new cybersecurity branding materials at next Tuesday’s antivirus product launch",
        # 4) Not cybersecurity related
        "I'm excited about the company picnic. There's no cybersecurity topic—just burgers and games.",
    ]

    # Classify with both models, each with its own threshold
    blue_threshold = 0.579  # recommended cut-off for BlueSecureBERT (validation table)
    red_threshold = 0.515   # cut-off for RedSecureBERT; confirm against its model card
    blue_results = classify_texts(
        "HagalazAI/BlueSecureBERT", phrases, blue_threshold, "Defensive (blue-team)"
    )
    red_results = classify_texts(
        "HagalazAI/RedSecureBERT", phrases, red_threshold, "Offensive (red-team)"
    )

    # Print a Markdown table
    print("| # | Phrase | Blue Score | Blue Label | Red Score | Red Label |")
    print("|---|--------|-----------|-----------|----------|----------|")
    rows = zip(phrases, blue_results, red_results)
    for i, (text, (blue_score, blue_label), (red_score, red_label)) in enumerate(rows, start=1):
        print(f"| {i} | {text} | {blue_score:.3f} | {blue_label} | {red_score:.3f} | {red_label} |")


if __name__ == "__main__":
    main()