HagalazAI committed on
Commit
f261973
·
verified ·
1 Parent(s): 2e30d56

Update README.md

  1. README.md +79 -3
README.md CHANGED
@@ -1,3 +1,79 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ tags:
3
+ - text-classification
4
+ - security
5
+ - red-team
6
+ - roberta
7
+ license: apache-2.0
8
+ datasets:
9
+ - trendmicro-ailab/Primus-FineWeb
10
+ metrics:
11
+ - precision
12
+ - recall
13
+ - f1
14
+ pipeline_tag: text-classification
15
+ library_name: transformers
16
+ ---
17
+
18
+ # RedSecureBERT 🔴🛡️
19
+
20
+ Detects **red-team / offensive security** text (English).
21
+
22
+ | Split | Precision | Recall | F1 | Threshold |
23
+ |-------|-----------|--------|----|-----------|
24
+ | Validation | **0.963** | **0.991** | **0.977** | **0.515** |
25
+
26
+ > **Recommended cut-off:** `prob >= 0.515` (chosen via F₂ on the validation split).
27
+
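The cut-off above is described as chosen via F₂ on the validation split. The sketch below shows one way such a selection could work, assuming a simple sweep over the observed probabilities; the helper names `fbeta` and `best_threshold` and the toy scores are illustrative and not taken from the author's training code.

```python
def fbeta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(probs, labels, beta=2.0):
    """Return (threshold, score) maximising F-beta over the observed probabilities."""
    best_t, best_score = 0.5, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = fbeta(precision, recall, beta)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy validation scores (made up for illustration, not produced by the model).
val_probs = [0.05, 0.20, 0.40, 0.55, 0.60, 0.90]
val_labels = [0, 0, 1, 0, 1, 1]
threshold, score = best_threshold(val_probs, val_labels)
print(f"threshold={threshold}  F2={score:.4f}")
```

As a sanity check on the table, `fbeta(0.963, 0.991, beta=1.0)` evaluates to roughly 0.977, which matches the reported F1.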
28
+ ---
29
+
30
+ ## Intended uses & limits
31
+
32
+ * **Triaging** large corpora, chat logs, or bug-bounty reports.
33
+ * **Input language:** English.
34
+ * **No external test set** yet → treat scores as optimistic.
35
+
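For the triage use case listed above, a minimal sketch might look like the following. It assumes a scoring callable that returns P(offensive) for one text; `triage` and `fake_score` are hypothetical names, and the stand-in scorer exists only so the example runs without downloading the model.

```python
THRESHOLD = 0.515  # recommended cut-off from the model card


def triage(texts, score_fn, threshold=THRESHOLD):
    """Return (text, prob) pairs at or above the threshold, highest score first."""
    scored = [(t, score_fn(t)) for t in texts]
    flagged = [(t, p) for t, p in scored if p >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


def fake_score(text: str) -> float:
    """Stand-in scorer (hypothetical); replace with a call to the real classifier."""
    return 0.9 if "exploit" in text.lower() else 0.1


docs = [
    "Patch notes for the web server",
    "Exploit chain write-up for a browser bug",
    "Lunch menu",
]
for text, prob in triage(docs, fake_score):
    print(f"{prob:.3f}  {text}")
```

Because no external test set exists yet, it may be worth spot-checking a sample of unflagged texts as well before relying on the threshold.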
36
+ ---
37
+
38
+ ## Training data (quick view)
39
+
40
+ | Label | Rows |
41
+ |-------|------|
42
+ | Offensive | 30 746 |
43
+ | Defensive | 19 550 |
44
+ | Other | 130 000 |
45
+ | **Total** | **180 296** |
46
+
47
+ Source: *Primus-FineWeb* (filtered & hand-labelled).
48
+
49
+ ---
50
+
51
+ ## Model details
52
+
53
+ | Field | Value |
54
+ |-------|-------|
55
+ | Base encoder | `ehsanaghaei/SecureBERT` (RoBERTa-base, 125 M) |
56
+ | Objective | One-vs-rest, focal-loss (γ = 2) |
57
+ | Training setup | 3 epochs  ·  micro-batch 16  ·  LR 2e-5 |
58
+ | Hardware | 1× RTX 4090 (≈ 41 min) |
59
+ | Inference dtype | FP16-safe |
60
+
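The objective row mentions focal loss with γ = 2. As a rough illustration of what that term does for a single one-vs-rest output, here is the basic binary form in plain Python. This is a sketch of the standard formula only; whether the author also used an α class weight, or how the loss was reduced over a batch, is not stated, and `binary_focal_loss` is an illustrative name.

```python
import math


def binary_focal_loss(logit: float, target: int, gamma: float = 2.0) -> float:
    """Focal loss for one binary output: -(1 - p_t)^gamma * log(p_t)."""
    p = 1.0 / (1.0 + math.exp(-logit))         # sigmoid probability of the positive class
    p_t = p if target == 1 else 1.0 - p        # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# gamma = 0 reduces to ordinary binary cross-entropy.
print(binary_focal_loss(0.0, 1, gamma=0.0))
# With gamma = 2, a confidently correct example contributes far less than a confidently wrong one.
print(binary_focal_loss(4.0, 1), binary_focal_loss(-4.0, 1))
```

The down-weighting of easy examples is the usual motivation for focal loss on imbalanced data such as the 130 000 "Other" rows versus 30 746 "Offensive" rows listed above.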
61
+ ---
62
+
63
+ ## Quick start
64
+
65
+ ```python
66
+ from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer
67
+
68
+ model_id = "HagalazAI/RedSecureBERT"
69
+ tok = AutoTokenizer.from_pretrained(model_id)
70
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
71
+
72
+ clf = pipeline("text-classification", model=model, tokenizer=tok, top_k=None)
73
+
74
+ text = "Generate a ROP chain to bypass DEP on Windows 10."
75
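+ prob = next(d["score"] for d in clf(text)[0] if d["label"] == model.config.id2label[0]) # sigmoid prob for class 0 (Offensive); with top_k=None, clf(text)[0] is a list of per-label dicts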
76
+ print(f"P(offensive) = {prob:.3f}")
77
+
78
+ is_red = prob >= 0.515 # ← recommended threshold
79
+ print("is_red:", is_red)