HagalazAI committed on
Commit
6cbe5de
·
verified ·
1 Parent(s): f8766c7

Update README.md

Files changed (1)
  1. README.md +78 -3
README.md CHANGED
@@ -1,3 +1,78 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ tags:
3
+ - text-classification
4
+ - security
5
+ - blue-team
6
+ - roberta
7
+ license: apache-2.0
8
+ datasets:
9
+ - trendmicro-ailab/Primus-FineWeb
10
+ metrics:
11
+ - precision
12
+ - recall
13
+ - f1
14
+ pipeline_tag: text-classification
15
+ library_name: transformers
16
+ ---
17
+
18
+ # BlueSecureBERT 🟦🛡️
19
+
20
+ Detects **blue-team / defensive security** text (English).
21
+
22
+ | Split | Precision | Recall | F1 | F₂ | CE-loss | Threshold |
23
+ |-------------|-----------|--------|-------|-------|---------|-----------|
24
+ | Validation | **0.949** | **0.991** | **0.969** | **0.982** | **0.011** | **0.579** |
25
+
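The F1 and F₂ columns can be re-derived from the precision and recall columns. A minimal sketch, assuming the standard F-beta definition (the card does not spell out its formula, and the helper name `f_beta` is illustrative only):

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is predicted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded validation precision and recall, copied from the table above.
p, r = 0.949, 0.991
f1 = f_beta(p, r, beta=1.0)
f2 = f_beta(p, r, beta=2.0)
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")
```

With these rounded inputs F₂ comes out at about 0.982, matching the table, and F1 at about 0.970; the table's 0.969 presumably reflects unrounded precision and recall.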
26
+ > **Recommended cut-off:** `prob >= 0.579` (arg-max on the validation split)
27
+
28
+ ---
29
+
30
+ ## Intended uses & limits
31
+
32
+ * **Triage** incident reports, chat logs, or bug-bounty write-ups
33
+ * **Input language:** English
34
+ * **No external test set** yet → treat numbers as optimistic
35
+
36
+ ---
37
+
38
+ ## Training data
39
+
40
+ | Label | Rows |
41
+ |-----------|---------|
42
+ | Offensive | 30 746 |
43
+ | Defensive | 19 550 |
44
+ | Other | 130 000 |
45
+ | **Total** | **180 296** |
46
+
47
+ Source: [Primus-FineWeb](https://huggingface.co/datasets/trendmicro-ailab/Primus-FineWeb)
48
+
49
+ ---
50
+
51
+ ## Model details
52
+
53
+ | Field | Value |
54
+ |----------------|------------------------------------------------------|
55
+ | Base encoder | `ehsanaghaei/SecureBERT` (RoBERTa-base, 125 M) |
56
+ | Objective | One-vs-rest, focal-loss (γ = 2) |
57
+ | Training | 3 epochs · micro-batch 16 · LR 2e-5 |
58
+ | Hardware | 1× RTX 4090 (≈ 41 min) |
59
+ | Inference dtype| FP16-safe |
60
+
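The objective row names focal loss with γ = 2 in a one-vs-rest setup. As a rough illustration of what that means for a single sigmoid output, here is a sketch assuming the standard binary focal loss without a class-weighting (α) term. The card does not give the exact formulation used in training, and `binary_focal_loss` is a hypothetical helper, not part of the released code:

```python
import math


def binary_focal_loss(prob: float, target: int, gamma: float = 2.0) -> float:
    """Focal loss for one sigmoid output: -(1 - p_t)**gamma * log(p_t)."""
    eps = 1e-12  # guard against log(0)
    p_t = prob if target == 1 else 1.0 - prob  # probability given to the true label
    p_t = min(max(p_t, eps), 1.0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# A confidently correct example (p_t = 0.9) versus a hard one (p_t = 0.3).
easy = binary_focal_loss(0.9, target=1)
hard = binary_focal_loss(0.3, target=1)
# gamma = 0 removes the modulating factor and leaves plain cross-entropy.
plain_ce_easy = binary_focal_loss(0.9, target=1, gamma=0.0)
print(f"easy={easy:.5f} hard={hard:.5f} ce_easy={plain_ce_easy:.5f}")
```

The `(1 - p_t)**gamma` factor shrinks the loss on examples the model already gets right, which is one plausible reason to use it when the "Other" class dominates the training rows.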
61
+ ---
62
+
63
+ ## Quick start
64
+
65
+ ```python
66
+ from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer
67
+
68
+ model_id = "HagalazAI/BlueSecureBERT"
69
+ tok = AutoTokenizer.from_pretrained(model_id)
70
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
71
+ clf = pipeline("text-classification", model=model, tokenizer=tok, function_to_apply="sigmoid")
72
+
73
+ text = "Investigate potential SQL injection vulnerabilities."
74
+ prob = {s["label"]: s["score"] for s in clf(text, top_k=None)}[model.config.id2label[0]]  # sigmoid prob for class 0 (Defensive)
75
+ print(f"P(defensive) = {prob:.3f}")
76
+
77
+ is_blue = prob >= 0.579 # ← recommended threshold
78
+ print("is_blue:", is_blue)
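The card does not say how the 0.579 threshold in the Quick start was selected. One common approach, shown here only as an assumption, is to sweep candidate thresholds on a validation split and keep the one that maximises F₂. The scores and labels below are made up for illustration, and `f_beta_at` and `best_threshold` are hypothetical helpers:

```python
def f_beta_at(threshold, probs, labels, beta=2.0):
    """F-beta of the rule `prob >= threshold`, computed from confusion counts."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold(probs, labels, beta=2.0):
    """Return the observed score that maximises F-beta when used as the cut-off."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: f_beta_at(t, probs, labels, beta))


# Toy validation data (illustrative only, not from the model).
val_probs = [0.05, 0.20, 0.45, 0.55, 0.60, 0.80, 0.95]
val_labels = [0, 0, 1, 0, 1, 1, 1]

threshold = best_threshold(val_probs, val_labels)
print(f"best threshold = {threshold}, F2 = {f_beta_at(threshold, val_probs, val_labels):.3f}")
```

Because F₂ weights recall above precision, a sweep like this tends to pick a lower cut-off than an F1 sweep would, which suits triage where missing defensive content costs more than a false alarm.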