---
license: apache-2.0
language:
- en
base_model:
- cisco-ai/SecureBERT2.0-base
pipeline_tag: token-classification
library_name: transformers
tags:
- NER
- SecureBERT2
- CyberNER
- token-classification
- cybersecurity
---

# Model Card for cisco-ai/SecureBERT2.0-NER

The **Secure Modern BERT NER Model** is a fine-tuned transformer based on [**SecureBERT 2.0**](https://huggingface.co/cisco-ai/SecureBERT2.0-base), designed for **Named Entity Recognition (NER)** in cybersecurity text.  

It extracts domain-specific entities such as **Indicators, Malware, Organizations, Systems, and Vulnerabilities** from unstructured data sources like threat reports, incident analyses, advisories, and blogs.  

NER in cybersecurity enables:
- Automated extraction of indicators of compromise (IOCs)  
- Structuring of unstructured threat intelligence text  
- Improved situational awareness for analysts  
- Faster incident response and vulnerability triage  

---

## Model Details

### Model Description

- **Developed by:** Cisco AI   
- **Model Type:** ModernBertForTokenClassification  
- **Framework:** PyTorch / Transformers  
- **Tokenizer Type:** PreTrainedTokenizerFast  
- **Number of Labels:** 11  
- **Task:** Named Entity Recognition (NER)  
- **License:** Apache-2.0  
- **Language:** English  
- **Base Model:** [cisco-ai/SecureBERT2.0-base](https://huggingface.co/cisco-ai/SecureBERT2.0-base)

#### Supported Entity Labels

| Entity | Description |
|:--------|:-------------|
| `B-Indicator`, `I-Indicator` | Indicators of Compromise (e.g., IPs, domains, hashes) |
| `B-Malware`, `I-Malware` | Malware or exploit names |
| `B-Organization`, `I-Organization` | Companies or groups mentioned |
| `B-System`, `I-System` | Affected software or platforms |
| `B-Vulnerability`, `I-Vulnerability` | Specific CVEs or flaw descriptions |
| `O` | Outside token |
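
The label count of 11 follows from the BIO scheme in the table: five entity types with a `B-` and an `I-` tag each, plus `O`. A minimal sketch of that arithmetic is below; the index order it produces is hypothetical, and the authoritative mapping is whatever `model.config.id2label` reports for the released checkpoint.

```python
# Entity types listed in the table above.
ENTITY_TYPES = ["Indicator", "Malware", "Organization", "System", "Vulnerability"]


def build_bio_labels(entity_types):
    """Return the BIO label list: 'O' plus B-/I- tags for every entity type."""
    labels = ["O"]
    for entity in entity_types:
        labels.append(f"B-{entity}")
        labels.append(f"I-{entity}")
    return labels


labels = build_bio_labels(ENTITY_TYPES)

# Hypothetical index order, for illustration only; check model.config.id2label
# for the mapping the checkpoint actually uses.
label2id = {label: idx for idx, label in enumerate(labels)}

print(len(labels))  # 11
```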

#### Model Configuration

| Parameter | Value |
|:-----------|:-------|
| Hidden size | 768 |
| Intermediate size | 1152 |
| Hidden layers | 22 |
| Attention heads | 12 |
| Max sequence length | 8192 |
| Vocabulary size | 50368 |
| Activation | GELU |
| Dropout | 0.0 (embedding, attention, MLP, classifier) |

---

## Uses

### Direct Use

- Named Entity Recognition (NER) on cybersecurity text  
- Threat intelligence enrichment  
- IOC extraction and normalization  
- Incident report analysis  
- Vulnerability mention detection  

### Downstream Use

This model can be integrated into:
- Threat intelligence platforms (TIPs)  
- SOC automation tools  
- Cybersecurity knowledge graphs  
- Vulnerability management and CVE monitoring systems  

### Out-of-Scope Use

- Non-technical or general-domain NER tasks  
- Generative or conversational AI applications  

---

## Benchmark Cybersecurity NER Corpus

### Dataset Overview

| Aspect | Description |
|:-------|:-------------|
| **Purpose** | Benchmark dataset for extracting cybersecurity entities from unstructured reports |
| **Data Source** | Curated threat intelligence documents emphasizing malware and system analysis |
| **Annotation Methodology** | Fully hand-labeled by domain experts |
| **Entity Types** | Malware, Indicator, System, Organization, Vulnerability |
| **Size** | 3.4k training samples + 717 test samples |

---

## How to Get Started with the Model

### Example Usage (Transformers)

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "cisco-ai/SecureBERT2.0-NER"

# Load the tokenizer and the fine-tuned token-classification model (PyTorch weights).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build an NER pipeline; by default it returns per-token predictions.
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Stealc malware targets browser cookies and passwords."
entities = ner_pipeline(text)
print(entities)
```
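
Per-token predictions usually need to be merged into entity spans before they are useful downstream. The sketch below shows one way to do that for BIO tags; `group_bio_entities` is a hypothetical helper, and the labels in the example are hand-written for illustration, not actual model output.

```python
def group_bio_entities(tokens, labels):
    """Merge consecutive B-/I- tags of the same type into (text, type) spans."""
    entities = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        prefix, _, entity_type = label.partition("-")
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)  # continue the open span
            continue
        if current_tokens:  # close the open span before starting anything new
            entities.append((" ".join(current_tokens), current_type))
        if prefix in ("B", "I"):  # a stray I- tag is treated as a new span
            current_tokens, current_type = [token], entity_type
        else:  # "O" resets the state
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hand-written labels for illustration only.
tokens = ["Stealc", "malware", "targets", "browser", "cookies"]
tags = ["B-Malware", "O", "O", "B-System", "O"]
print(group_bio_entities(tokens, tags))
# [('Stealc', 'Malware'), ('browser', 'System')]
```

The Transformers `pipeline` also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping internally.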

## Training Details

### Training Objective and Procedure

`SecureBERT2.0-NER` was fine-tuned for **token-level classification** on cybersecurity text using **Cross Entropy Loss**.  
Training focused on accurately classifying entity boundaries and types across five cybersecurity-specific categories: *Malware, Indicator, System, Organization,* and *Vulnerability*.

The **AdamW** optimizer was used with a **linear learning rate scheduler**, and gradient clipping ensured stability during fine-tuning.
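
For readers unfamiliar with the objective, token-wise cross entropy averages the negative log-probability of the correct label over the tokens that carry a label. The sketch below is a plain-Python illustration, not the training code; skipping positions marked `-100` is an assumption based on a common convention and is not stated in this card.

```python
import math

IGNORE_INDEX = -100  # assumption: positions with this target are excluded from the loss


def token_cross_entropy(logits, targets, ignore_index=IGNORE_INDEX):
    """Mean cross entropy over tokens, skipping positions labeled ignore_index."""
    total, count = 0.0, 0
    for token_logits, target in zip(logits, targets):
        if target == ignore_index:
            continue
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(token_logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in token_logits))
        total += log_norm - token_logits[target]
        count += 1
    return total / count if count else 0.0


# Two tokens over 11 labels; the second one is ignored.
uniform = [0.0] * 11
loss = token_cross_entropy([uniform, uniform], [3, IGNORE_INDEX])
print(round(loss, 4))  # ln(11), since the logits are uniform
```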

### Training Configuration

| Setting | Value |
|:---------|:------:|
| Objective | Token-wise Cross Entropy |
| Optimizer | AdamW |
| Learning Rate | 1e-5 |
| Weight Decay | 0.001 |
| Batch Size per GPU | 8 |
| Epochs | 20 |
| Max Sequence Length | 1024 |
| Gradient Clipping Norm | 1.0 |
| Scheduler | Linear |
| Mixed Precision | fp16 |
| Framework | PyTorch / Transformers |
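
To make the "Linear" scheduler entry concrete, the sketch below computes the learning rate at a given step under linear decay from the base rate of 1e-5 to zero. The card does not mention warmup, so `warmup_steps=0` is an assumption, and `linear_lr` is a hypothetical helper rather than the training code.

```python
def linear_lr(step, total_steps, base_lr=1e-5, warmup_steps=0):
    """Learning rate at `step`: optional linear warmup, then linear decay to zero."""
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Halfway through a 1,000-step run the rate has dropped to half the base value.
print(linear_lr(500, 1000))
```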

### Training Dataset

The model was fine-tuned on a **cybersecurity-specific NER corpus**, containing annotated threat intelligence reports, advisories, and technical documentation.

| Property | Description |
|:----------|:-------------|
| **Dataset Type** | Manually annotated corpus |
| **Language** | English |
| **Entity Types** | Malware, Indicator, System, Organization, Vulnerability |
| **Train Size** | 3,400 samples |
| **Test Size** | 717 samples |
| **Annotation Method** | Expert hand-labeling for accuracy and consistency |

### Preprocessing

- Texts were tokenized using the `PreTrainedTokenizerFast` tokenizer from SecureBERT 2.0.  
- All sequences were truncated or padded to 1024 tokens.  
- Labels were aligned with subword tokens to maintain token–label consistency.  
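
Label alignment with subword tokens is often done with the `word_ids()` mapping that fast tokenizers return. The sketch below shows one common convention: label the first subword of each word and mask the rest. Whether the authors labeled only the first subword or every subword is not stated here, so both behaviors are exposed through a flag; `align_labels` and the `-100` mask value are assumptions.

```python
IGNORE_INDEX = -100  # assumption: masked positions are skipped by the loss


def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Expand word-level label ids to subword level.

    word_ids: one entry per subword token, None for special or padding tokens.
    word_labels: one label id per original word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)  # special or padding token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            # Continuation subword: copy the label or mask it.
            aligned.append(word_labels[word_id] if label_all_subwords else IGNORE_INDEX)
        previous = word_id
    return aligned


# Word 0 is split into two subwords; None marks special tokens.
print(align_labels([None, 0, 0, 1, 2, None], [3, 0, 7]))
# [-100, 3, -100, 0, 7, -100]
```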

### Hardware and Training Setup

| Component | Description |
|:-----------|:-------------|
| GPUs Used | 8× NVIDIA A100 |
| Precision | Mixed precision (fp16) |
| Batch Size | 8 per GPU |
| Framework | Transformers (PyTorch backend) |
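
The figures above imply a rough step count for the run. The sketch below works through that arithmetic, assuming plain data parallelism with no gradient accumulation and treating the "3.4k" training set as exactly 3,400 samples; neither assumption is confirmed by this card.

```python
import math


def steps_per_epoch(num_samples, per_gpu_batch, num_gpus, grad_accum=1):
    """Return (effective batch size, optimizer steps per epoch)."""
    effective_batch = per_gpu_batch * num_gpus * grad_accum
    return effective_batch, math.ceil(num_samples / effective_batch)


effective_batch, steps = steps_per_epoch(num_samples=3400, per_gpu_batch=8, num_gpus=8)
total_steps = steps * 20  # 20 epochs, per the training configuration

print(effective_batch, steps, total_steps)  # 64 54 1080
```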

### Optimization Summary

The model converged after approximately **20 epochs**, with loss stabilizing at a low level.  
Validation metrics (F1, precision, recall) showed steady improvement from epoch 3 onward, confirming effective domain-specific adaptation.



## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was conducted on a **cybersecurity-specific NER benchmark corpus** containing annotated threat reports, advisories, and incident analysis texts.  
This benchmark includes five key entity types: **Malware, Indicator, System, Organization, and Vulnerability**.

#### Metrics

The following metrics were used to assess model performance:
- **F1-score:** Harmonic mean of precision and recall  
- **Recall:** Measures how many true entities were correctly identified  
- **Precision:** Measures how many predicted entities were correct  
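
These three metrics can be computed from sets of gold and predicted entities as sketched below. The card does not say whether scoring is done per entity span or per token, nor how scores are averaged across classes; exact-match span scoring is assumed here, and `precision_recall_f1` is a hypothetical helper.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, type) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy spans for illustration: one prediction matches, one has the wrong type.
gold_spans = {(0, 1, "Malware"), (3, 4, "System"), (6, 7, "Indicator")}
pred_spans = {(0, 1, "Malware"), (3, 4, "Organization")}
p, r, f1 = precision_recall_f1(gold_spans, pred_spans)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.333 0.4
```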

### Results

| Model | F1 | Recall | Precision |
|:------|:---:|:-------:|:-----------:|
| **CyBERT** | 0.351 | 0.281 | 0.467 |
| **SecureBERT** | 0.734 | 0.759 | 0.717 |
| **SecureBERT 2.0 (Ours)** | **0.945** | **0.965** | **0.927** |

#### Summary

The **SecureBERT 2.0 NER model** significantly outperforms both CyBERT and the original SecureBERT across all metrics.  

- It achieves an **F1-score of 0.945**, an **absolute improvement of about 21 percentage points** over SecureBERT (0.734).  
- Its **recall (0.965)** indicates excellent coverage of cybersecurity entities.  
- Its **precision (0.927)** shows strong accuracy and low false-positive rates.  

These results indicate that **domain-adaptive pretraining and fine-tuning** on cybersecurity corpora substantially improve NER performance over the earlier models compared here.
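
The gap between the SecureBERT and SecureBERT 2.0 rows in the results table can be stated in absolute or relative terms. The short calculation below uses only the F1 values from that table; `improvement` is a hypothetical helper.

```python
def improvement(new, old):
    """Return (absolute gain, relative gain) of `new` over `old`."""
    absolute = new - old
    relative = absolute / old
    return absolute, relative


# F1 values from the results table: SecureBERT 2.0 vs. SecureBERT.
abs_gain, rel_gain = improvement(0.945, 0.734)
print(f"{abs_gain:.3f} absolute, {rel_gain:.1%} relative")
# 0.211 absolute, 28.7% relative
```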

---
## Reference
```
@article{aghaei2025securebert,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and Jain, Sarthak and Arun, Prashanth and Sambamoorthy, Arjun},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}
```

---

## Model Card Authors

Cisco AI 

## Model Card Contact

For inquiries, please contact [ai-threat-intel@cisco.com](mailto:ai-threat-intel@cisco.com)