---
license: apache-2.0
language:
- en
base_model:
- cisco-ai/SecureBERT2.0-base
pipeline_tag: token-classification
library_name: transformers
tags:
- NER
- SecureBERT2
- CyberNER
- token-classification
- cybersecurity
---

# Model Card for cisco-ai/SecureBERT2.0-NER

The **Secure Modern BERT NER Model** is a fine-tuned transformer based on [**SecureBERT 2.0**](https://huggingface.co/cisco-ai/SecureBERT2.0-base), designed for **Named Entity Recognition (NER)** in cybersecurity text.  

It extracts domain-specific entities such as **Indicators, Malware, Organizations, Systems, and Vulnerabilities** from unstructured data sources like threat reports, incident analyses, advisories, and blogs.  

NER in cybersecurity enables:
- Automated extraction of indicators of compromise (IOCs)  
- Structuring of unstructured threat intelligence text  
- Improved situational awareness for analysts  
- Faster incident response and vulnerability triage  

---

## Model Details

### Model Description

- **Developed by:** Cisco AI   
- **Model Type:** ModernBertForTokenClassification  
- **Framework:** PyTorch / Transformers  
- **Tokenizer Type:** PreTrainedTokenizerFast  
- **Number of Labels:** 11  
- **Task:** Named Entity Recognition (NER)  
- **License:** Apache-2.0  
- **Language:** English  
- **Base Model:** [cisco-ai/SecureBERT2.0-base](https://huggingface.co/cisco-ai/SecureBERT2.0-base)

#### Supported Entity Labels

| Entity | Description |
|:--------|:-------------|
| `B-Indicator`, `I-Indicator` | Indicators of Compromise (e.g., IPs, domains, hashes) |
| `B-Malware`, `I-Malware` | Malware or exploit names |
| `B-Organization`, `I-Organization` | Companies or groups mentioned |
| `B-System`, `I-System` | Affected software or platforms |
| `B-Vulnerability`, `I-Vulnerability` | Specific CVEs or flaw descriptions |
| `O` | Outside token |
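
The labels above follow the BIO scheme: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. As a minimal sketch of how such tags can be turned into entity spans, the hypothetical helper `bio_to_spans` below is illustrative and not part of the model or the Transformers library; the example tags are hand-written, not model output.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (entity_type, start, end_exclusive) spans."""
    spans: List[Tuple[str, int, int]] = []
    current_type = None  # type of the span currently being built
    start = 0
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        # An I- tag of the same type extends the open span.
        if prefix == "I" and etype == current_type:
            continue
        # Anything else closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type = None
        # B- opens a new span; a stray I- is treated leniently as B-.
        if prefix in ("B", "I"):
            current_type, start = etype, i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Hand-assigned tags for the words: Stealc / targets / Google / Chrome
tags = ["B-Malware", "O", "B-System", "I-System"]
print(bio_to_spans(tags))  # [('Malware', 0, 1), ('System', 2, 4)]
```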

#### Model Configuration

| Parameter | Value |
|:-----------|:-------|
| Hidden size | 768 |
| Intermediate size | 1152 |
| Hidden layers | 22 |
| Attention heads | 12 |
| Max sequence length | 8192 |
| Vocabulary size | 50368 |
| Activation | GELU |
| Dropout | 0.0 (embedding, attention, MLP, classifier) |

---

## Uses

### Direct Use

- Named Entity Recognition (NER) on cybersecurity text  
- Threat intelligence enrichment  
- IOC extraction and normalization  
- Incident report analysis  
- Vulnerability mention detection  

### Downstream Use

This model can be integrated into:
- Threat intelligence platforms (TIPs)  
- SOC automation tools  
- Cybersecurity knowledge graphs  
- Vulnerability management and CVE monitoring systems  

### Out-of-Scope Use

- Non-technical or general-domain NER tasks  
- Generative or conversational AI applications  

---

## Benchmark Cybersecurity NER Corpus

### Dataset Overview

| Aspect | Description |
|:-------|:-------------|
| **Purpose** | Benchmark dataset for extracting cybersecurity entities from unstructured reports |
| **Data Source** | Curated threat intelligence documents emphasizing malware and system analysis |
| **Annotation Methodology** | Fully hand-labeled by domain experts |
| **Entity Types** | Malware, Indicator, System, Organization, Vulnerability |
| **Size** | 3,400 training samples + 717 test samples |

---

## How to Get Started with the Model

### Example Usage (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "cisco-ai/SecureBERT2.0-NER"

# Load the tokenizer and the fine-tuned token-classification model (PyTorch weights)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build a token-classification ("ner") pipeline around the model
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Stealc malware targets browser cookies and passwords."
entities = ner_pipeline(text)  # one dict per tagged token: entity, score, word, start, end
print(entities)
```
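
For downstream use it is often convenient to bucket predictions by entity type. The sketch below assumes the pipeline was created with `aggregation_strategy="simple"`, in which case each result is a dict with `entity_group`, `score`, `word`, `start`, and `end`. The helper `group_entities`, its `min_score` threshold, and the sample predictions are hypothetical and hand-written for illustration; they are not actual output of this model.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(text: str, entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group aggregated pipeline results by entity type, dropping low-confidence hits."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        if ent["score"] < min_score:
            continue
        # Slice the original text so casing and spacing are preserved.
        surface = text[ent["start"]:ent["end"]]
        if surface not in grouped[ent["entity_group"]]:
            grouped[ent["entity_group"]].append(surface)
    return dict(grouped)


text = "Stealc malware targets browser cookies and passwords."
# Hypothetical predictions in the aggregated pipeline format (scores are made up).
sample = [
    {"entity_group": "Malware", "score": 0.99, "word": "Stealc", "start": 0, "end": 6},
    {"entity_group": "System", "score": 0.30, "word": "browser", "start": 23, "end": 30},
]
print(group_entities(text, sample))  # {'Malware': ['Stealc']}
```

The threshold of 0.5 is arbitrary; in practice it would be tuned against the precision and recall required by the consuming system.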

## Training Details

### Training Objective and Procedure

`SecureBERT2.0-NER` was fine-tuned for **token-level classification** on cybersecurity text using **cross-entropy loss**.  
Training focused on accurately classifying entity boundaries and types across five cybersecurity-specific categories: *Malware, Indicator, System, Organization,* and *Vulnerability*.

The **AdamW** optimizer was used with a **linear learning rate scheduler**, and gradient clipping ensured stability during fine-tuning.

### Training Configuration

| Setting | Value |
|:---------|:------:|
| Objective | Token-wise Cross Entropy |
| Optimizer | AdamW |
| Learning Rate | 1e-5 |
| Weight Decay | 0.001 |
| Batch Size per GPU | 8 |
| Epochs | 20 |
| Max Sequence Length | 1024 |
| Gradient Clipping Norm | 1.0 |
| Scheduler | Linear |
| Mixed Precision | fp16 |
| Framework | PyTorch / Transformers |
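
To make the scheduler and clipping settings concrete, here is a small framework-free sketch of a linear learning-rate schedule and global-norm gradient clipping using the values from the table (learning rate 1e-5, clipping norm 1.0). The function names are hypothetical, the warmup length is an assumption (the card does not state one, so it defaults to 0), and the clipping is a simplified version of what deep learning frameworks implement.

```python
import math
from typing import List


def linear_lr(step: int, total_steps: int, base_lr: float = 1e-5, warmup_steps: int = 0) -> float:
    """Linear warmup (optional) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def clip_by_global_norm(grads: List[float], max_norm: float = 1.0) -> List[float]:
    """Rescale gradients so that their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


# Learning rate at the start, midpoint, and end of a 1000-step run (step count is illustrative).
for step in (0, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))

# A gradient with norm 5.0 is scaled down to norm 1.0.
print(clip_by_global_norm([3.0, 4.0]))
```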

### Training Dataset

The model was fine-tuned on a **cybersecurity-specific NER corpus**, containing annotated threat intelligence reports, advisories, and technical documentation.

| Property | Description |
|:----------|:-------------|
| **Dataset Type** | Manually annotated corpus |
| **Language** | English |
| **Entity Types** | Malware, Indicator, System, Organization, Vulnerability |
| **Train Size** | 3,400 samples |
| **Test Size** | 717 samples |
| **Annotation Method** | Expert hand-labeling for accuracy and consistency |

### Preprocessing

- Texts were tokenized using the `PreTrainedTokenizerFast` tokenizer from SecureBERT 2.0.  
- All sequences were truncated or padded to 1024 tokens.  
- Labels were aligned with subword tokens to maintain token–label consistency.  
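
The label alignment step can be sketched as follows. The card does not say which alignment strategy was used, so this example assumes a common one: the first subword of each word keeps the word's label, while continuation subwords and special tokens receive `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. The `word_ids` list mimics what a fast tokenizer's `word_ids()` method returns; the helper `align_labels` and the label IDs are hypothetical.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map word-level label IDs onto subword tokens."""
    aligned: List[int] = []
    previous = None
    for wid in word_ids:
        if wid is None:
            # Special tokens such as [CLS], [SEP], or padding.
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            # First subword of a word carries the word's label.
            aligned.append(word_labels[wid])
        else:
            # Continuation subwords are ignored during training.
            aligned.append(IGNORE_INDEX)
        previous = wid
    return aligned


# Three words; the first is split into two subwords. Label ID 3 is an arbitrary
# stand-in for an entity tag and 0 for "O" (the real ID mapping may differ).
word_ids = [None, 0, 0, 1, 2, None]
word_labels = [3, 0, 0]
print(align_labels(word_ids, word_labels))  # [-100, 3, -100, 0, 0, -100]
```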

### Hardware and Training Setup

| Component | Description |
|:-----------|:-------------|
| GPUs Used | 8× NVIDIA A100 |
| Precision | Mixed precision (fp16) |
| Batch Size | 8 per GPU |
| Framework | Transformers (PyTorch backend) |
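
As a rough worked example, the reported settings imply the following number of optimizer steps. This assumes plain data parallelism across all 8 GPUs, no gradient accumulation, and that the final partial batch of each epoch is kept; none of these details are stated in the card.

```python
import math

train_samples = 3400   # from the dataset table
per_gpu_batch = 8      # batch size per GPU
num_gpus = 8           # 8x NVIDIA A100
epochs = 20

# Assumption: one optimizer step consumes one batch on every GPU.
effective_batch = per_gpu_batch * num_gpus
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 64 54 1080
```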

### Optimization Summary

The model converged after approximately **20 epochs**, with loss stabilizing at a low level.  
Validation metrics (F1, precision, recall) showed steady improvement from epoch 3 onward, confirming effective domain-specific adaptation.



## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was conducted on a **cybersecurity-specific NER benchmark corpus** containing annotated threat reports, advisories, and incident analysis texts.  
This benchmark includes five key entity types: **Malware, Indicator, System, Organization, and Vulnerability**.

#### Metrics

The following metrics were used to assess model performance:
- **F1-score:** Harmonic mean of precision and recall  
- **Recall:** Measures how many true entities were correctly identified  
- **Precision:** Measures how many predicted entities were correct  
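
These metrics can be computed from sets of gold and predicted entity spans, as in the sketch below. It assumes exact-match, entity-level scoring with micro-averaging; the card does not specify whether scores were computed at the entity or token level, so treat this as an illustration of the definitions rather than the exact evaluation script. The spans are hand-written examples.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (entity_type, start, end_exclusive)


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {("Malware", 0, 1), ("System", 2, 4), ("Organization", 6, 7)}
pred = {("Malware", 0, 1), ("System", 2, 3)}  # second span has a wrong boundary

p, r, f1 = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Only the `Malware` span matches exactly, so one of two predictions is correct and one of three gold entities is recovered.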

### Results

| Model | F1 | Recall | Precision |
|:------|:---:|:-------:|:-----------:|
| **CyBERT** | 0.351 | 0.281 | 0.467 |
| **SecureBERT** | 0.734 | 0.759 | 0.717 |
| **SecureBERT 2.0 (Ours)** | **0.945** | **0.965** | **0.927** |

#### Summary

The **SecureBERT 2.0 NER model** significantly outperforms both CyBERT and the original SecureBERT across all metrics.  

- It achieves an **F1-score of 0.945**, an absolute gain of about **21 points** over SecureBERT (0.734).  
- Its **recall (0.965)** indicates that most annotated cybersecurity entities are recovered.  
- Its **precision (0.927)** indicates that few predicted entities are false positives.  

These results suggest that **domain-adaptive pretraining and fine-tuning** on cybersecurity corpora substantially improve NER performance compared with general-purpose or earlier domain models.

---
## Reference
```
@article{aghaei2025securebert,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and Jain, Sarthak and Arun, Prashanth and Sambamoorthy, Arjun},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}
```

---

## Model Card Authors

Cisco AI 

## Model Card Contact

For inquiries, please contact [ai-threat-intel@cisco.com](mailto:ai-threat-intel@cisco.com)