cisco-ai/SecureBERT2.0-NER

Token Classification

cisco-ehsan committed on Oct 6, 2025

Commit c5f31ed · verified · 1 Parent(s): 2d42e37

Update README.md

Files changed (1)

README.md +94 -1

@@ -10,4 +10,97 @@ tags:
 - SecureBERT2
 - CyberNER
 library_name: transformers
----

+---
+---
+language:
+- en
+license: apache-2.0
+tags:
+- named-entity-recognition
+- token-classification
+- cybersecurity
+- modernbert
+pipeline_tag: token-classification
+library_name: transformers
+---
+# Secure Modern BERT NER Model
+This is a fine-tuned **Named Entity Recognition (NER) model** that uses the **ModernBertForTokenClassification** architecture. It extracts cybersecurity entities from text: Indicators, Malware, Organizations, Systems, and Vulnerabilities.
+
+---
+## Model Details
+### Model Description
+- **Model Type:** ModernBertForTokenClassification
+- **Tokenizer Type:** PreTrainedTokenizerFast
+- **Framework:** PyTorch (via Transformers)
+- **Number of Labels:** 11
+- **Labels / Entities:**
+  - `B-Indicator` / `I-Indicator`
+  - `B-Malware` / `I-Malware`
+  - `B-Organization` / `I-Organization`
+  - `B-System` / `I-System`
+  - `B-Vulnerability` / `I-Vulnerability`
+  - `O` (outside)
+- **Maximum Sequence Length:** 8192 tokens
+- **Task:** named-entity-recognition
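The label inventory above follows the BIO scheme: each of the five entity types gets a `B-` (begin) and an `I-` (inside) tag, plus a single `O` tag, which is where the count of 11 comes from. A minimal sketch of that arithmetic follows. The label ordering and the `id2label` indices here are illustrative assumptions; the authoritative mapping is whatever the model's configuration defines.

```python
# Entity types as listed in the model card.
ENTITY_TYPES = ["Indicator", "Malware", "Organization", "System", "Vulnerability"]


def build_bio_labels(entity_types):
    """Return the BIO label list: 'O' plus a B-/I- pair per entity type."""
    labels = ["O"]
    for entity in entity_types:
        labels.append(f"B-{entity}")
        labels.append(f"I-{entity}")
    return labels


labels = build_bio_labels(ENTITY_TYPES)

# Hypothetical index assignment, for illustration only.
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))  # 11
```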
+### Example Pipeline Usage
+```python
+from transformers import pipeline
+
+# Load the model and its tokenizer from the Hugging Face Hub
+model_id = "cisco-ai/SecureBERT2.0-NER"
+ner = pipeline("ner", model=model_id, tokenizer=model_id)
+# Each result is a dict with the predicted label, score, and token text
+example_text = "John Doe works at OpenAI in San Francisco."
+ner_results = ner(example_text)
+print(ner_results)
+```
+### Model Configuration
+- Hidden size: 768
+- Intermediate size: 1152
+- Number of hidden layers: 22
+- Number of attention heads: 12
+- Max position embeddings: 8192
+- Vocabulary size: 50368
+- Activation Function: gelu
+- Dropout rates: all set to 0.0 (embedding, attention, MLP, classifier)
+
+Other configuration details are stored in the `config.json` file included with the model.
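Two quantities follow directly from the values listed above: the per-head dimension (hidden size divided by the number of attention heads) and the size of the token-embedding matrix (vocabulary size times hidden size). The sketch below only does that arithmetic on the figures from this card. It does not load the model, and the variable names are illustrative.

```python
# Values copied from the "Model Configuration" list.
hidden_size = 768
num_attention_heads = 12
vocab_size = 50368

# Each attention head works on an equal slice of the hidden dimension.
head_dim = hidden_size // num_attention_heads

# The token-embedding matrix has one hidden_size-wide row per vocabulary entry.
embedding_params = vocab_size * hidden_size

print(head_dim)          # 64
print(embedding_params)  # 38682624
```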
+## Usage
+```python
+from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
+
+# Load the tokenizer and the PyTorch model from the Hugging Face Hub
+model_id = "cisco-ai/SecureBERT2.0-NER"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForTokenClassification.from_pretrained(model_id)
+
+# Build a token-classification pipeline and tag a sample sentence
+ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)
+text = "Stealc malware targets browser cookies and passwords."
+entities = ner_pipeline(text)
+print(entities)
+```
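By default the pipeline returns one prediction per token, each tagged with a BIO label. To get whole entities, consecutive `B-`/`I-` tokens of the same type can be merged. The sketch below shows that merge on hand-written sample data: the `(word, label)` pairs and the `merge_bio` helper are hypothetical and are not output from this model.

```python
def merge_bio(tagged_tokens):
    """Merge (word, BIO label) pairs into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []
    for word, label in tagged_tokens:
        prefix, _, entity_type = label.partition("-")
        # An I- tag of the same type extends the open span.
        if prefix == "I" and entity_type == current_type:
            current_words.append(word)
            continue
        # Any other label closes the span that is currently open.
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        # B- starts a new span; a stray I- is treated as a start as well.
        if prefix in ("B", "I"):
            current_type, current_words = entity_type, [word]
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Hand-written sample, NOT real model output.
sample = [
    ("Stealc", "B-Malware"),
    ("malware", "O"),
    ("targets", "O"),
    ("browser", "B-System"),
    ("cookies", "I-System"),
]
print(merge_bio(sample))
# [('Malware', 'Stealc'), ('System', 'browser cookies')]
```

In practice, passing `aggregation_strategy="simple"` to `pipeline(...)` asks Transformers to group tokens into entities itself, so a helper like this is mainly useful for understanding the tag scheme or for custom post-processing.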
+## Reference
+```bibtex
+@article{aghaei2025securebert,
+  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
+  author={Aghaei, Ehsan and Jain, Sarthak and Arun, Prashanth and Sambamoorthy, Arjun},
+  journal={arXiv preprint arXiv:2510.00240},
+  year={2025}
+}
+```