cisco-ehsan committed · Commit c5f31ed · verified · 1 Parent(s): 2d42e37

Update README.md

Files changed (1)
  1. README.md +94 -1
README.md CHANGED
@@ -10,4 +10,97 @@ tags:
  - SecureBERT2
  - CyberNER
  library_name: transformers
- ---
+ ---
+
+
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - named-entity-recognition
+ - token-classification
+ - cybersecurity
+ - modernbert
+ pipeline_tag: token-classification
+ library_name: transformers
+ ---
+
+ # Secure Modern BERT NER Model
+
+ This is a **Named Entity Recognition (NER) model** built on the **ModernBertForTokenClassification** architecture and fine-tuned to extract cybersecurity entities such as Indicators, Malware, Organizations, Systems, and Vulnerabilities from text.
+
+ ---
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** ModernBertForTokenClassification
+ - **Tokenizer Type:** PreTrainedTokenizerFast
+ - **Framework:** PyTorch
+ - **Number of Labels:** 11
+ - **Labels / Entities:**
+   - `B-Indicator` / `I-Indicator`
+   - `B-Malware` / `I-Malware`
+   - `B-Organization` / `I-Organization`
+   - `B-System` / `I-System`
+   - `B-Vulnerability` / `I-Vulnerability`
+   - `O` (outside)
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Task:** named-entity-recognition
+
+ ### Example Pipeline Output
+ ```python
+ from transformers import pipeline
+
+ # Local checkpoint directory holding both the model weights and the tokenizer files.
+ model_path = "/teamspace/studios/this_studio/secure_modern_bert/Models/ner"
+ ner = pipeline("ner", model=model_path, tokenizer=model_path)
+
+ example_text = "John Doe works at OpenAI in San Francisco."
+ ner_results = ner(example_text)
+ print(ner_results)
+ ```
+
+ ### Model Configuration
+
+ - Hidden size: 768
+
+ - Intermediate size: 1152
+
+ - Number of hidden layers: 22
+
+ - Number of attention heads: 12
+
+ - Max position embeddings: 8192
+
+ - Vocabulary size: 50368
+
+ - Activation Function: gelu
+
+ - Dropout rates: all set to 0.0 (embedding, attention, MLP, classifier)
+
+ Other configuration details are stored in the model_config JSON included with the model.
+
+ ## Usage
+ ```python
+ from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
+
+ # Local checkpoint directory; the tokenizer and the model both load from it.
+ model_path = "/teamspace/studios/this_studio/secure_modern_bert/Models/ner"
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ # ModernBERT has only a PyTorch implementation in transformers, so the
+ # PyTorch auto class is used here rather than the TF-prefixed one.
+ model = AutoModelForTokenClassification.from_pretrained(model_path)
+
+ ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)
+
+ text = "Stealc malware targets browser cookies and passwords."
+ entities = ner_pipeline(text)
+ print(entities)
+ ```
+ ## Reference
+ ```
+ @article{aghaei2025securebert,
+   title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
+   author={Aghaei, Ehsan and Jain, Sarthak and Arun, Prashanth and Sambamoorthy, Arjun},
+   journal={arXiv preprint arXiv:2510.00240},
+   year={2025}
+ }
+ ```
+
+
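
The label list in this README follows the BIO scheme: a `B-` tag opens an entity, `I-` tags continue it, and `O` marks tokens outside any entity. As a minimal sketch of what that implies for downstream use, the snippet below rebuilds the 11-label set from the five entity types and groups tagged tokens into entity spans. It is not part of the commit. The helper name `decode_bio`, the label ordering in `LABELS`, and the sample tags are assumptions made for illustration; the tags are written by hand and are not model output.

```python
from typing import List, Tuple

ENTITY_TYPES = ["Indicator", "Malware", "Organization", "System", "Vulnerability"]
# "O" plus a B-/I- pair per entity type gives the 11 labels listed in the README.
# The ordering here is illustrative; the model's real id2label mapping may differ.
LABELS = ["O"] + [f"{prefix}-{name}" for name in ENTITY_TYPES for prefix in ("B", "I")]


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open span, if any, and reset the accumulator.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, entity_type = tag.split("-", 1)
        # A B- tag, or an I- tag that does not continue the open span, starts a new span.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)
    flush()
    return spans


# Hypothetical, hand-written tags for illustration only.
tokens = ["Stealc", "malware", "targets", "browser", "cookies", "."]
tags = ["B-Malware", "O", "O", "B-System", "O", "O"]
print(len(LABELS))
print(decode_bio(tokens, tags))
```

The `transformers` token-classification pipeline can do comparable grouping itself when called with `aggregation_strategy="simple"`. A hand-rolled decoder like this one is mainly useful when working directly with per-token predictions.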