juanmcristobal committed (verified)
Commit: 7e5b84d · Parent: f0e8111

Update README.md

Files changed (1):
  1. README.md (+28 -19)
README.md CHANGED
@@ -15,16 +15,24 @@ base_model:
- answerdotai/ModernBERT-large
---

- # SecureModernBERT-NER
+ # Model Overview

- SecureModernBERT-NER is a ModernBERT-base model fine-tuned to recognise named entities that appear in cyber-threat intelligence (CTI) narratives. It predicts BIO-formatted tags for 22 security-specific entity types (e.g., `MALWARE`, `THREAT-ACTOR`, `CVE`, `IPV4`, `URL`). The model is suitable for extracting indicators of compromise and contextual metadata from English-language threat reports, product advisories, and incident write-ups.
+ **SecureModernBERT-NER** represents a new generation of cybersecurity-focused language models, combining the **state-of-the-art ModernBERT architecture** with one of the **largest and most diverse CTI-labelled NER corpora built to date**.
+
+ Unlike conventional NER systems, SecureModernBERT-NER recognises **22 finely grained, security-specific entity types**, covering the full spectrum of cyber-threat intelligence, from `THREAT-ACTOR` and `MALWARE` to `CVE`, `IPV4`, `DOMAIN`, and `REGISTRY-KEYS`.
+
+ Trained on more than **half a million manually curated spans** sourced from real-world threat reports, vulnerability advisories, and incident analyses, it achieves a strong balance of **accuracy, generalisation, and contextual depth**.
+
+ The model is designed to **parse complex security narratives with high precision**, extracting both contextual metadata (e.g., `ORG`, `PRODUCT`, `PLATFORM`) and highly technical indicators (e.g., `HASHES`, `URLS`, `NETWORK ADDRESSES`) within a single unified framework.
+
+ SecureModernBERT-NER sets a new standard for **automated CTI entity recognition**, enabling the next wave of **threat-intelligence automation, enrichment, and analytics**.

## Quick Start

```python
from transformers import pipeline

- model_id = "juanmcristobal/autotrain-sec4"
+ model_id = "attack-vector/SecureModernBERT-NER"

pipe = pipeline(
    task="token-classification",
@@ -155,21 +163,6 @@ These metrics were computed with the `seqeval` micro-average at the entity level

The following tables report detailed results on a shared CTI validation set. **Do not compare the per-label values across models directly:** each checkpoint uses a different taxonomy or remapping strategy, so accuracy percentages can be misleading when labels are aligned or collapsed differently. Use the per-model tables to understand performance within a single schema, and interpret macro-accuracy scores with caution.

- ### PranavaKailash/CyNER-2.0-DeBERTa-v3-base
-
- | Label | Used | Accuracy |
- |-------|------|----------|
- | Indicator | 35,936 | 0.7878 |
- | Location | 7,895 | 0.0113 |
- | Malware | 12,125 | 0.7800 |
- | O | 2,896 | 0.7652 |
- | Organization | 42,537 | 0.6556 |
- | System | 35,063 | 0.7259 |
- | TOOL | 4,820 | 0.0000 |
- | Threat Group | 9,522 | 0.0000 |
- | Vulnerability | 27,673 | 0.1876 |
-
- - **Macro accuracy:** 0.4348

### CyberPeace-Institute/SecureBERT-NER

@@ -193,7 +186,23 @@ The following tables report detailed results on a shared CTI validation set. **D
| URL | 6,997 | 0.0795 |
| VULID | 27,586 | 0.3849 |

- - **Macro accuracy:** not reported (schema differs substantially from the others).
+ - **Macro accuracy:** 0.3820
+
+ ### PranavaKailash/CyNER-2.0-DeBERTa-v3-base
+
+ | Label | Used | Accuracy |
+ |-------|------|----------|
+ | Indicator | 35,936 | 0.7878 |
+ | Location | 7,895 | 0.0113 |
+ | Malware | 12,125 | 0.7800 |
+ | O | 2,896 | 0.7652 |
+ | Organization | 42,537 | 0.6556 |
+ | System | 35,063 | 0.7259 |
+ | TOOL | 4,820 | 0.0000 |
+ | Threat Group | 9,522 | 0.0000 |
+ | Vulnerability | 27,673 | 0.1876 |
+
+ - **Macro accuracy:** 0.4348

### cisco-ai/SecureBERT2.0-NER
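
The card describes BIO-formatted tag output. As a minimal sketch of what consuming that output involves, the helper below groups BIO tags into entity spans; the function name, the toy tokens, and the example tags are illustrative, not part of the released model:

```python
# Illustrative helper: collapse BIO tags into (entity_type, text) spans.
# Tag names follow the card's scheme (e.g. B-MALWARE / I-MALWARE / O).
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new span, closing any open one.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag continues the open span of the same type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not match the open span.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

tokens = ["Emotet", "contacted", "192.0.2.1", "yesterday"]
tags   = ["B-MALWARE", "O", "B-IPV4", "O"]
print(bio_to_spans(tokens, tags))
# [('MALWARE', 'Emotet'), ('IPV4', '192.0.2.1')]
```

In practice the `aggregation_strategy` argument of the `token-classification` pipeline performs this grouping for you; the sketch only shows what the raw BIO tags encode.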
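
The "macro accuracy" rows in the evaluation tables are consistent with an unweighted mean of the per-label accuracies (this interpretation is an assumption, but it reproduces the reported figure). Checking the CyNER-2.0-DeBERTa-v3-base table:

```python
# Per-label accuracies copied from the CyNER-2.0-DeBERTa-v3-base table above.
per_label_accuracy = {
    "Indicator": 0.7878, "Location": 0.0113, "Malware": 0.7800,
    "O": 0.7652, "Organization": 0.6556, "System": 0.7259,
    "TOOL": 0.0000, "Threat Group": 0.0000, "Vulnerability": 0.1876,
}

# Macro = unweighted mean over labels, ignoring the "Used" support counts.
macro = sum(per_label_accuracy.values()) / len(per_label_accuracy)
print(round(macro, 4))  # 0.4348, matching the reported macro accuracy
```

Because every label contributes equally regardless of support, the two zero-accuracy labels (`TOOL`, `Threat Group`) pull the macro figure well below the support-weighted picture; that is one reason the card warns against comparing macro scores across schemas.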
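
The headline metrics are described as the `seqeval` micro-average at the entity level, meaning a prediction counts as correct only when both the entity type and the full span boundaries match. A self-contained sketch of that computation on toy data (the real evaluation uses the `seqeval` library over the CTI validation set; these helpers are illustrative only):

```python
def entity_spans(tags):
    """Extract (type, start, end) spans from one BIO tag sequence."""
    out, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        inside = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not inside:
            out.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return set(out)

def micro_f1(gold_seqs, pred_seqs):
    """Entity-level micro-averaged F1: pool TP/FP/FN over all sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = entity_spans(gold), entity_spans(pred)
        tp += len(g & p)   # exact type-and-span matches
        fp += len(p - g)   # predicted entities not in gold
        fn += len(g - p)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [["B-MALWARE", "I-MALWARE", "O", "B-IPV4"]]
pred = [["B-MALWARE", "I-MALWARE", "O", "O"]]
print(micro_f1(gold, pred))  # precision 1.0, recall 0.5 -> F1 ≈ 0.667
```

Note that a partially overlapping or mistyped span earns no credit under this scheme, which is why the per-label accuracy tables above can look very different from the entity-level micro-average.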