juanmcristobal committed on
Commit f0e8111 (verified)
1 Parent(s): 7577dd8

Update README.md

Files changed (1)
  1. README.md +10 -6
README.md CHANGED
@@ -7,8 +7,12 @@ tags:
  - token-classification
  - cybersecurity
  - threat-intelligence
- datasets:
- - juanmcristobal/secureModernBert2
+ - secureBert
+ license: mit
+ metrics:
+ - accuracy
+ base_model:
+ - answerdotai/ModernBERT-large
  ---
 
  # SecureModernBERT-NER
@@ -53,7 +57,7 @@ Sample output:
  ## Training Data
 
  - **Size:** 502,726 labelled text spans before filtering; 22 distinct entity classes in BIO format.
- - **Label distribution (spans):** `ORG` (~198k), `PRODUCT` (~79k), `MALWARE` (~67k), `PLATFORM` (~57k), `THREAT-ACTOR` (~49k), `SERVICE` (~46k), `CVE` (~41k), `LOC` (~38k), `SECTOR` (~34k), `TOOL` (~29k), plus indicator types such as `URL`, `IPV4`, `SHA256`, `MD5`, and `REGISTRY-KEYS`.
+ - **Label distribution (spans):** `ORG` (approx. 198k), `PRODUCT` (approx. 79k), `MALWARE` (approx. 67k), `PLATFORM` (approx. 57k), `THREAT-ACTOR` (approx. 49k), `SERVICE` (approx. 46k), `CVE` (approx. 41k), `LOC` (approx. 38k), `SECTOR` (approx. 34k), `TOOL` (approx. 29k), plus indicator types such as `URL`, `IPV4`, `SHA256`, `MD5`, and `REGISTRY-KEYS`.
  - **Pre-processing:** JSONL articles were tokenised and converted to BIO tags; spans in conflict were resolved manually and via automated heuristics before upload.
 
  ## Label Mapping
@@ -218,14 +222,14 @@ If you find this model useful, please cite the repository and the base model:
 
  ```
  @software{securemodernbert_ner_2025,
- author = {Juan M. Cristobal},
+ author = {Juan Manuel Cristóbal Moreno},
  title = {SecureModernBERT-NER: Cyber Threat Intelligence Named Entity Recogniser},
  year = {2025},
  publisher = {Hugging Face},
- url = {https://huggingface.co/juanmcristobal/autotrain-sec4}
+ url = {https://huggingface.co/attack-vector/SecureModernBERT-NER}
  }
  ```
 
  ## Contact
 
- Questions or feedback? Open an issue on the Hugging Face model repository or reach out at [`@juanmcristobal`](https://huggingface.co/juanmcristobal).
+ Questions or feedback? Open an issue on the Hugging Face model repository or reach out at [`@juanmcristobal`](https://huggingface.co/juanmcristobal).
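
The Pre-processing bullet in the README says that tokenised text was converted to BIO tags, but the commit does not show how. As a rough illustration only, the sketch below tags tokens from character-offset spans. The function names, the whitespace tokeniser, and the example sentence are assumptions made for this note; they are not taken from the repository, and the actual pipeline (including its conflict-resolution heuristics) may differ.

```python
from typing import List, Tuple

Offset = Tuple[int, int]        # (start, end) character offsets, end exclusive
Span = Tuple[int, int, str]     # (start, end, label), end exclusive


def whitespace_tokenise(text: str) -> List[Offset]:
    """Hypothetical tokeniser: split on whitespace and return character offsets."""
    offsets = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        offsets.append((start, end))
        pos = end
    return offsets


def spans_to_bio(tokens: List[Offset], spans: List[Span]) -> List[str]:
    """Assign one BIO tag per token; spans are assumed not to overlap."""
    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        inside = False
        for i, (tok_start, tok_end) in enumerate(tokens):
            # A token is tagged only if it lies entirely within the span.
            if tok_start >= span_start and tok_end <= span_end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags


# Illustrative sentence and spans (made up); the labels come from the README's label list.
text = "Lazarus Group exploited CVE-2021-44228 on Windows"
spans = [(0, 13, "THREAT-ACTOR"), (24, 38, "CVE"), (42, 49, "PLATFORM")]

tokens = whitespace_tokenise(text)
tags = spans_to_bio(tokens, spans)
for (start, end), tag in zip(tokens, tags):
    print(f"{text[start:end]}\t{tag}")
```

A subword tokeniser would need the same offset comparison applied per subword piece; this sketch only covers the whitespace case.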