Update README.md
README.md
CHANGED
|
@@ -7,8 +7,12 @@ tags:
|
|
| 7 |
- token-classification
|
| 8 |
- cybersecurity
|
| 9 |
- threat-intelligence
|
| 10 |
-
|
| 11 |
-
|
| 12 |
---
|
| 13 |
|
| 14 |
# SecureModernBERT-NER
|
|
@@ -53,7 +57,7 @@ Sample output:
|
|
| 53 |
## Training Data
|
| 54 |
|
| 55 |
- **Size:** 502,726 labelled text spans before filtering; 22 distinct entity classes in BIO format.
|
| 56 |
-
- **Label distribution (spans):** `ORG` (
|
| 57 |
- **Pre-processing:** JSONL articles were tokenised and converted to BIO tags; spans in conflict were resolved manually and via automated heuristics before upload.
|
| 58 |
|
| 59 |
## Label Mapping
|
|
@@ -218,14 +222,14 @@ If you find this model useful, please cite the repository and the base model:
|
|
| 218 |
|
| 219 |
```
|
| 220 |
@software{securemodernbert_ner_2025,
|
| 221 |
-
author = {Juan
|
| 222 |
title = {SecureModernBERT-NER: Cyber Threat Intelligence Named Entity Recogniser},
|
| 223 |
year = {2025},
|
| 224 |
publisher = {Hugging Face},
|
| 225 |
-
url = {https://huggingface.co/
|
| 226 |
}
|
| 227 |
```
|
| 228 |
|
| 229 |
## Contact
|
| 230 |
|
| 231 |
-
Questions or feedback? Open an issue on the Hugging Face model repository or reach out at [`@juanmcristobal`](https://huggingface.co/juanmcristobal).
|
|
|
|
| 7 |
- token-classification
|
| 8 |
- cybersecurity
|
| 9 |
- threat-intelligence
|
| 10 |
+
- secureBert
|
| 11 |
+
license: mit
|
| 12 |
+
metrics:
|
| 13 |
+
- accuracy
|
| 14 |
+
base_model:
|
| 15 |
+
- answerdotai/ModernBERT-large
|
| 16 |
---
|
| 17 |
|
| 18 |
# SecureModernBERT-NER
|
|
|
|
| 57 |
## Training Data
|
| 58 |
|
| 59 |
- **Size:** 502,726 labelled text spans before filtering; 22 distinct entity classes in BIO format.
|
| 60 |
+
- **Label distribution (spans):** `ORG` (approx. 198k), `PRODUCT` (approx. 79k), `MALWARE` (approx. 67k), `PLATFORM` (approx. 57k), `THREAT-ACTOR` (approx. 49k), `SERVICE` (approx. 46k), `CVE` (approx. 41k), `LOC` (approx. 38k), `SECTOR` (approx. 34k), `TOOL` (approx. 29k), plus indicator types such as `URL`, `IPV4`, `SHA256`, `MD5`, and `REGISTRY-KEYS`.
|
| 61 |
- **Pre-processing:** JSONL articles were tokenised and converted to BIO tags; spans in conflict were resolved manually and via automated heuristics before upload.
|
| 62 |
|
| 63 |
## Label Mapping
|
|
|
|
| 222 |
|
| 223 |
```
|
| 224 |
@software{securemodernbert_ner_2025,
|
| 225 |
+
author = {Juan Manuel Cristóbal Moreno},
|
| 226 |
title = {SecureModernBERT-NER: Cyber Threat Intelligence Named Entity Recogniser},
|
| 227 |
year = {2025},
|
| 228 |
publisher = {Hugging Face},
|
| 229 |
+
url = {https://huggingface.co/attack-vector/SecureModernBERT-NER}
|
| 230 |
}
|
| 231 |
```
|
| 232 |
|
| 233 |
## Contact
|
| 234 |
|
| 235 |
+
Questions or feedback? Open an issue on the Hugging Face model repository or reach out at [`@juanmcristobal`](https://huggingface.co/juanmcristobal).
|
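
The pre-processing bullet under Training Data says articles were tokenised and converted to BIO tags. As a rough illustration of that conversion only, here is a minimal sketch. It assumes character-offset spans and a plain whitespace tokeniser; both `whitespace_tokenise` and `spans_to_bio` are hypothetical helper names, not functions from the repository, and the real pipeline (which presumably works on subword tokens and also resolves conflicting spans) is not shown in the README.

```python
from typing import List, Tuple

Token = Tuple[int, int]        # (start, end) character offsets of a token
Span = Tuple[int, int, str]    # (start, end, label) of an annotated entity


def whitespace_tokenise(text: str) -> List[Token]:
    """Hypothetical tokeniser: split on whitespace and record char offsets."""
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((start, end))
        pos = end
    return tokens


def spans_to_bio(tokens: List[Token], spans: List[Span]) -> List[str]:
    """Assign B-/I- tags to tokens fully covered by a span, 'O' otherwise."""
    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        first = True
        for i, (tok_start, tok_end) in enumerate(tokens):
            if tok_start >= span_start and tok_end <= span_end:
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return tags


text = "APT29 used Cobalt Strike"
spans = [(0, 5, "THREAT-ACTOR"), (11, 24, "TOOL")]
tags = spans_to_bio(whitespace_tokenise(text), spans)
print(list(zip(text.split(), tags)))
```

The label names used here (`THREAT-ACTOR`, `TOOL`) come from the label distribution listed in the README; the example sentence and its offsets are invented for illustration. Overlapping spans are not handled: a later span simply overwrites an earlier one, which is where the manual and heuristic conflict resolution mentioned in the README would come in.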