NER Cybersecurity Model v2

Extract security skills, certs, and threats from CVs and job posts in milliseconds.

Model Details

  • Base Model: en_core_web_sm (spaCy 2.2.5)
  • Training: Prodigy 1.9.9 with 1805 annotated examples
  • Accuracy: 99.5% on evaluation set
  • Version: 2.0 (improved coverage)

Improvements over v1

  • +24.5 points F1 (27.5% → 52.0% on the test set)
  • CVE recognition now works (80% F1, was 0%)
  • ACRONYM recognition added (66.7% F1, was 0%)
  • CISO correctly tagged as SECURITY_ROLE (was CERTIFICATION)
  • 805 new training examples covering previously missing types
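
The README does not say how the F1 figures were computed. A common convention for NER, assumed here, is exact-match entity-level scoring: a prediction counts only if its start, end, and label all match a gold span. The helper name `entity_f1` and the sample spans below are illustrative and are not taken from the actual evaluation.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match entity-level F1 (assumed metric, not confirmed by the source)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold spans for "CISO with CISSP. Patched CVE-2024-1234."
gold = {(0, 4, "SECURITY_ROLE"), (10, 15, "CERTIFICATION"), (25, 38, "CVE")}
# A v1-style prediction: CISO mislabelled as CERTIFICATION, CVE missed.
predicted = {(0, 4, "CERTIFICATION"), (10, 15, "CERTIFICATION")}

score = entity_f1(gold, predicted)  # precision 1/2, recall 1/3
```

Under this convention a mislabelled span costs both precision and recall, which is why fixing a single label such as CISO can move the score noticeably. The headline improvement is an absolute difference: 52.0 - 27.5 = 24.5 points.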

Entity Types (13)

Category        Entity Types
Core Security   SECURITY_ROLE, SECURITY_TOOL, CERTIFICATION, FRAMEWORK
Threats         CVE, THREAT_TYPE, ATTACK_TECHNIQUE
Skills          TECHNICAL_SKILL, ACRONYM, SECURITY_DOMAIN
Compliance      REGULATION, CONTROL_ID, AUDIT_TERM
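
For downstream code it can help to have the table above as data. The sketch below mirrors it in a dictionary and builds a reverse lookup from label to category. The names `CATEGORY_LABELS`, `LABEL_TO_CATEGORY`, and `category_of` are illustrative and are not part of the model package.

```python
# Category -> entity labels, copied from the table above.
CATEGORY_LABELS = {
    "Core Security": ["SECURITY_ROLE", "SECURITY_TOOL", "CERTIFICATION", "FRAMEWORK"],
    "Threats": ["CVE", "THREAT_TYPE", "ATTACK_TECHNIQUE"],
    "Skills": ["TECHNICAL_SKILL", "ACRONYM", "SECURITY_DOMAIN"],
    "Compliance": ["REGULATION", "CONTROL_ID", "AUDIT_TERM"],
}

# Reverse lookup: entity label -> category.
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in CATEGORY_LABELS.items()
    for label in labels
}


def category_of(label: str) -> str:
    """Return the category for a label, or "Other" if it is not in the table."""
    return LABEL_TO_CATEGORY.get(label, "Other")
```

With a loaded pipeline, `category_of(ent.label_)` rolls each entity up to its category. Any label outside the 13 listed here falls into "Other".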

Usage

import spacy

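# spacy.load() takes an installed package name or a local directory path.
# Assumption: the model files have been downloaded to ./pki/ner-cybersecurity.
nlp = spacy.load("pki/ner-cybersecurity")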

doc = nlp("CISO with CISSP. Patched CVE-2024-1234. Expert in SIEM and EDR.")
for ent in doc.ents:
    print(f"{ent.label_}: {ent.text}")

Output:

SECURITY_ROLE: CISO
CERTIFICATION: CISSP
CVE: CVE-2024-1234
SECURITY_TOOL: SIEM
ACRONYM: EDR
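
For CV or job-post matching, the flat entity list is often more useful grouped by label. The following is a minimal sketch of that step. `group_entities` is a hypothetical helper, and the sample pairs are copied from the output above so the snippet runs without the model installed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_entities(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (label, text) pairs by label, dropping repeated texts per label."""
    grouped = defaultdict(list)
    for label, text in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# (label, text) pairs taken from the example output above.
pairs = [
    ("SECURITY_ROLE", "CISO"),
    ("CERTIFICATION", "CISSP"),
    ("CVE", "CVE-2024-1234"),
    ("SECURITY_TOOL", "SIEM"),
    ("ACRONYM", "EDR"),
]

profile = group_entities(pairs)
```

With a loaded pipeline, the same helper can be fed directly from a document: `group_entities((ent.label_, ent.text) for ent in doc.ents)`.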

Training Data Sources

  • pki-ad-match role YAMLs (62 roles)
  • agent-nexus-85 agent presets (90 agents)
  • Synthetic examples with cybersecurity entities
  • 805 new examples for underrepresented types

Requirements

  • spaCy 2.2.x
  • Python 3.6-3.8
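
Because the model is pinned to an older spaCy line, a quick environment check before loading can save a confusing error later. This is a sketch only: `is_supported` is a hypothetical helper that encodes the two requirements listed above and nothing more.

```python
from typing import Sequence


def is_supported(python_version: Sequence[int], spacy_version: str) -> bool:
    """Check an environment against the stated requirements.

    python_version: e.g. sys.version_info or (3, 7, 4)
    spacy_version:  e.g. spacy.__version__ or "2.2.5"
    """
    python_ok = (3, 6) <= tuple(python_version[:2]) <= (3, 8)
    spacy_ok = spacy_version.split(".")[:2] == ["2", "2"]
    return python_ok and spacy_ok


# Example call against the running interpreter (requires spaCy to be installed):
#   import sys, spacy
#   is_supported(sys.version_info, spacy.__version__)
```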

License

MIT