NER Cybersecurity Model v2

Extract security skills, certifications, and threats from CVs and job postings in milliseconds.

Model Details

  • Base Model: en_core_web_sm (spaCy 2.2.5)
  • Training: Prodigy 1.9.9 with 1805 annotated examples
  • Accuracy: 99.5% on evaluation set
  • Version: 2.0 (improved coverage)

Improvements over v1

  • +24.5 F1 points (27.5% → 52.0% on test set)
  • CVE recognition now working (80% F1, was 0%)
  • ACRONYM recognition added (66.7% F1, was 0%)
  • CISO correctly tagged as SECURITY_ROLE (was CERTIFICATION)
  • 805 new training examples covering previously missing types
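
For readers unfamiliar with the metric, the F1 figures above combine precision and recall into one number. The sketch below shows the arithmetic only; the counts passed to `f1_score` are hypothetical values chosen to reproduce 80% and 66.7%, not the model's actual evaluation counts, and the function name is illustrative.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Entity-level F1 from true positive, false positive, and false negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
print(round(f1_score(tp=2, fp=0, fn=1), 3))  # 0.8
print(round(f1_score(tp=1, fp=0, fn=1), 3))  # 0.667

# The headline gain is a difference in percentage points, not a relative change.
print(round(52.0 - 27.5, 1))  # 24.5
```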

Entity Types (13)

| Category | Entity Types |
|---|---|
| Core Security | SECURITY_ROLE, SECURITY_TOOL, CERTIFICATION, FRAMEWORK |
| Threats | CVE, THREAT_TYPE, ATTACK_TECHNIQUE |
| Skills | TECHNICAL_SKILL, ACRONYM, SECURITY_DOMAIN |
| Compliance | REGULATION, CONTROL_ID, AUDIT_TERM |
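
If downstream code needs to roll entity labels up into these four categories, a plain lookup table may be enough. This is a minimal sketch: the names `ENTITY_CATEGORIES`, `LABEL_TO_CATEGORY`, and `category_of` are illustrative and not part of the model package.

```python
# Category -> labels, copied from the entity type listing above.
ENTITY_CATEGORIES = {
    "Core Security": ["SECURITY_ROLE", "SECURITY_TOOL", "CERTIFICATION", "FRAMEWORK"],
    "Threats": ["CVE", "THREAT_TYPE", "ATTACK_TECHNIQUE"],
    "Skills": ["TECHNICAL_SKILL", "ACRONYM", "SECURITY_DOMAIN"],
    "Compliance": ["REGULATION", "CONTROL_ID", "AUDIT_TERM"],
}

# Inverted index: label -> category.
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in ENTITY_CATEGORIES.items()
    for label in labels
}


def category_of(label: str) -> str:
    """Return the category for an entity label, or "Unknown" if unlisted."""
    return LABEL_TO_CATEGORY.get(label, "Unknown")


print(len(LABEL_TO_CATEGORY))  # 13
print(category_of("CVE"))      # Threats
```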

Usage

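import spacy

# spacy.load accepts an installed package name or a path to a model directory,
# so this assumes the model files are available locally at this path.
nlp = spacy.load("pki/ner-cybersecurity")

# Run the pipeline on sample text and print each recognized entity with its label.
doc = nlp("CISO with CISSP. Patched CVE-2024-1234. Expert in SIEM and EDR.")
for ent in doc.ents:
    print(f"{ent.label_}: {ent.text}")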

Output:

SECURITY_ROLE: CISO
CERTIFICATION: CISSP
CVE: CVE-2024-1234
SECURITY_TOOL: SIEM
ACRONYM: EDR
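
For CV or job-post parsing it is often more convenient to collect entities per label than to print them one by one. The following is a rough sketch that works on plain `(label, text)` pairs so it runs without the model installed; `group_by_label` is a hypothetical helper, and with a loaded pipeline the pairs would come from `[(ent.label_, ent.text) for ent in doc.ents]`.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (label, text) pairs into {label: [texts]}, dropping repeated texts."""
    grouped = defaultdict(list)
    for label, text in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# The (label, text) pairs from the sample output above.
sample = [
    ("SECURITY_ROLE", "CISO"),
    ("CERTIFICATION", "CISSP"),
    ("CVE", "CVE-2024-1234"),
    ("SECURITY_TOOL", "SIEM"),
    ("ACRONYM", "EDR"),
]

for label, texts in group_by_label(sample).items():
    print(label, texts)
```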

Training Data Sources

  • pki-ad-match role YAMLs (62 roles)
  • agent-nexus-85 agent presets (90 agents)
  • Synthetic examples with cybersecurity entities
  • 805 new examples for underrepresented types

Requirements

  • spaCy 2.2.x
  • Python 3.6-3.8
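
Because these version ranges are narrow, it may help to check the environment before loading the model. This is a minimal sketch under the assumption that "2.2.x" means any 2.2 patch release; the helper names are illustrative, and in practice the spaCy version string would come from `spacy.__version__`.

```python
import sys


def python_supported(version_info=sys.version_info) -> bool:
    """True if the interpreter falls in the stated Python 3.6-3.8 range."""
    return (3, 6) <= tuple(version_info[:2]) <= (3, 8)


def spacy_supported(version: str) -> bool:
    """True if the version string is a spaCy 2.2.x release."""
    return version.split(".")[:2] == ["2", "2"]


print(spacy_supported("2.2.5"))      # True
print(spacy_supported("3.0.0"))      # False
print(python_supported((3, 7, 0)))   # True
print(python_supported((3, 9, 1)))   # False
```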

License

MIT
