---
license: mit
library_name: pytorch
pipeline_tag: text-classification
tags:
- security
- cve
- vulnerability-management
- exploit-prediction
- sbom
---

# CVE Exploitability Model

Predicts the probability that a CVE is **exploited in the wild**. The positive label is listing in the CISA Known Exploited Vulnerabilities (KEV) catalog. The model is intended for prioritising vulnerabilities surfaced from an SBOM. It has two branches whose outputs are fused into a single logit: a **TextCNN** over the CVE description and an **MLP** over CVSS v3 metadata and the CWE.

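This card does not document the layer sizes or exactly how the two branches are combined. The following minimal PyTorch sketch illustrates the general pattern and assumes concatenation fusion. The class name `TextCNNFusion`, the kernel widths and all dimensions are hypothetical. They do not describe the checkpoint returned by `load_model`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNNFusion(nn.Module):
    """Hypothetical sketch: TextCNN over token ids + MLP over structured features."""

    def __init__(self, vocab_size: int, n_struct: int, emb_dim: int = 64,
                 n_filters: int = 32, kernel_sizes=(3, 4, 5), hidden: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # One 1-D convolution per n-gram width, sliding along the token axis.
        self.convs = nn.ModuleList([nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes])
        # MLP branch for already-numeric CVSS / CWE features.
        self.mlp = nn.Sequential(nn.Linear(n_struct, hidden), nn.ReLU())
        # Fusion head (assumed): concatenate both branch outputs, map to one logit.
        self.head = nn.Linear(n_filters * len(kernel_sizes) + hidden, 1)

    def forward(self, tokens: torch.Tensor, struct: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens).transpose(1, 2)                # (batch, emb_dim, seq_len)
        # Convolve, apply ReLU, then max-pool over time for each kernel width.
        pooled = [F.relu(conv(x)).amax(dim=2) for conv in self.convs]
        text_vec = torch.cat(pooled, dim=1)                   # (batch, n_filters * n_kernels)
        fused = torch.cat([text_vec, self.mlp(struct)], dim=1)
        return self.head(fused).squeeze(1)                    # (batch,) logits
```

The forward signature `(tokens, struct)` mirrors the `net(Xt, Xs)` call in the usage example. As in that example, a sigmoid turns the logit into a probability. Because the convolutions use no padding, token sequences must be at least as long as the widest kernel (5 here).
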
## Held-out test metrics

| Model | ROC-AUC | PR-AUC | Precision@10% | Recall@10% |
|---|---|---|---|---|
| Deep (TextCNN + CVSS fusion) | 0.934 | 0.763 | 0.782 | 0.473 |
| Baseline: CVSS score only | 0.746 | 0.316 | 0.347 | 0.210 |
| Baseline: logistic regression (structured features) | 0.819 | 0.440 | 0.463 | 0.280 |

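The card does not define the @10% columns. This sketch assumes the usual reading: precision and recall when the top 10% of test CVEs, ranked by predicted probability, are flagged. It uses only the standard library to compute those two metrics and ROC-AUC. ROC-AUC is computed here as the probability that a random positive outranks a random negative, with ties counting half. The function names and the rounding of the top-k cutoff are assumptions. This is not the evaluation code behind the table.

```python
def precision_recall_at_fraction(y_true, scores, fraction=0.10):
    """Precision and recall when the top `fraction` of items by score are flagged."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    k = max(1, round(fraction * len(scores)))  # number of CVEs flagged (assumed rounding)
    # Rank indices by descending score and count positives among the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(y_true[i] for i in ranked[:k])
    return hits / k, hits / n_pos


def roc_auc(y_true, scores):
    """Pairwise ROC-AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under that reading, recall@10% = precision@10% × 0.10 × N / N_pos. Each row of the table then implies a test-set positive rate of about 16.5%. For the deep model, 0.782 × 0.10 / 0.473 ≈ 0.165, and both baselines give the same value. That is far above the real-world rate noted under Limitations.
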
## Usage
```python
# pip install huggingface_hub torch numpy pandas
import importlib.util
import sys

import pandas as pd
import torch
from huggingface_hub import hf_hub_download

REPO_ID = "sumitp76/cve-exploitability"

# Download the repository's model code and import it as the module `hf_model`.
spec = importlib.util.spec_from_file_location("hf_model", hf_hub_download(REPO_ID, "hf_model.py"))
m = importlib.util.module_from_spec(spec)
sys.modules["hf_model"] = m  # register under its name so later imports of `hf_model` resolve
spec.loader.exec_module(m)

pp, net = m.load_model(REPO_ID)  # preprocessor and network
net.eval()  # inference mode (disables dropout, if any)

# One CVE per row: description plus CVSS v3 / CWE metadata.
df = pd.DataFrame([{
    "description": "Remote code execution via crafted request ...",
    "base_score": 9.8, "severity": "CRITICAL", "AV": "NETWORK", "AC": "LOW",
    "PR": "NONE", "UI": "NONE", "S": "UNCHANGED", "C": "HIGH", "I": "HIGH",
    "A": "HIGH", "has_v3": 1, "v2_base_score": None, "year": 2024, "cwe": "CWE-94",
}])
Xt = torch.tensor(pp.transform_text(df["description"].tolist()))  # encoded description
Xs = torch.tensor(pp.transform_struct(df), dtype=torch.float32)  # structured features

with torch.no_grad():  # no gradients needed for scoring
    prob = torch.sigmoid(net(Xt, Xs))
print("exploit probability:", prob.item())
```

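The probability printed above reflects the class balance of the training data. The Limitations section notes that this balance is inflated relative to the real world. For a rough real-world probability, a standard prior-shift (Bayes) correction can rescale the score. This sketch is not part of the released code. `adjust_for_prevalence` is a hypothetical helper. The training prevalence is not stated in this card, so you must supply it yourself. The correction also assumes the model is calibrated at that prevalence.

```python
def adjust_for_prevalence(p: float, train_prevalence: float, target_prevalence: float) -> float:
    """Rescale probability `p` from the training base rate to a target base rate.

    Assumes `p` is calibrated at `train_prevalence` and that only the class prior
    differs between training and deployment (label shift).
    """
    for name, v in (("train_prevalence", train_prevalence),
                    ("target_prevalence", target_prevalence)):
        if not 0.0 < v < 1.0:
            raise ValueError(f"{name} must be in (0, 1)")
    # Re-weight each class by the ratio of target prior to training prior.
    pos = p * target_prevalence / train_prevalence
    neg = (1.0 - p) * (1.0 - target_prevalence) / (1.0 - train_prevalence)
    return pos / (pos + neg)
```

For fixed priors the transform is monotonic in `p`. It therefore changes the displayed probabilities but never the ranking of CVEs, which is what triage depends on.
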
## Labels
- `1`: listed in CISA KEV (known to be exploited in the wild)
- `0`: not known to be exploited (sampled from NVD CVEs that are not in KEV)

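The labelling scheme above can be sketched as follows. The card does not say how many negatives were drawn per positive or how they were sampled. Uniform sampling, the `neg_per_pos` ratio and the `build_labels` name are therefore all hypothetical. Positives are restricted to KEV CVEs that also have an NVD record, because the features come from NVD.

```python
import random


def build_labels(nvd_ids, kev_ids, neg_per_pos=5, seed=0):
    """Return (cve_id, label) pairs: KEV CVEs as 1, sampled non-KEV NVD CVEs as 0."""
    kev = set(kev_ids)
    nvd = sorted(set(nvd_ids))  # sort so sampling is reproducible for a given seed
    positives = [c for c in nvd if c in kev]
    candidates = [c for c in nvd if c not in kev]
    # Hypothetical ratio: draw up to `neg_per_pos` negatives per positive.
    n_neg = min(len(candidates), neg_per_pos * len(positives))
    negatives = random.Random(seed).sample(candidates, n_neg)
    return [(c, 1) for c in positives] + [(c, 0) for c in negatives]
```

The chosen ratio fixes the training prevalence, which is the inflation described under Limitations.
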
## Data sources
- **Labels:** CISA KEV (`cisagov/kev-data`)
- **Features:** NVD (`fkie-cad/nvd-json-data-feeds`): description, CVSS v3 metrics, and CWE

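NVD records carry CVSS v3 metrics both as named fields and as a vector string such as `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H`. If you only have the vector string, this sketch expands it into the `AV`/`AC`/`PR`/`UI`/`S`/`C`/`I`/`A` columns used in the usage example. The full value names follow NVD's CVSS v3 JSON enumeration. The card only confirms the values shown in the usage example, so it is an assumption that the preprocessor expects these exact strings. `parse_cvss3_vector` is a hypothetical helper.

```python
# Full value names as used in NVD's CVSS v3 JSON (e.g. "NETWORK", "UNCHANGED").
CVSS3_VALUES = {
    "AV": {"N": "NETWORK", "A": "ADJACENT_NETWORK", "L": "LOCAL", "P": "PHYSICAL"},
    "AC": {"L": "LOW", "H": "HIGH"},
    "PR": {"N": "NONE", "L": "LOW", "H": "HIGH"},
    "UI": {"N": "NONE", "R": "REQUIRED"},
    "S": {"U": "UNCHANGED", "C": "CHANGED"},
    "C": {"N": "NONE", "L": "LOW", "H": "HIGH"},
    "I": {"N": "NONE", "L": "LOW", "H": "HIGH"},
    "A": {"N": "NONE", "L": "LOW", "H": "HIGH"},
}


def parse_cvss3_vector(vector: str) -> dict:
    """Expand a CVSS v3.x vector string into {'AV': 'NETWORK', ...} base metrics."""
    prefix, *parts = vector.split("/")
    if prefix not in ("CVSS:3.0", "CVSS:3.1"):
        raise ValueError(f"not a CVSS v3 vector: {vector!r}")
    fields = {}
    for part in parts:
        key, _, code = part.partition(":")
        if key not in CVSS3_VALUES:  # skip temporal/environmental metrics
            continue
        try:
            fields[key] = CVSS3_VALUES[key][code]
        except KeyError:
            raise ValueError(f"invalid value {code!r} for {key}") from None
    missing = CVSS3_VALUES.keys() - fields.keys()
    if missing:
        raise ValueError(f"missing base metrics: {sorted(missing)}")
    return fields
```

Temporal and environmental metrics are ignored. The structured inputs listed in the usage example contain only base metrics. The returned dict can be merged into a DataFrame row alongside `description`, `base_score` and the other fields.
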
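## Limitations
- KEV is a weak, incomplete label: exploited CVEs that CISA has not catalogued are treated as negatives.
- The positive-class prevalence in training is inflated relative to the real-world rate (<1%), so raw probabilities should not be read as real-world exploitation rates.
- Scores for CVEs seen during training are optimistically biased.
- Novel attack patterns (e.g. supply-chain backdoors) are hard for the model to detect.

Evaluate with a time-based split before operational use. This model assists triage; it is not a substitute for human judgement.
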
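The time-based split recommended above can be sketched as follows. Train on CVEs published before a cutoff date and test on those published after it. The sketch assumes each record carries its publication date and the date it entered KEV. The KEV catalog records that date as `dateAdded`. The dict keys and the `time_based_split` name below are hypothetical. Training labels count only KEV listings made before the cutoff, so exploitation reported later does not leak into training.

```python
from datetime import date


def time_based_split(records, cutoff: date):
    """Return (train, test) lists of (cve_id, label) pairs split at `cutoff`.

    Each record is assumed to be a dict with 'id', 'published' (date) and
    'kev_added' (date the CVE entered KEV, or None).
    """
    train, test = [], []
    for r in records:
        added = r["kev_added"]
        if r["published"] < cutoff:
            # Training labels only use KEV listings known before the cutoff.
            train.append((r["id"], int(added is not None and added < cutoff)))
        else:
            # Test labels use the latest KEV snapshot available.
            test.append((r["id"], int(added is not None)))
    return train, test
```
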