---
license: mit
library_name: pytorch
pipeline_tag: text-classification
tags:
- security
- cve
- vulnerability-management
- exploit-prediction
- sbom
---
# CVE Exploitability Model
Predicts the probability that a CVE is **exploited in the wild**, using listing in CISA KEV as the
positive label, to help prioritise vulnerabilities surfaced from an SBOM. The model has two branches:
a **TextCNN** over the CVE description, fused with an **MLP** over CVSS v3 metadata and CWE.
## Held-out test metrics
| Model | ROC-AUC | PR-AUC | Precision@10% | Recall@10% |
|---|---|---|---|---|
| Deep (TextCNN + CVSS fusion) | 0.934 | 0.763 | 0.782 | 0.473 |
| Baseline: CVSS score only | 0.746 | 0.316 | 0.347 | 0.210 |
| Baseline: logistic reg (structured) | 0.819 | 0.440 | 0.463 | 0.280 |
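
The card does not define the `@10%` columns. A common reading, assumed here, is that CVEs are ranked by predicted score and precision and recall are then measured on the top 10% of that ranking. Under that assumption, the computation can be sketched as follows; the function name `precision_recall_at_top` is illustrative and is not part of `hf_model.py`.

```python
def precision_recall_at_top(scores, labels, fraction=0.10):
    """Precision and recall among the top `fraction` of items ranked by score.

    Assumption: "@10%" means "the 10% of CVEs with the highest scores".
    `labels` holds 1 for KEV-listed CVEs and 0 otherwise.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("scores must not be empty")

    # Indices ordered from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Size of the flagged set; always flag at least one item.
    k = max(1, int(len(scores) * fraction))

    hits = sum(labels[i] for i in order[:k])   # true positives in the top k
    total_positives = sum(labels)

    precision = hits / k
    recall = hits / total_positives if total_positives else 0.0
    return precision, recall
```

Under the same reading, `precision * 0.10 / recall` equals the positive rate of the test set. Each of the three rows in the table gives roughly 0.165, which is consistent with all models having been scored on the same split.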
## Usage
```python
# pip install huggingface_hub torch numpy pandas
import importlib.util
import sys
import pandas as pd
import torch
from huggingface_hub import hf_hub_download

REPO_ID = "sumitp76/cve-exploitability"

# Download hf_model.py from the repo and import it as a module named "hf_model".
spec = importlib.util.spec_from_file_location("hf_model", hf_hub_download(REPO_ID, "hf_model.py"))
m = importlib.util.module_from_spec(spec)
sys.modules["hf_model"] = m
spec.loader.exec_module(m)

pp, net = m.load_model(REPO_ID)  # pp: preprocessor, net: model
net.eval()  # assumes net is a torch.nn.Module; switches off dropout, if any

# One row per CVE: description, CVSS v3 fields, publication year and CWE.
df = pd.DataFrame([{"description": "Remote code execution via crafted request ...",
                    "base_score": 9.8, "severity": "CRITICAL", "AV": "NETWORK", "AC": "LOW",
                    "PR": "NONE", "UI": "NONE", "S": "UNCHANGED", "C": "HIGH", "I": "HIGH",
                    "A": "HIGH", "has_v3": 1, "v2_base_score": None, "year": 2024, "cwe": "CWE-94"}])
Xt = torch.tensor(pp.transform_text(df.description.tolist()))
Xs = torch.tensor(pp.transform_struct(df), dtype=torch.float32)
with torch.no_grad():  # no gradients are needed for inference
    print("exploit probability:", torch.sigmoid(net(Xt, Xs)).item())
```
## Labels
- `1` = listed in CISA KEV (known exploited in the wild)
- `0` = not known-exploited (sampled from NVD)
## Data sources
- **Labels:** CISA KEV (`cisagov/kev-data`)
- **Features:** NVD (`fkie-cad/nvd-json-data-feeds`) — description, CVSS v3, CWE
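
The card does not describe how negatives were sampled from NVD, so the following is only a sketch of one way to combine the two sources into a labelled set. The function name, the `negatives_per_positive` ratio and the seed are all hypothetical placeholders, not the author's procedure.

```python
import random


def build_labelled_set(nvd_ids, kev_ids, negatives_per_positive=5, seed=0):
    """Return (cve_id, label) pairs: 1 if KEV-listed, 0 for sampled NVD entries.

    Assumptions (not stated in the card): negatives are drawn uniformly at
    random, without replacement, at a fixed ratio to the positives.
    """
    kev = set(kev_ids)

    # Positives: CVEs present in NVD that also appear in KEV.
    positives = [cve for cve in nvd_ids if cve in kev]

    # Candidate negatives: everything in NVD that KEV does not list.
    candidates = [cve for cve in nvd_ids if cve not in kev]

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n_negatives = min(len(candidates), negatives_per_positive * len(positives))
    negatives = rng.sample(candidates, n_negatives)

    return [(cve, 1) for cve in positives] + [(cve, 0) for cve in negatives]
```

Whatever ratio is chosen fixes the positive rate of the training set, which is why that rate can sit far above the real-world one.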
## Limitations
- KEV is a weak, incomplete label: a CVE that is not listed may still have been exploited.
- The positive-class prevalence in training is inflated relative to the real-world rate (<1%), so
  output probabilities are likely to overstate real-world risk.
- CVEs seen during training score optimistically.
- Novel patterns (e.g. supply-chain backdoors) are hard to predict.

Evaluate with a time-based split before operational use. This model assists triage and is not a
substitute for human judgement.
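
Two of these caveats lend themselves to a short sketch: splitting by publication year, and rescaling a score from the training base rate to a lower deployment base rate. Both helpers below are illustrative and not part of `hf_model.py`. The rescaling is the standard prior-shift correction and assumes the model's output is calibrated on the training distribution, which the card does not confirm; the training prevalence is also not stated, so any value passed in is the user's own estimate.

```python
def time_based_split(records, cutoff_year):
    """Train on CVEs published up to `cutoff_year`, test on later ones.

    `records` is a list of dicts with a "year" key, matching the `year`
    field used in the usage example.
    """
    train = [r for r in records if r["year"] <= cutoff_year]
    test = [r for r in records if r["year"] > cutoff_year]
    return train, test


def adjust_for_prevalence(p, train_prevalence, target_prevalence):
    """Rescale probability `p` from the training base rate to a target one.

    Assumes `p` is calibrated for `train_prevalence` and that only the class
    balance differs between training and deployment.
    """
    for name, value in (("train_prevalence", train_prevalence),
                        ("target_prevalence", target_prevalence)):
        if not 0 < value < 1:
            raise ValueError(f"{name} must be strictly between 0 and 1")

    # Reweight each class by the ratio of its target prior to its training prior.
    pos = p * target_prevalence / train_prevalence
    neg = (1.0 - p) * (1.0 - target_prevalence) / (1.0 - train_prevalence)
    return pos / (pos + neg)
```

For example, a score of 0.5 from a model trained on balanced classes corresponds to 0.01 once the base rate is set to 1%. The ranking of CVEs is unchanged by this rescaling; only the absolute probabilities move.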