	---
	license: mit
	library_name: pytorch
	pipeline_tag: text-classification
	tags:
	- security
	- cve
	- vulnerability-management
	- exploit-prediction
	- sbom
	---

	# CVE Exploitability Model

	Predicts the probability that a CVE is exploited in the wild, using a CISA KEV listing as the
	positive label, to help prioritise vulnerabilities surfaced from an SBOM. The model has two
	branches: a TextCNN over the CVE description, fused with an MLP over CVSS v3 metadata and CWE.
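
As a rough illustration of the two-branch design described above, the PyTorch sketch below runs a TextCNN over token ids, passes the structured features through a small MLP, and concatenates the two before a single-logit head. The class name `TwoBranchSketch`, the vocabulary size, layer widths, kernel sizes, and feature count are all assumptions chosen for illustration; the released architecture is defined in `hf_model.py` and may differ.

```python
import torch
import torch.nn as nn


class TwoBranchSketch(nn.Module):
    """Illustrative TextCNN + MLP fusion; every size here is hypothetical."""

    def __init__(self, vocab_size=5000, embed_dim=64, n_struct=32,
                 kernel_sizes=(3, 4, 5), n_filters=32):
        super().__init__()
        # Text branch: embedding followed by parallel 1-D convolutions.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes]
        )
        # Structured branch: a small MLP over CVSS/CWE-style features.
        self.struct_mlp = nn.Sequential(nn.Linear(n_struct, 32), nn.ReLU())
        # Fusion head: concatenated branches mapped to one logit.
        self.head = nn.Linear(n_filters * len(kernel_sizes) + 32, 1)

    def forward(self, tokens, struct):
        x = self.embed(tokens).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-pool each convolution's output over the sequence dimension.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        text_vec = torch.cat(pooled, dim=1)
        fused = torch.cat([text_vec, self.struct_mlp(struct)], dim=1)
        return self.head(fused).squeeze(1)  # raw logits, one per CVE


model = TwoBranchSketch().eval()
tokens = torch.randint(1, 5000, (2, 50))  # two fake tokenised descriptions
struct = torch.rand(2, 32)                # two fake structured-feature rows
with torch.no_grad():
    probs = torch.sigmoid(model(tokens, struct))
```

The head returns logits rather than probabilities, which matches the `torch.sigmoid(net(Xt, Xs))` call in the usage example.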

	## Held-out test metrics
	| Model | ROC-AUC | PR-AUC | Precision@10% | Recall@10% |
	|---|---|---|---|---|
	| Deep (TextCNN + CVSS fusion) | 0.934 | 0.763 | 0.782 | 0.473 |
	| Baseline: CVSS score only | 0.746 | 0.316 | 0.347 | 0.210 |
	| Baseline: logistic regression (structured) | 0.819 | 0.440 | 0.463 | 0.280 |
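
The card does not define Precision@10% and Recall@10%. One common reading, assumed here, is that the top 10% of CVEs by predicted score are flagged and precision and recall are computed on that flagged set. The table is at least consistent with that reading: in all three rows, precision × 0.10 / recall ≈ 0.165, which would be the positive rate of the test set. The function name and the toy data below are hypothetical.

```python
import numpy as np


def precision_recall_at_top(y_true, scores, frac=0.10):
    """Precision and recall when the top `frac` of items by score are flagged."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    k = max(1, int(round(frac * len(scores))))      # number of items flagged
    top = np.argsort(-scores, kind="stable")[:k]    # indices of the k highest scores
    hits = y_true[top].sum()                        # flagged items that are positives
    return float(hits / k), float(hits / y_true.sum())


# Toy data: 20 CVEs, 3 of them exploited (positions 0, 2, and 10).
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25,
          0.2, 0.18, 0.16, 0.14, 0.12, 0.1, 0.08, 0.06, 0.04, 0.02]

# The top 10% is 2 items; one of them is a positive.
precision, recall = precision_recall_at_top(y_true, scores)
print(f"precision@10%={precision:.3f} recall@10%={recall:.3f}")
```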

	## Usage
	```python
	# pip install huggingface_hub torch numpy pandas
	import importlib.util
	import sys

	import pandas as pd
	import torch
	from huggingface_hub import hf_hub_download

	REPO_ID = "sumitp76/cve-exploitability"

	# Download the repo's loader module (hf_model.py) and import it from the cached path.
	spec = importlib.util.spec_from_file_location("hf_model", hf_hub_download(REPO_ID, "hf_model.py"))
	m = importlib.util.module_from_spec(spec)
	sys.modules["hf_model"] = m
	spec.loader.exec_module(m)

	# load_model returns the preprocessor (pp) and the network (net).
	pp, net = m.load_model(REPO_ID)
	net.eval()  # inference mode; harmless if load_model already sets it

	# One row per CVE: description plus CVSS v3 metrics, year, and CWE.
	df = pd.DataFrame([{"description": "Remote code execution via crafted request ...",
	                    "base_score": 9.8, "severity": "CRITICAL", "AV": "NETWORK", "AC": "LOW",
	                    "PR": "NONE", "UI": "NONE", "S": "UNCHANGED", "C": "HIGH", "I": "HIGH",
	                    "A": "HIGH", "has_v3": 1, "v2_base_score": None, "year": 2024, "cwe": "CWE-94"}])
	Xt = torch.tensor(pp.transform_text(df.description.tolist()))
	Xs = torch.tensor(pp.transform_struct(df), dtype=torch.float32)
	with torch.no_grad():  # no gradients needed for scoring
	    prob = torch.sigmoid(net(Xt, Xs)).item()  # the network outputs a logit
	print("exploit probability:", prob)
	```

	## Labels
	- `1` = listed in CISA KEV (known exploited in the wild)
	- `0` = not known-exploited (sampled from NVD)

	## Data sources
	- Labels: CISA KEV (`cisagov/kev-data`)
	- Features: NVD (`fkie-cad/nvd-json-data-feeds`) — description, CVSS v3, CWE

	## Limitations
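	KEV is a weak, incomplete label: a CVE absent from KEV may still be exploited. The positive-class
	prevalence in the training data is inflated relative to the real-world rate (<1%), so raw scores
	are likely to overstate absolute risk and are better suited to ranking. CVEs seen during training
	receive optimistic scores, and novel patterns (e.g. supply-chain backdoors) are hard to predict.
	Evaluate with a time-based split before operational use. This model assists triage and is not a
	substitute for human judgement.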
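
Two of the points above can be sketched in a few lines: a time-based split on the `year` column, and a standard prior-shift correction that rescales a predicted probability from the training prevalence to an assumed deployment prevalence. Both helper names are hypothetical, the cut-off year is arbitrary, and the training prevalence of 0.165 is only a placeholder, since the card does not state the actual value. The correction also assumes the model's scores are reasonably calibrated on its own training distribution.

```python
import pandas as pd


def time_based_split(df, year_col="year", first_test_year=2023):
    """Train on CVEs published before `first_test_year`, test on the rest."""
    train = df[df[year_col] < first_test_year]
    test = df[df[year_col] >= first_test_year]
    return train, test


def adjust_for_prevalence(p, train_prev, deploy_prev):
    """Rescale probability `p` from the training prior to the deployment prior."""
    pos = p * deploy_prev / train_prev
    neg = (1.0 - p) * (1.0 - deploy_prev) / (1.0 - train_prev)
    return pos / (pos + neg)


# Toy frame with the same `year` column used in the usage example.
cves = pd.DataFrame({"year": [2020, 2021, 2022, 2023, 2024],
                     "label": [1, 0, 0, 1, 0]})
train, test = time_based_split(cves, first_test_year=2023)

# A raw score of 0.5 shrinks sharply once the prior drops from 16.5% to 1%.
adjusted = adjust_for_prevalence(0.5, train_prev=0.165, deploy_prev=0.01)
print(len(train), len(test), round(adjusted, 4))
```

The correction is monotonic, so it changes absolute probabilities but not the ranking of CVEs.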