Stage 3b — CVE Risk & Embedding

What this model does

Takes CVE IDs and produces:

z_cve ∈ ℝ^128 — dense vulnerability embedding (e5-large-v2 backbone)
risk_score ∈ [0, 10] — CVSS-aligned risk score
exploit_prob ∈ [0, 1] — exploit probability

Architecture

Backbone: intfloat/e5-large-v2 (frozen, 1024→128 projection)
Risk head: MLP regression (128→64→32→2; the two outputs are risk_score and exploit_prob)
Training: MSE loss on CVSS scores + exploit probabilities from Stage 0a
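
The layer sizes above can be sketched in PyTorch roughly as follows. This is a minimal illustration and not the released training code: the class name `CveRiskModel`, the ReLU activations, the sigmoid output scaling, and the unweighted sum of the two MSE terms are all assumptions made for the example. The random `pooled` tensor stands in for pooled e5-large-v2 outputs, so the frozen backbone itself is not loaded.

```python
import torch
from torch import nn


class CveRiskModel(nn.Module):
    """Hypothetical sketch: projection plus two-output risk head."""

    def __init__(self, backbone_dim: int = 1024, embed_dim: int = 128):
        super().__init__()
        # 1024 -> 128 projection applied to the frozen backbone's pooled output.
        self.projection = nn.Linear(backbone_dim, embed_dim)
        # 128 -> 64 -> 32 -> 2 regression head (ReLU is an assumption).
        self.risk_head = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 2),
        )

    def forward(self, pooled: torch.Tensor):
        z_cve = self.projection(pooled)
        raw = self.risk_head(z_cve)
        # Assumed output scaling: sigmoid keeps risk_score in [0, 10]
        # and exploit_prob in [0, 1].
        risk_score = 10.0 * torch.sigmoid(raw[:, 0])
        exploit_prob = torch.sigmoid(raw[:, 1])
        return z_cve, risk_score, exploit_prob


def training_loss(risk_score, exploit_prob, cvss_target, exploit_target):
    # Assumed combination: unweighted sum of the two MSE terms.
    mse = nn.functional.mse_loss
    return mse(risk_score, cvss_target) + mse(exploit_prob, exploit_target)


torch.manual_seed(0)
model = CveRiskModel()
pooled = torch.randn(6, 1024)  # stand-in for pooled backbone outputs
z_cve, risk_score, exploit_prob = model(pooled)

# Placeholder targets, used only to show that the loss is a scalar.
loss = training_loss(
    risk_score, exploit_prob, torch.full((6,), 7.5), torch.full((6,), 0.2)
)
```

If the backbone is frozen as stated, only the projection and head parameters would receive gradients during training.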

Output files

File	Description	Shape
`z_cve.parquet`	CVE embedding vectors	(6, 129)
`risk_scores.parquet`	risk_score + exploit_prob	(6, 3)
`stage3b_embeddings.parquet`	Combined — Stage 5 input	(6, 131)
`stage3b_best.pt`	Model checkpoint	—
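
The shapes in the table are consistent with each Parquet file carrying one identifier column next to its values: 128 + 1 = 129, 2 + 1 = 3, and 129 + 3 - 1 = 131 after a join on the shared identifier. That reading is an inference from the shapes, not something the card states. The sketch below reproduces the arithmetic on synthetic data; the column names `cve_id` and `z_0` through `z_127` are hypothetical, and the values are random placeholders rather than real model outputs.

```python
import numpy as np
import pandas as pd

cve_ids = [
    "CVE-2024-1234", "CVE-2024-5678", "CVE-2024-9999",
    "CVE-2024-3333", "CVE-2024-7777", "CVE-2024-2222",
]
rng = np.random.default_rng(0)

# Hypothetical layout of z_cve.parquet: id column + 128 embedding columns.
z_cve = pd.DataFrame(
    rng.normal(size=(6, 128)), columns=[f"z_{i}" for i in range(128)]
)
z_cve.insert(0, "cve_id", cve_ids)

# Hypothetical layout of risk_scores.parquet: id column + two scores.
risk_scores = pd.DataFrame({
    "cve_id": cve_ids,
    "risk_score": rng.uniform(0, 10, size=6),
    "exploit_prob": rng.uniform(0, 1, size=6),
})

# Joining on the id yields 129 + 3 - 1 = 131 columns.
combined = z_cve.merge(
    risk_scores, on="cve_id", how="inner", validate="one_to_one"
)
print(z_cve.shape, risk_scores.shape, combined.shape)
# (6, 129) (6, 3) (6, 131)
```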

CVEs covered

CVE-2024-1234, CVE-2024-5678, CVE-2024-9999, CVE-2024-3333, CVE-2024-7777, CVE-2024-2222

Usage

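from huggingface_hub import hf_hub_download
import pandas as pd

# Download the combined Stage 3b table (cached locally after the first call).
path = hf_hub_download(
    repo_id="sohomn/stage3b-cve-risk-and-embeddings",
    filename="stage3b_embeddings.parquet",
)
df = pd.read_parquet(path)  # expected shape per the table above: (6, 131)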
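
Once `df` is loaded, downstream code will usually want the embedding matrix separated from the two scores. The helper below is one possible way to do that. The function name `split_stage3b` and the column names `cve_id`, `risk_score`, and `exploit_prob` are assumptions, so check `df.columns` first and pass the real names if they differ. The demo frame is synthetic, with placeholder values, so the example runs without a download.

```python
import numpy as np
import pandas as pd


def split_stage3b(df, id_col="cve_id", score_cols=("risk_score", "exploit_prob")):
    """Split a combined frame into an embedding matrix and a score table."""
    missing = [c for c in (id_col, *score_cols) if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}; check df.columns")
    # Every remaining column is treated as one embedding dimension.
    embed_cols = [c for c in df.columns if c not in (id_col, *score_cols)]
    embeddings = df[embed_cols].to_numpy(dtype=np.float32)
    scores = df[[id_col, *score_cols]].set_index(id_col)
    return embeddings, scores


# Synthetic stand-in for the loaded frame (placeholder values, not real scores).
demo = pd.DataFrame(np.zeros((2, 128)), columns=[f"z_{i}" for i in range(128)])
demo.insert(0, "cve_id", ["CVE-2024-1234", "CVE-2024-5678"])
demo["risk_score"] = [7.5, 9.8]
demo["exploit_prob"] = [0.2, 0.9]

embeddings, scores = split_stage3b(demo)
```

With the real file, `split_stage3b(df)` would return a (6, 128) matrix if the column layout matches these assumptions.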
