Stage 3b: CVE Risk & Embedding

What this model does

Takes CVE IDs and produces:

  • z_cve ∈ ℝ^128: dense vulnerability embedding (e5-large-v2 backbone)
  • risk_score ∈ [0, 10]: CVSS-aligned risk score
  • exploit_prob ∈ [0, 1]: exploit probability

Architecture

  • Backbone: intfloat/e5-large-v2 (frozen, 1024→128 projection)
  • Risk head: MLP regression (128→64→32→2 outputs)
  • Training: MSE loss on CVSS scores + exploit probabilities from Stage 0a
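
The layer sizes above can be sketched in PyTorch as follows. This is a minimal illustration and not the released implementation: the class name Stage3bHead, the ReLU activations, the sigmoid squashing into [0, 10] and [0, 1], and the equal weighting of the two MSE terms are all assumptions. The frozen backbone is replaced by random 1024-dimensional vectors so the sketch runs without downloading e5-large-v2.

```python
import torch
from torch import nn


class Stage3bHead(nn.Module):
    """Hypothetical projection + risk head matching the listed layer sizes."""

    def __init__(self, backbone_dim: int = 1024, embed_dim: int = 128):
        super().__init__()
        # 1024 -> 128 projection applied to the frozen backbone's pooled output.
        self.projection = nn.Linear(backbone_dim, embed_dim)
        # 128 -> 64 -> 32 -> 2 regression MLP; ReLU is an assumed activation.
        self.risk_head = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 2),
        )

    def forward(self, pooled: torch.Tensor):
        z_cve = self.projection(pooled)
        raw = self.risk_head(z_cve)
        # Assumed squashing so both outputs land in the documented ranges.
        risk_score = 10.0 * torch.sigmoid(raw[:, 0])
        exploit_prob = torch.sigmoid(raw[:, 1])
        return z_cve, risk_score, exploit_prob


def stage3b_loss(risk_score, exploit_prob, cvss_target, exploit_target):
    """Sum of two MSE terms; equal weighting is an assumption."""
    mse = nn.functional.mse_loss
    return mse(risk_score, cvss_target) + mse(exploit_prob, exploit_target)


# Random stand-in for pooled e5-large-v2 features of 6 CVE descriptions.
torch.manual_seed(0)
pooled = torch.randn(6, 1024)
model = Stage3bHead()
z_cve, risk_score, exploit_prob = model(pooled)

# Placeholder targets, only to show the loss call; not real CVSS data.
loss = stage3b_loss(
    risk_score, exploit_prob, torch.full((6,), 7.5), torch.full((6,), 0.3)
)
```

Because risk_score spans a range ten times wider than exploit_prob, an unweighted sum lets the CVSS term dominate the gradient; whether the original training rescales either target is not stated here.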

Output files

File                        Description                 Shape
z_cve.parquet               CVE embedding vectors       (6, 129)
risk_scores.parquet         risk_score + exploit_prob   (6, 3)
stage3b_embeddings.parquet  Combined (Stage 5 input)    (6, 131)
stage3b_best.pt             Model checkpoint            -

CVEs covered

CVE-2024-1234, CVE-2024-5678, CVE-2024-9999, CVE-2024-3333, CVE-2024-7777, CVE-2024-2222

Usage

from huggingface_hub import hf_hub_download
import pandas as pd

df = pd.read_parquet(
    hf_hub_download(repo_id="sohomn/stage3b-cve-risk-and-embeddings", filename="stage3b_embeddings.parquet")
)
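
Once the combined table is loaded, downstream code will usually want the embedding matrix separated from the scores. The helper below is a sketch under assumptions: the column names cve_id, risk_score, and exploit_prob are hypothetical (the shape (6, 131) is consistent with one ID column, 128 embedding columns, and two score columns, but the actual layout is not documented here), so inspect df.columns before relying on it.

```python
import pandas as pd

# Assumed column names; verify against df.columns on the real file.
ID_COL = "cve_id"
SCORE_COLS = ["risk_score", "exploit_prob"]


def split_stage3b(df: pd.DataFrame):
    """Split the combined table into an embedding matrix and a scores table."""
    # Every column that is neither the ID nor a score is treated as an embedding dimension.
    embed_cols = [c for c in df.columns if c not in (ID_COL, *SCORE_COLS)]
    embeddings = df[embed_cols].to_numpy(dtype="float32")
    scores = df[[ID_COL, *SCORE_COLS]]
    return embeddings, scores
```

Calling split_stage3b(df) on the frame loaded above would, if the assumed layout holds, return a (6, 128) float32 array and a three-column scores table.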