ExposureGuard-PolicyNet

Stateful privacy policy selection for streaming multimodal clinical data.

Most de-identification models detect PHI in a single document and stop. This model tracks cumulative exposure across modalities and time, and decides what action to take, not just what to mask.

What it does

Given a patient's current exposure state, it predicts the appropriate masking policy:

| Policy | When applied |
|--------|--------------|
| raw | Low risk, single modality, early in stream |
| weak | Moderate risk; partial masking sufficient |
| pseudo | High risk; pseudonymization required |
| redact | Threshold crossed via exposure accumulation |
| adaptive_rewrite | Threshold crossed via cross-modal linkage |

The adaptive_rewrite class fires specifically when cross-modal co-occurrence causes risk to cross the threshold. This distinction is annotated in the benchmark dataset and scored by the DCPG scorer.
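
The redact/adaptive_rewrite distinction can be sketched as a labeling rule: both require the threshold to have been crossed, and the presence of cross-modal matches determines which class applies. This helper is our illustration of the description above, not the DCPG scorer's actual annotation logic.

```python
def threshold_label(triggered: bool, cross_modal_matches: list) -> str:
    """Illustrative rule for the two threshold-crossing policies.

    Crossing via plain exposure accumulation -> redact;
    crossing via cross-modal linkage -> adaptive_rewrite.
    (Our sketch, not the benchmark's annotation code.)
    """
    if not triggered:
        raise ValueError("threshold not crossed")
    return "adaptive_rewrite" if cross_modal_matches else "redact"

print(threshold_label(True, ["text"]))  # adaptive_rewrite
print(threshold_label(True, []))        # redact
```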

Input features (17-dim)

| Feature | Description |
|---------|-------------|
| risk | Current cumulative risk score |
| risk_before | Risk before this event |
| delta_risk | risk - risk_before |
| eff_units_norm | effective_units / 50, capped at 1.0 |
| units_factor | 1 - exp(-0.05 * effective_units) |
| recency_factor | 0.5^(age_s / half_life) |
| link_bonus | 0.0 / 0.20 / 0.30 for 1 / 2 / 3+ linked modalities |
| degree_norm | Distinct modality count / 5 |
| confidence | Same as units_factor |
| pseudo_ver_norm | Pseudonym version / 10 |
| triggered | 1.0 if threshold crossed, else 0.0 |
| cm_count_norm | Cross-modal match count / 5 |
| mod_* (x5) | One-hot over the 5 modalities |
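
The derived features follow directly from the formulas in the table. A minimal sketch (the helper name and argument names are ours):

```python
import math

def derived_features(effective_units, age_s, half_life, n_modalities):
    """Recompute the derived feature-table entries from raw state.

    Formulas follow the descriptions above; this is an
    illustration, not the model's training pipeline.
    """
    units_factor   = 1.0 - math.exp(-0.05 * effective_units)
    recency_factor = 0.5 ** (age_s / half_life)
    link_bonus     = 0.0 if n_modalities <= 1 else (0.20 if n_modalities == 2 else 0.30)
    eff_units_norm = min(effective_units / 50.0, 1.0)
    return units_factor, recency_factor, link_bonus, eff_units_norm

uf, rf, lb, eu = derived_features(effective_units=9, age_s=30, half_life=30, n_modalities=2)
print(round(uf, 3), rf, lb, eu)  # 0.362 0.5 0.2 0.18  (matches the Usage example's state)
```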

Usage

from huggingface_hub import hf_hub_download
import torch, torch.nn as nn

POLICIES   = ["raw", "weak", "pseudo", "redact", "adaptive_rewrite"]
MODALITIES = ["text", "asr", "image_proxy", "waveform_proxy", "audio_proxy"]
MOD2ID     = {m: i for i, m in enumerate(MODALITIES)}

class PolicyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(17, 64),  nn.LayerNorm(64),  nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 5),
        )
    def forward(self, x): return self.net(x)

weights = hf_hub_download("vkatg/exposureguard-policynet", "pytorch_model.bin")
model   = PolicyNet()
model.load_state_dict(torch.load(weights, map_location="cpu"))
model.eval()

state = {
    "risk": 0.72, "risk_before": 0.43, "effective_units": 9,
    "units_factor": 0.362, "recency_factor": 0.81, "link_bonus": 0.20,
    "degree": 2, "confidence": 0.362, "pseudonym_version": 1,
    "triggered": True, "cross_modal_matches": ["text"], "modality": "asr",
}

oh  = [0.0] * 5
oh[MOD2ID[state["modality"]]] = 1.0

features = [
    state["risk"], state["risk_before"],
    state["risk"] - state["risk_before"],
    min(state["effective_units"] / 50.0, 1.0),
    state["units_factor"], state["recency_factor"], state["link_bonus"],
    state["degree"] / 5.0, state["confidence"],
    state["pseudonym_version"] / 10.0,
    float(state["triggered"]),
    len(state["cross_modal_matches"]) / 5.0,
] + oh

x = torch.tensor([features], dtype=torch.float32)
with torch.no_grad():  # inference only, no autograd graph needed
    pred = model(x).argmax(dim=1).item()
print(POLICIES[pred])
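
If you want a probability distribution over the five policies rather than a hard argmax, apply a softmax to the logits (with the real model, torch.softmax(model(x), dim=1)). The logits below are hypothetical values for illustration, not actual model output:

```python
import math

POLICIES = ["raw", "weak", "pseudo", "redact", "adaptive_rewrite"]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [-1.2, 0.3, 0.8, 2.1, 4.0]  # hypothetical, not model output
probs = softmax(logits)
print(POLICIES[max(range(5), key=probs.__getitem__)])  # adaptive_rewrite
```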

Architecture

Linear(17->64) -> LayerNorm -> ReLU -> Dropout(0.2)
Linear(64->128) -> LayerNorm -> ReLU -> Dropout(0.2)
Linear(128->64) -> ReLU
Linear(64->5)
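
As a sanity check on the layer sizes above, counting weights and biases (each LayerNorm contributes a scale and a shift per feature) gives 18,437 trainable parameters:

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix + bias vector

def layernorm_params(n):
    return 2 * n                 # elementwise scale + shift

total = (linear_params(17, 64)  + layernorm_params(64)
       + linear_params(64, 128) + layernorm_params(128)
       + linear_params(128, 64)
       + linear_params(64, 5))
print(total)  # 18437
```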

Training

2,400 synthetic patient scenarios generated via the DCPG scorer. 85/15 train/val split, AdamW, cosine LR schedule, 60 epochs. Val accuracy: 1.0.

Citation

@misc{exposureguard-policynet-2026,
  title  = {ExposureGuard-PolicyNet},
  author = {vkatg},
  year   = {2026},
  url    = {https://huggingface.co/vkatg/exposureguard-policynet}
}

Trained on fully synthetic data only. Not validated for clinical use.
