ExposureGuard-PolicyNet
Stateful privacy policy selection for streaming multimodal clinical data.
Most de-identification models detect PHI in a single document and then stop. This model tracks exposure as it accumulates across modalities and over time, and decides which action to take, not just what to mask.
What it does
Given a patient's current exposure state, it predicts the appropriate masking policy:
| Policy | When applied |
|---|---|
| `raw` | Risk low, single modality, early in stream |
| `weak` | Moderate risk, partial masking sufficient |
| `pseudo` | High risk, pseudonymization required |
| `redact` | Threshold crossed via exposure accumulation |
| `adaptive_rewrite` | Threshold crossed via cross-modal linkage |
The adaptive_rewrite class fires specifically when cross-modal co-occurrence causes risk to cross the threshold. This distinction is annotated in the benchmark dataset and scored by the DCPG scorer.
Input features (17-dim)
| Feature | Description |
|---|---|
| `risk` | Current cumulative risk score |
| `risk_before` | Risk before this event |
| `delta_risk` | `risk - risk_before` |
| `eff_units_norm` | `effective_units / 50` |
| `units_factor` | `1 - exp(-0.05 * effective_units)` |
| `recency_factor` | `0.5^(age_s / half_life)` |
| `link_bonus` | 0.0 / 0.20 / 0.30 for 1 / 2 / 3+ modalities |
| `degree_norm` | Distinct modality count / 5 |
| `confidence` | Same as `units_factor` |
| `pseudo_ver_norm` | Pseudonym version / 10 |
| `triggered` | 1.0 if threshold crossed |
| `cm_count_norm` | Cross-modal match count / 5 |
| `mod_*` (x5) | One-hot over modalities |
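The derived features in the table follow directly from their formulas. A minimal sketch, assuming `age_s` and `half_life` are expressed in the same time unit; the function names and the 3600-second half-life in the example are illustrative and not taken from the released code:

```python
import math

def units_factor(effective_units: float) -> float:
    """1 - exp(-0.05 * effective_units); the table reuses this value as `confidence`."""
    return 1.0 - math.exp(-0.05 * effective_units)

def recency_factor(age_s: float, half_life: float) -> float:
    """Exponential decay that halves every `half_life` units of age."""
    return 0.5 ** (age_s / half_life)

def link_bonus(n_modalities: int) -> float:
    """0.0 / 0.20 / 0.30 for 1 / 2 / 3+ distinct modalities."""
    if n_modalities >= 3:
        return 0.30
    if n_modalities == 2:
        return 0.20
    return 0.0

print(round(units_factor(9), 3))   # 0.362
print(recency_factor(3600, 3600))  # 0.5 (hypothetical half-life of 3600 s)
print(link_bonus(2))               # 0.2
```

With `effective_units = 9` and two modalities, this reproduces the `units_factor` of 0.362 and `link_bonus` of 0.20 used in the Usage example.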
Usage
```python
from huggingface_hub import hf_hub_download
import torch
import torch.nn as nn

POLICIES = ["raw", "weak", "pseudo", "redact", "adaptive_rewrite"]
MODALITIES = ["text", "asr", "image_proxy", "waveform_proxy", "audio_proxy"]
MOD2ID = {m: i for i, m in enumerate(MODALITIES)}


class PolicyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(17, 64), nn.LayerNorm(64), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 5),
        )

    def forward(self, x):
        return self.net(x)


# Download the released weights and load them on CPU.
weights = hf_hub_download("vkatg/exposureguard-policynet", "pytorch_model.bin")
model = PolicyNet()
model.load_state_dict(torch.load(weights, map_location="cpu"))
model.eval()  # disables dropout for inference

# Example exposure state for one event in the stream.
state = {
    "risk": 0.72, "risk_before": 0.43, "effective_units": 9,
    "units_factor": 0.362, "recency_factor": 0.81, "link_bonus": 0.20,
    "degree": 2, "confidence": 0.362, "pseudonym_version": 1,
    "triggered": True, "cross_modal_matches": ["text"], "modality": "asr",
}

# One-hot encoding of the current modality.
oh = [0.0] * 5
oh[MOD2ID[state["modality"]]] = 1.0

# 12 scalar features followed by the 5-dim one-hot, 17 values in total.
features = [
    state["risk"], state["risk_before"],
    state["risk"] - state["risk_before"],
    min(state["effective_units"] / 50.0, 1.0),
    state["units_factor"], state["recency_factor"], state["link_bonus"],
    state["degree"] / 5.0, state["confidence"],
    state["pseudonym_version"] / 10.0,
    float(state["triggered"]),
    len(state["cross_modal_matches"]) / 5.0,
] + oh

x = torch.tensor([features], dtype=torch.float32)
with torch.no_grad():  # no gradients needed at inference time
    pred = model(x).argmax(1).item()
print(POLICIES[pred])
```
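When scoring many events, the feature assembly from the snippet above can be wrapped in a helper that rejects unknown modalities and checks the vector length. This is a sketch only: `build_features` and `example_state` are hypothetical names, not part of the released code, and the feature order is assumed to match the table in "Input features".

```python
MODALITIES = ["text", "asr", "image_proxy", "waveform_proxy", "audio_proxy"]

def build_features(state: dict) -> list[float]:
    """Assemble the 17-dim input vector: 12 scalars, then the modality one-hot."""
    if state["modality"] not in MODALITIES:
        raise ValueError(f"unknown modality: {state['modality']!r}")
    one_hot = [1.0 if m == state["modality"] else 0.0 for m in MODALITIES]
    features = [
        state["risk"],
        state["risk_before"],
        state["risk"] - state["risk_before"],          # delta_risk
        min(state["effective_units"] / 50.0, 1.0),     # eff_units_norm, capped at 1.0
        state["units_factor"],
        state["recency_factor"],
        state["link_bonus"],
        state["degree"] / 5.0,                         # degree_norm
        state["confidence"],
        state["pseudonym_version"] / 10.0,             # pseudo_ver_norm
        float(state["triggered"]),
        len(state["cross_modal_matches"]) / 5.0,       # cm_count_norm
    ] + one_hot
    if len(features) != 17:
        raise ValueError(f"expected 17 features, got {len(features)}")
    return features

# Same example state as in the Usage snippet.
example_state = {
    "risk": 0.72, "risk_before": 0.43, "effective_units": 9,
    "units_factor": 0.362, "recency_factor": 0.81, "link_bonus": 0.20,
    "degree": 2, "confidence": 0.362, "pseudonym_version": 1,
    "triggered": True, "cross_modal_matches": ["text"], "modality": "asr",
}
vec = build_features(example_state)
print(len(vec))  # 17
```

The returned list can be passed to `torch.tensor([vec], dtype=torch.float32)` exactly as in the Usage snippet.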
Architecture
Linear(17->64) -> LayerNorm -> ReLU -> Dropout(0.2)
Linear(64->128) -> LayerNorm -> ReLU -> Dropout(0.2)
Linear(128->64) -> ReLU
Linear(64->5)
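As a quick sanity check on model size, the trainable parameter count can be derived from the layer shapes listed above. The sketch assumes each `Linear` has a bias and each `LayerNorm` has a learnable weight and bias, which are the PyTorch defaults used by `nn.Linear(...)` and `nn.LayerNorm(...)` in the Usage code; the helper names are illustrative.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weight matrix (n_in * n_out) plus bias vector (n_out)."""
    return n_in * n_out + n_out

def layernorm_params(n: int) -> int:
    """Learnable scale and shift, one value each per feature."""
    return 2 * n

total = (
    linear_params(17, 64) + layernorm_params(64)
    + linear_params(64, 128) + layernorm_params(128)
    + linear_params(128, 64)
    + linear_params(64, 5)
)
print(total)  # 18437
```

ReLU and Dropout contribute no parameters, so the network has roughly 18k trainable weights.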
Training
2,400 synthetic patient scenarios generated via the DCPG scorer. 85/15 train/val split, AdamW, cosine LR schedule, 60 epochs. Val accuracy: 1.0.
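For orientation, the split sizes and the shape of a cosine schedule over 60 epochs can be worked out as follows. The base and minimum learning rates are not stated in this card, so `BASE_LR` and `MIN_LR` are placeholder assumptions, and the closed form is standard cosine annealing, which may differ in detail from the schedule actually used.

```python
import math

N_SCENARIOS = 2400
EPOCHS = 60
BASE_LR = 1e-3  # assumption: not stated in the model card
MIN_LR = 0.0    # assumption: not stated in the model card

# 85/15 train/val split.
n_train = round(N_SCENARIOS * 0.85)
n_val = N_SCENARIOS - n_train

def cosine_lr(epoch: int) -> float:
    """Standard cosine annealing from BASE_LR at epoch 0 to MIN_LR at EPOCHS."""
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * epoch / EPOCHS))

print(n_train, n_val)  # 2040 360
for epoch in (0, 30, 60):
    print(epoch, cosine_lr(epoch))
```

Under these assumptions the learning rate starts at `BASE_LR`, passes through about half of it at epoch 30, and reaches `MIN_LR` at epoch 60.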
Related
| Resource | Location |
|---|---|
| DCPG Risk Scorer | vkatg/dcpg-cross-modal-phi-risk-scorer |
| Benchmark Dataset | vkatg/streaming-phi-deidentification-benchmark |
| Live Demo | vkatg/amphi-rl-dpgraph |
| GitHub | phi-exposure-guard |
Citation
@misc{exposureguard-policynet-2026,
title = {ExposureGuard-PolicyNet},
author = {vkatg},
year = {2026},
url = {https://huggingface.co/vkatg/exposureguard-policynet}
}
Trained on fully synthetic data only. Not validated for clinical use.