ExposureGuard-PolicyNet

Most PHI classifiers answer one question: is this PHI or not? This model answers a different question: given everything we know about this patient's exposure history so far, what should we do right now?

Five possible answers. One output per event.

What it does

The model takes a 17-dimensional exposure state vector and predicts the masking policy to apply to the current event. The state captures cumulative risk, cross-modal linkage signals, pseudonym versioning, and modality context.

| Policy | When it fires |
|---|---|
| `raw` | Risk low, single modality, early in stream |
| `weak` | Moderate risk, partial masking sufficient |
| `pseudo` | Risk above 0.65, pseudonymization required |
| `redact` | Threshold crossed via exposure accumulation |
| `adaptive_rewrite` | Threshold crossed via cross-modal linkage |

The `adaptive_rewrite` vs `redact` distinction is the key contribution. Both fire when risk crosses the threshold, but the cause matters. Cross-modal linkage means the same patient has appeared in two or more modalities and those records have been linked. That scenario calls for a full synthetic rewrite downstream via SynthRewrite-T5, not just redaction. Exposure accumulation, where risk builds up within a single modality stream, calls for `redact`.
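
The decision logic in the table can be approximated by a hand-written rule, which may be useful for sanity-checking model outputs. This is only a sketch and not the trained model: the name `reference_policy` is hypothetical, the 0.35 and 0.65 cut-offs are assumed from the risk ranges quoted in the Training section, whether those boundaries are inclusive is a guess, and a non-empty `cross_modal_matches` list is used here as a stand-in for "records have been linked".

```python
from typing import Sequence


def reference_policy(risk: float, triggered: bool,
                     cross_modal_matches: Sequence[str]) -> str:
    """Hand-written approximation of the policy table (hypothetical helper)."""
    if triggered:
        # Threshold crossed: the cause picks between the two strongest policies.
        # Assumption: any cross-modal match counts as linkage.
        return "adaptive_rewrite" if cross_modal_matches else "redact"
    if risk >= 0.65:  # assumed lower bound of the pseudo range
        return "pseudo"
    if risk >= 0.35:  # assumed lower bound of the weak range
        return "weak"
    return "raw"


print(reference_policy(0.72, True, ["text"]))  # adaptive_rewrite
print(reference_policy(0.72, True, []))        # redact
print(reference_policy(0.50, False, []))       # weak
```

A rule like this cannot reproduce the learned boundary behaviour; it only makes the intended split between linkage-driven and accumulation-driven triggers explicit.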

Usage

from inference import predict

result = predict({
    "risk": 0.72,
    "risk_before": 0.43,
    "effective_units": 9,
    "units_factor": 0.362,
    "recency_factor": 0.81,
    "link_bonus": 0.20,
    "degree": 2,
    "confidence": 0.362,
    "pseudonym_version": 1,
    "triggered": True,
    "cross_modal_matches": ["text"],
    "modality": "asr",
})

print(result["policy"])      # adaptive_rewrite
print(result["confidence"])  # float
print(result["all_scores"])  # scores for all 5 policies

Loading directly from Hugging Face:

from huggingface_hub import hf_hub_download
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(17, 64),  nn.LayerNorm(64),  nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 5),
        )
    def forward(self, x):
        return self.net(x)

weights = hf_hub_download("vkatg/exposureguard-policynet", "pytorch_model.bin")
model   = PolicyNet()
model.load_state_dict(torch.load(weights, map_location="cpu", weights_only=True))
model.eval()
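
The network returns five raw logits, so a decoding step is needed to get a result shaped like the one in the Usage example. The sketch below shows one plausible way to do that in plain Python. It rests on assumptions: the label order is taken from the order in which the policies are listed in this card, `confidence` is assumed to be the top softmax probability, and `all_scores` is assumed to be a name-to-probability mapping. The function name `decode_logits` is hypothetical, and the actual `inference.py` may differ.

```python
import math

# Assumed label order: the order in which the policies are listed in this card.
POLICIES = ["raw", "weak", "pseudo", "redact", "adaptive_rewrite"]


def decode_logits(logits):
    """Apply a softmax to 5 logits and report the top-scoring policy."""
    if len(logits) != len(POLICIES):
        raise ValueError(f"expected {len(POLICIES)} logits, got {len(logits)}")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    scores = {name: e / total for name, e in zip(POLICIES, exps)}
    best = max(scores, key=scores.get)
    return {"policy": best, "confidence": scores[best], "all_scores": scores}


result = decode_logits([0.1, -1.2, 0.4, 1.0, 3.5])  # made-up logits
print(result["policy"])  # adaptive_rewrite
```

With the model loaded as above, the logits would come from something like `model(x)[0].tolist()` inside `torch.no_grad()`, where `x` is a float tensor of shape `(1, 17)`.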

Input features (17 dimensions)

| Feature | Description |
|---|---|
| `risk` | Current cumulative risk score |
| `risk_before` | Risk before this event |
| `delta_risk` | `risk` minus `risk_before` |
| `eff_units_norm` | `effective_units` / 50 |
| `units_factor` | 1 - exp(-0.05 * `effective_units`) |
| `recency_factor` | 0.5^(`age_seconds` / `half_life`) |
| `link_bonus` | 0.0 / 0.20 / 0.30 for 1 / 2 / 3+ linked modalities |
| `degree_norm` | Distinct modality count / 5 |
| `confidence` | Same as `units_factor` |
| `pseudo_ver_norm` | Pseudonym version / 10 |
| `triggered` | 1.0 if threshold crossed this event |
| `cm_count_norm` | Cross-modal match count / 5 |
| `mod_text` ... `mod_audio_proxy` | Modality one-hot (5 dims) |
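
The formulas in the table can be written out directly, together with a sketch of how the 17-dimensional vector might be assembled from the dictionary shown in the Usage section. Several things here are assumptions: the feature order is taken to be the table order, the helper names are hypothetical, and the modality vocabulary is passed in by the caller because only `mod_text` and `mod_audio_proxy` are named in this card. In the example, `asr` comes from the Usage snippet and `modality_3` / `modality_4` are placeholders, not real names.

```python
import math


def units_factor(effective_units: float) -> float:
    return 1.0 - math.exp(-0.05 * effective_units)


def recency_factor(age_seconds: float, half_life: float) -> float:
    return 0.5 ** (age_seconds / half_life)


def link_bonus(linked_modalities: int) -> float:
    if linked_modalities >= 3:
        return 0.30
    return 0.20 if linked_modalities == 2 else 0.0


def build_features(state: dict, modalities: list) -> list:
    """Assemble the feature vector in table order (assumed, not confirmed)."""
    one_hot = [1.0 if state["modality"] == m else 0.0 for m in modalities]
    return [
        state["risk"],
        state["risk_before"],
        state["risk"] - state["risk_before"],        # delta_risk
        state["effective_units"] / 50,               # eff_units_norm
        state["units_factor"],
        state["recency_factor"],
        state["link_bonus"],
        state["degree"] / 5,                         # degree_norm
        state["confidence"],
        state["pseudonym_version"] / 10,             # pseudo_ver_norm
        1.0 if state["triggered"] else 0.0,
        len(state["cross_modal_matches"]) / 5,       # cm_count_norm
        *one_hot,
    ]


# Placeholder vocabulary: order and the two middle names are NOT from the card.
MODALITIES = ["text", "asr", "modality_3", "modality_4", "audio_proxy"]
state = {
    "risk": 0.72, "risk_before": 0.43, "effective_units": 9,
    "units_factor": units_factor(9), "recency_factor": 0.81,
    "link_bonus": link_bonus(2), "degree": 2, "confidence": units_factor(9),
    "pseudonym_version": 1, "triggered": True,
    "cross_modal_matches": ["text"], "modality": "asr",
}
vec = build_features(state, MODALITIES)
print(len(vec))                   # 17
print(round(units_factor(9), 3))  # 0.362
```

The `units_factor` value reproduces the 0.362 used in the Usage example for `effective_units = 9`.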

Architecture

Linear(17->64) -> LayerNorm -> ReLU -> Dropout(0.2)
Linear(64->128) -> LayerNorm -> ReLU -> Dropout(0.2)
Linear(128->64) -> ReLU
Linear(64->5)

75 KB weights file. No external dependencies beyond PyTorch; the Hub download shown above additionally uses `huggingface_hub`.
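
The file size can be cross-checked by counting parameters for the layer sizes listed above. The sketch below does the arithmetic in plain Python, assuming each LayerNorm carries a learnable scale and shift (the PyTorch default) and that weights are stored as 4-byte float32 values. The helper name `count_parameters` is illustrative.

```python
# (in_features, out_features, followed_by_layernorm)
LAYERS = [(17, 64, True), (64, 128, True), (128, 64, False), (64, 5, False)]


def count_parameters(layers):
    total = 0
    for n_in, n_out, has_norm in layers:
        total += n_in * n_out + n_out  # Linear weight matrix + bias
        if has_norm:
            total += 2 * n_out         # LayerNorm scale + shift
    return total


n_params = count_parameters(LAYERS)
print(n_params)      # 18437
print(n_params * 4)  # 73748 bytes as float32
```

About 74 KB of raw tensor data is in line with the stated file size once serialization overhead is added.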

Training

Trained on 6,000 synthetic patient scenarios, balanced at 1,200 samples per policy class, with an 85/15 train/val split, AdamW, a cosine LR schedule, and 60 epochs. Val accuracy: 99.89%. Per-class results: raw 100%, weak 99.7%, pseudo 99.8%, redact 100%, adaptive_rewrite 100%. The few weak/pseudo misclassifications fall at the boundary between the two adjacent risk ranges (0.35-0.64 for weak, 0.65-0.84 for pseudo), where cases are genuinely ambiguous.
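
For readers who want to reproduce a similar setup, the split sizes and the shape of a cosine schedule can be worked out as follows. This is a sketch under stated assumptions, not the actual training script: the base learning rate is not given in this card (the `1e-3` below is a placeholder), the split is assumed to be stratified by class, and the schedule is assumed to anneal to zero over the full 60 epochs.

```python
import math

N_CLASSES, N_PER_CLASS, VAL_FRACTION, EPOCHS = 5, 1200, 0.15, 60
BASE_LR = 1e-3  # assumption: the card does not state the learning rate

# Assumed stratified split: the same 85/15 ratio inside every policy class.
val_per_class = round(N_PER_CLASS * VAL_FRACTION)
train_per_class = N_PER_CLASS - val_per_class
n_train = N_CLASSES * train_per_class
n_val = N_CLASSES * val_per_class


def cosine_lr(epoch, base_lr=BASE_LR, total_epochs=EPOCHS, min_lr=0.0):
    """Cosine annealing from base_lr at epoch 0 down to min_lr at total_epochs."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


print(n_train, n_val)  # 5100 900
for epoch in (0, 30, 60):
    print(epoch, cosine_lr(epoch))
```

In PyTorch the same schedule shape is available through `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max` set to the number of epochs.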

Where it fits

DCPG Risk Scorer
      |
  ExposureGuard-PolicyNet    <- this model
      |
  +---+-------------------+
  |                       |
adaptive_rewrite        redact / pseudo / weak / raw
  |
SynthRewrite-T5
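
The branch in the diagram amounts to a small dispatch on the predicted policy. A minimal sketch, assuming the policy arrives as one of the five strings above; the function name `route` and the label `"in-place masking"` for the right-hand branch are illustrative and do not come from the phi-exposure-guard codebase.

```python
KNOWN_POLICIES = {"raw", "weak", "pseudo", "redact", "adaptive_rewrite"}


def route(policy: str) -> str:
    """Name the downstream step for a predicted policy (labels are illustrative)."""
    if policy not in KNOWN_POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "adaptive_rewrite":
        # Only this branch hands the event to the synthetic rewriter.
        return "SynthRewrite-T5"
    return "in-place masking"  # hypothetical label for the other four policies


print(route("adaptive_rewrite"))  # SynthRewrite-T5
print(route("redact"))            # in-place masking
```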

phi-exposure-guard: full system
dcpg-cross-modal-phi-risk-scorer: produces the risk score and trigger inputs
exposureguard-dcpg-encoder: graph encoder upstream
exposureguard-fedcrdt-distill: federated risk scoring
exposureguard-synthrewrite-t5: downstream rewriter for adaptive_rewrite decisions
exposureguard-dagplanner: remediation planner
streaming-phi-deidentification-benchmark: benchmark dataset
multimodal-phi-masking-benchmark: PHI masking dataset

Citation

@software{exposureguard_policynet,
  title  = {ExposureGuard-PolicyNet: Stateful Privacy Policy Selection for Streaming Multimodal Clinical Data},
  author = {Ganti, Venkata Krishna Azith Teja},
  doi    = {10.5281/zenodo.18865882},
  url    = {https://huggingface.co/vkatg/exposureguard-policynet},
  note   = {US Provisional Patent filed 2025-07-05}
}

Trained on fully synthetic data. Not validated for clinical use.

Evaluation results

Accuracy on streaming-phi-deidentification-benchmark (self-reported): 0.999
