ExposureGuard-DAGPlanner
Knowing the risk score is not the same as knowing what to do about it. Most systems stop at the score. This one continues: given a patient's current re-identification risk, a set of active modalities, and a compute budget, DAGPlanner outputs an ordered remediation sequence with dependency constraints, priority weights, and estimated residual risk after each action executes.
It sits at the end of the ExposureGuard pipeline. Everything upstream produces a risk number. DAGPlanner turns that number into a plan.
The problem with flat action lists
PHI remediation actions have dependencies. You cannot retokenize text before masking direct identifiers, because retokenization relies on pseudonym assignments that happen during masking. You cannot apply federated noise before cross-modal unlinking, because noise added to a still-linked graph leaks through the linkage. Executing actions in the wrong order wastes compute or, worse, leaves re-identification pathways open that should have been closed.
DAGPlanner enforces 12 precedence constraints across 14 remediation actions and sorts the selected actions topologically before returning them. The plan is executable in the order returned; no additional dependency resolution is required downstream.
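What "executable in the order returned" means can be checked mechanically: for every precedence edge whose two actions both appear in a plan, the prerequisite must come first. The helper below is a hypothetical sketch of that check and is not part of `dagplanner`. `respects_precedence` and `EDGES` are illustrative names, and only a subset of the documented edges is included.

```python
from typing import Iterable, Sequence, Tuple

# A subset of the precedence edges listed under "Dependency rules";
# (before, after) means `before` must execute earlier than `after`.
EDGES: Sequence[Tuple[str, str]] = (
    ("mask_direct_id", "retokenize_text"),
    ("mask_direct_id", "cross_modal_unlink"),
    ("cross_modal_unlink", "federated_noise"),
    ("k_anon_table", "federated_noise"),
)


def respects_precedence(order: Iterable[str], edges: Sequence[Tuple[str, str]] = EDGES) -> bool:
    """Hypothetical helper: True if no planned action runs before a planned prerequisite."""
    position = {action: i for i, action in enumerate(order)}
    for before, after in edges:
        # Pairs where one side was not planned impose no ordering constraint here.
        if before in position and after in position and position[before] > position[after]:
            return False
    return True


print(respects_precedence(["mask_direct_id", "generalize_dob", "retokenize_text"]))  # True
print(respects_precedence(["retokenize_text", "mask_direct_id"]))  # False
```

This only checks ordering. Pulling in prerequisites that were never selected is a separate step, the dependency injection described below.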
How it plans
Three steps. First, each applicable action is scored by ROI times urgency:
score = (reduces * modality_weight + retok_boost) / base_cost * log1p(risk * 5)
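Here reduces and base_cost come from the action catalog, modality_weight from the planner's per-modality weights, and risk is the input risk score. The bracketed term divided by base_cost is the ROI; log1p(risk * 5) is the urgency factor. The retok_boost term applies when retok_prob > 0.55, raising the priority of retokenize_text when the upstream FedCRDT-Distill model has flagged a retokenization trigger. Second, actions are selected greedily by score until the target risk reduction is reached or the budget is exhausted; any missing dependencies are injected automatically. Third, a depth-first topological sort produces the final execution order, with higher-priority actions placed first among siblings.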
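The three steps can be sketched end to end as follows. This is an illustrative reimplementation, not the package's code. `sketch_plan`, `score`, `CATALOG` and `DEPS` are hypothetical names. `RETOK_BOOST = 0.15` is an assumption back-solved from the example priority of 4.47. Modality weights are fixed at 1.0. Risk reductions are assumed to be additive and floored at zero, and only dependencies that apply to the active modalities are injected. Under these assumptions the sketch reproduces the priorities and the 0.16 residual shown in the usage example.

```python
import math

# name -> (modalities, base_cost, reduces), transcribed from the "Action catalog" table.
CATALOG = {
    "mask_direct_id": ({"text", "image"}, 0.05, 0.35),
    "cross_modal_unlink": ({"all"}, 0.18, 0.28),
    "k_anon_table": ({"ehr", "text"}, 0.20, 0.22),
    "redact_image_face": ({"image"}, 0.12, 0.20),
    "strip_audio_voice": ({"audio"}, 0.15, 0.18),
    "federated_noise": ({"all"}, 0.25, 0.15),
    "redact_image_tag": ({"image"}, 0.08, 0.14),
    "anon_audio_content": ({"audio"}, 0.10, 0.12),
    "generalize_dob": ({"text", "ehr"}, 0.03, 0.12),
    "retokenize_text": ({"text"}, 0.09, 0.11),
    "suppress_geo": ({"text", "ehr"}, 0.04, 0.10),
    "drop_rare_code": ({"ehr"}, 0.06, 0.09),
    "perturb_numerics": ({"ehr"}, 0.07, 0.07),
    "audit_log_purge": ({"all"}, 0.02, 0.03),
}
# action -> direct prerequisites, from the "Dependency rules" list.
DEPS = {
    "retokenize_text": ["mask_direct_id"],
    "cross_modal_unlink": ["mask_direct_id", "redact_image_face", "strip_audio_voice"],
    "redact_image_tag": ["redact_image_face"],
    "anon_audio_content": ["strip_audio_voice"],
    "k_anon_table": ["generalize_dob", "suppress_geo", "drop_rare_code"],
    "federated_noise": ["k_anon_table", "cross_modal_unlink"],
    "audit_log_purge": ["federated_noise"],
}
RETOK_BOOST = 0.15  # ASSUMPTION: back-solved from the example priority of 4.47


def score(action, risk, retok_prob, weight=1.0):
    """ROI times urgency, as in the formula above (weight fixed at 1.0 in this sketch)."""
    _, cost, reduces = CATALOG[action]
    boost = RETOK_BOOST if action == "retokenize_text" and retok_prob > 0.55 else 0.0
    return (reduces * weight + boost) / cost * math.log1p(risk * 5)


def sketch_plan(risk, retok_prob, modalities, threshold=0.20, budget=1.0):
    active = set(modalities)
    # Score every action that touches an active modality ("all" always qualifies).
    prio = {a: score(a, risk, retok_prob) for a, (mods, _, _) in CATALOG.items()
            if "all" in mods or mods & active}
    chosen, injected, residual, spent = set(), set(), risk, 0.0

    def bundle(action, acc):
        # `action` plus any applicable prerequisites that are not yet chosen.
        for dep in DEPS.get(action, []):
            if dep in prio and dep not in chosen and dep not in acc:
                bundle(dep, acc)
        acc.add(action)
        return acc

    # Greedy selection by score; ASSUMPTION: reductions are additive.
    for action in sorted(prio, key=prio.get, reverse=True):
        if residual <= threshold:
            break
        if action in chosen:
            continue
        needed = bundle(action, set())
        cost = sum(CATALOG[n][1] for n in needed)
        if spent + cost > budget:
            continue  # this action plus its dependencies would overshoot the budget
        chosen |= needed
        injected |= needed - {action}
        spent += cost
        residual -= sum(CATALOG[n][2] for n in needed)

    # Depth-first topological sort: prerequisites first, higher priority first among siblings.
    order = []

    def visit(action):
        if action in order:
            return
        deps = [d for d in DEPS.get(action, []) if d in chosen]
        for dep in sorted(deps, key=prio.get, reverse=True):
            visit(dep)
        order.append(action)

    for action in sorted(chosen, key=prio.get, reverse=True):
        visit(action)
    steps = [(a, round(prio[a], 2), a in injected) for a in order]
    return steps, round(max(residual, 0.0), 4)  # ASSUMPTION: residual floored at zero


steps, residual = sketch_plan(0.74, 0.68, ["text", "image", "ehr"])
print(steps)
# [('mask_direct_id', 10.83, False), ('generalize_dob', 6.19, False), ('retokenize_text', 4.47, False)]
print(residual)  # 0.16
```

Dependency injection shows up with an image-only record, for example `sketch_plan(0.9, 0.1, ["image"])`. There, selecting `redact_image_tag` pulls in `redact_image_face`, which is then flagged as injected.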
Usage
```python
from dagplanner import ExposureGuardDAGPlanner

planner = ExposureGuardDAGPlanner(risk_threshold=0.20, budget=1.0)
result = planner.plan(
    risk_score=0.74,
    retok_prob=0.68,
    active_modalities=["text", "image", "ehr"],
    patient_id="P-0042",
)
print(result["plan"])
# [
#   {"action": "mask_direct_id", "priority": 10.83, "cost": 0.05, "risk_delta": 0.35, "deps": []},
#   {"action": "generalize_dob", "priority": 6.19, "cost": 0.03, "risk_delta": 0.12, "deps": []},
#   {"action": "retokenize_text", "priority": 4.47, "cost": 0.09, "risk_delta": 0.11, "deps": ["mask_direct_id"]}
# ]
print(result["estimated_residual_risk"])  # 0.16
print(result["status"])  # plan_ready
```
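The numbers in this example are consistent with simple bookkeeping. The three risk deltas sum to 0.58, leaving 0.74 - 0.58 = 0.16, which is below the 0.20 threshold. The three costs sum to 0.17, matching `total_cost` in the output schema. The snippet below checks that arithmetic under the assumption that reductions combine additively. The package may combine them differently in general.

```python
# Plan steps copied from the example output above (only the fields needed here).
plan = [
    {"action": "mask_direct_id", "cost": 0.05, "risk_delta": 0.35},
    {"action": "generalize_dob", "cost": 0.03, "risk_delta": 0.12},
    {"action": "retokenize_text", "cost": 0.09, "risk_delta": 0.11},
]
risk_score = 0.74

# ASSUMPTION: reductions are additive and the residual cannot go below zero.
residual = max(0.0, risk_score - sum(step["risk_delta"] for step in plan))
total_cost = sum(step["cost"] for step in plan)
print(round(residual, 2), round(total_cost, 2))  # 0.16 0.17
```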
Batch planning across multiple patients, sorted by risk descending:
```python
records = [
    {"patient_id": "P-001", "risk_score": 0.82, "retok_prob": 0.71,
     "active_modalities": ["text", "image", "audio", "ehr"]},
    {"patient_id": "P-002", "risk_score": 0.45, "retok_prob": 0.30,
     "active_modalities": ["text", "ehr"]},
]
results = planner.batch_plan(records)
```
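One plausible reading of "sorted by risk descending" is that the highest-risk patients are planned first. The function below is a hypothetical stand-in written under that assumption. `batch_plan_sketch` and `stub_plan` are illustrative names, and the stub replaces `planner.plan`, whose keyword arguments match the record keys.

```python
from typing import Callable, Dict, List


def batch_plan_sketch(records: List[Dict], plan_fn: Callable[..., Dict]) -> List[Dict]:
    """Hypothetical stand-in for batch_plan: plan the highest-risk patients first."""
    ordered = sorted(records, key=lambda r: r["risk_score"], reverse=True)
    return [plan_fn(**record) for record in ordered]


# Stub planner for illustration only; in practice this could be planner.plan.
def stub_plan(**kwargs):
    return {"patient_id": kwargs["patient_id"], "risk_score": kwargs["risk_score"]}


demo = [
    {"patient_id": "P-002", "risk_score": 0.45, "retok_prob": 0.30,
     "active_modalities": ["text", "ehr"]},
    {"patient_id": "P-001", "risk_score": 0.82, "retok_prob": 0.71,
     "active_modalities": ["text", "image", "audio", "ehr"]},
]
print([r["patient_id"] for r in batch_plan_sketch(demo, stub_plan)])  # ['P-001', 'P-002']
```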
Plugging in directly from FedCRDT-Distill and DCPG Encoder:
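```python
# fedcrdt_output and encoder_output are the result dicts returned by the upstream
# FedCRDT-Distill and DCPG Encoder models; their two risk scores are averaged equally.
merged_risk = 0.5 * fedcrdt_output["risk_score"] + 0.5 * encoder_output["risk_score"]
result = planner.plan(
    risk_score=merged_risk,
    retok_prob=fedcrdt_output["retok_prob"],
    active_modalities=["text", "image", "ehr"],
    patient_id=fedcrdt_output["patient_id"],
)
```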
Custom modality weights when imaging carries higher regulatory risk:
```python
planner = ExposureGuardDAGPlanner(
    risk_threshold=0.20,
    budget=1.2,
    modality_weights={"image": 1.4, "audio": 1.2},
)
```
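The README does not specify how a per-modality weight becomes the single modality_weight in the scoring formula. The sketch below assumes one plausible rule: take the largest configured weight among the active modalities an action touches, with unconfigured modalities defaulting to 1.0. `modality_weight`, `weighted_score` and `DEFAULT_WEIGHT` are hypothetical names. With `image` weighted at 1.4, the score of `redact_image_face` at risk 0.74 rises from about 2.58 to about 3.61.

```python
import math
from typing import Dict, Iterable, Set

DEFAULT_WEIGHT = 1.0  # ASSUMPTION: modalities without an override weigh 1.0


def modality_weight(action_modalities: Set[str], active: Iterable[str],
                    overrides: Dict[str, float]) -> float:
    """Hypothetical rule: largest weight among the active modalities the action touches."""
    active_set = set(active)
    touched = active_set if "all" in action_modalities else action_modalities & active_set
    return max((overrides.get(m, DEFAULT_WEIGHT) for m in touched), default=DEFAULT_WEIGHT)


def weighted_score(reduces: float, base_cost: float, risk: float, weight: float) -> float:
    # The scoring formula from "How it plans", without the retok_boost term.
    return reduces * weight / base_cost * math.log1p(risk * 5)


weights = {"image": 1.4, "audio": 1.2}
active = ["text", "image", "ehr"]
w = modality_weight({"image"}, active, weights)          # redact_image_face
print(w, round(weighted_score(0.20, 0.12, 0.74, w), 2))  # 1.4 3.61
print(round(weighted_score(0.20, 0.12, 0.74, 1.0), 2))   # 2.58
```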
Output schema
```json
{
  "patient_id": "P-0042",
  "risk_score": 0.74,
  "retok_prob": 0.68,
  "plan": [
    {
      "action": "mask_direct_id",
      "priority": 10.8329,
      "cost": 0.05,
      "risk_delta": 0.35,
      "deps": [],
      "injected": false
    }
  ],
  "estimated_residual_risk": 0.16,
  "selected_cost": 0.17,
  "total_cost": 0.17,
  "status": "plan_ready"
}
```
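For downstream consumers, the schema can be written as Python `TypedDict`s together with a small check that every listed dependency has already appeared earlier in the plan. The class and function names are hypothetical and not exported by `dagplanner`. The check also assumes that `deps` only names actions that are present in the plan.

```python
from typing import List, TypedDict


class PlanStep(TypedDict):
    action: str
    priority: float
    cost: float
    risk_delta: float
    deps: List[str]
    injected: bool


class PlanResult(TypedDict):
    patient_id: str
    risk_score: float
    retok_prob: float
    plan: List[PlanStep]
    estimated_residual_risk: float
    selected_cost: float
    total_cost: float
    status: str


STATUSES = {"below_threshold", "plan_ready", "partial_plan", "no_actions_applicable"}


def check_result(result: PlanResult) -> None:
    """Hypothetical consumer-side check; raises ValueError on an unexpected shape."""
    if result["status"] not in STATUSES:
        raise ValueError(f"unknown status {result['status']!r}")
    done = set()
    for step in result["plan"]:
        # ASSUMPTION: every entry in `deps` is itself a step earlier in the plan.
        pending = [d for d in step["deps"] if d not in done]
        if pending:
            raise ValueError(f"{step['action']} listed before its deps {pending}")
        done.add(step["action"])
```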
| Field | Type | Description |
|---|---|---|
| plan | list | Actions in topological execution order |
| estimated_residual_risk | float | Predicted post-remediation risk |
| total_cost | float | Sum of action base_cost values |
| status | str | below_threshold / plan_ready / partial_plan / no_actions_applicable |
Action catalog
| Action | Modalities | Cost | Risk reduction |
|---|---|---|---|
| mask_direct_id | text, image | 0.05 | 0.35 |
| cross_modal_unlink | all | 0.18 | 0.28 |
| k_anon_table | ehr, text | 0.20 | 0.22 |
| redact_image_face | image | 0.12 | 0.20 |
| strip_audio_voice | audio | 0.15 | 0.18 |
| federated_noise | all | 0.25 | 0.15 |
| redact_image_tag | image | 0.08 | 0.14 |
| anon_audio_content | audio | 0.10 | 0.12 |
| generalize_dob | text, ehr | 0.03 | 0.12 |
| retokenize_text | text | 0.09 | 0.11 |
| suppress_geo | text, ehr | 0.04 | 0.10 |
| drop_rare_code | ehr | 0.06 | 0.09 |
| perturb_numerics | ehr | 0.07 | 0.07 |
| audit_log_purge | all | 0.02 | 0.03 |
Dependency rules
```
mask_direct_id -> retokenize_text
mask_direct_id -> cross_modal_unlink
redact_image_face -> cross_modal_unlink
redact_image_face -> redact_image_tag
strip_audio_voice -> cross_modal_unlink
strip_audio_voice -> anon_audio_content
generalize_dob -> k_anon_table
suppress_geo -> k_anon_table
drop_rare_code -> k_anon_table
k_anon_table -> federated_noise
cross_modal_unlink -> federated_noise
federated_noise -> audit_log_purge
```
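Each rule `A -> B` reads "A must execute before B" (for example, masking precedes retokenization). Encoding the rules as a predecessor map lets Python's standard-library `graphlib.TopologicalSorter` confirm that they form a DAG, since `static_order()` raises `CycleError` otherwise. The same map also gives the transitive prerequisites that dependency injection would have to pull in. `EDGES`, `PREREQS` and `all_prerequisites` are hypothetical names used for illustration.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (before, after) pairs transcribed from the rules above: `before` must run first.
EDGES = [
    ("mask_direct_id", "retokenize_text"),
    ("mask_direct_id", "cross_modal_unlink"),
    ("redact_image_face", "cross_modal_unlink"),
    ("redact_image_face", "redact_image_tag"),
    ("strip_audio_voice", "cross_modal_unlink"),
    ("strip_audio_voice", "anon_audio_content"),
    ("generalize_dob", "k_anon_table"),
    ("suppress_geo", "k_anon_table"),
    ("drop_rare_code", "k_anon_table"),
    ("k_anon_table", "federated_noise"),
    ("cross_modal_unlink", "federated_noise"),
    ("federated_noise", "audit_log_purge"),
]

# action -> set of direct prerequisites (the mapping shape TopologicalSorter expects).
PREREQS = defaultdict(set)
for before, after in EDGES:
    PREREQS[after].add(before)


def all_prerequisites(action: str) -> set:
    """Every action that must precede `action`, following the edges transitively."""
    found, stack = set(), [action]
    while stack:
        for dep in PREREQS.get(stack.pop(), ()):
            if dep not in found:
                found.add(dep)
                stack.append(dep)
    return found


# static_order() raises graphlib.CycleError if the rules ever contain a cycle.
print(list(TopologicalSorter(PREREQS).static_order()))
print(sorted(all_prerequisites("federated_noise")))
# ['cross_modal_unlink', 'drop_rare_code', 'generalize_dob', 'k_anon_table',
#  'mask_direct_id', 'redact_image_face', 'strip_audio_voice', 'suppress_geo']
```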
Where it fits
```
Clinical Record
      |
      v
DCPG Risk Scorer -> DCPG Encoder -> FedCRDT-Distill
      |
      v
DAGPlanner  <- this model
      |
      v
Remediation Action Plan
      |
      v
SynthRewrite-T5
```
Related
- phi-exposure-guard: full system with all components
- exposureguard-fedcrdt-distill: provides risk_score and retok_prob inputs
- exposureguard-dcpg-encoder: provides patient embedding and risk score
- exposureguard-policynet: policy enforcement layer
- exposureguard-synthrewrite-t5: downstream synthetic rewriting
- dag-remediation-traces: 7,500 input/output traces for this planner
- multimodal-phi-masking-benchmark: 10,000 records across 5 modalities
Citation
```bibtex
@software{exposureguard_dagplanner,
  title  = {ExposureGuard-DAGPlanner: Dependency-Aware PHI Remediation Planning},
  author = {Ganti, Venkata Krishna Azith Teja},
  doi    = {10.5281/zenodo.18865882},
  url    = {https://huggingface.co/vkatg/exposureguard-dagplanner},
  note   = {US Provisional Patent filed 2025-07-05}
}
```
MIT License. Research code. Not a production compliance system.