ExposureGuard-DAGPlanner

Knowing the risk score is not the same as knowing what to do about it. Most systems stop at the score. This one continues: given a patient's current re-identification risk, a set of active modalities, and a compute budget, DAGPlanner outputs an ordered remediation sequence with dependency constraints, priority weights, and estimated residual risk after each action executes.

It sits at the end of the ExposureGuard pipeline. Everything upstream produces a risk number. DAGPlanner turns that number into a plan.

The problem with flat action lists

PHI remediation actions have dependencies. You cannot retokenize text before masking direct identifiers, because retokenization relies on pseudonym assignments that happen during masking. You cannot apply federated noise before cross-modal unlinking, because noise added to a still-linked graph leaks through the linkage. Executing actions in the wrong order wastes compute or, worse, leaves re-identification pathways open that should have been closed.

DAGPlanner enforces 12 precedence constraints across 14 remediation operations (listed under Dependency rules) and sorts actions topologically before returning them. The plan is executable in the order returned. No additional dependency resolution is required downstream.
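
A downstream consumer that wants to double-check this guarantee could do so in a few lines. The sketch below is illustrative only: `order_violations` is a hypothetical helper, not part of the package, and it assumes plan entries carry the `action` and `deps` keys shown in the usage output.

```python
def order_violations(plan):
    """Return (action, missing_dep) pairs where a dependency does not precede its action."""
    seen = set()
    problems = []
    for step in plan:
        for dep in step["deps"]:
            if dep not in seen:
                problems.append((step["action"], dep))
        seen.add(step["action"])
    return problems


# Two plan entries in the shape shown in the usage output (other fields omitted).
good = [
    {"action": "mask_direct_id", "deps": []},
    {"action": "retokenize_text", "deps": ["mask_direct_id"]},
]
bad = list(reversed(good))

print(order_violations(good))  # []
print(order_violations(bad))   # [('retokenize_text', 'mask_direct_id')]
```

An empty list means every action runs after the actions it depends on.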

How it plans

Three steps. First, each applicable action is scored by ROI (weighted risk reduction per unit cost) times urgency (a log term that grows with the current risk):

score = (reduces * modality_weight + retok_boost) / base_cost * log1p(risk * 5)

The retok_boost is applied when retok_prob > 0.55; it raises the priority of retokenize_text when the upstream FedCRDT-Distill model has flagged a retokenization trigger. Second, actions are selected greedily in descending score order until the target risk reduction is reached or the budget is exhausted. Any dependency of a selected action that was not itself selected is injected automatically. Third, a depth-first topological sort produces the final execution order, with highest-priority actions first among siblings.
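
The first two steps can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the package's code: `score` and `select` are hypothetical names, the default modality weight is taken to be 1.0, the boost value of 0.15 is inferred from the 4.47 priority in the usage example rather than documented, the target reduction is assumed to be `risk - risk_threshold`, and actions that no longer fit the budget are simply skipped.

```python
import math

RETOK_THRESHOLD = 0.55  # from the text above
RETOK_BOOST = 0.15      # ASSUMPTION: inferred from the usage example, not documented


def score(reduces, base_cost, risk, modality_weight=1.0, boost=0.0):
    """ROI (weighted reduction per unit cost) times urgency (log1p(risk * 5))."""
    return (reduces * modality_weight + boost) / base_cost * math.log1p(risk * 5)


def select(candidates, risk, retok_prob, threshold=0.20, budget=1.0):
    """Greedy pick by descending score until the target reduction or budget is hit."""
    scored = []
    for name, reduces, cost in candidates:
        boosted = name == "retokenize_text" and retok_prob > RETOK_THRESHOLD
        s = score(reduces, cost, risk, boost=RETOK_BOOST if boosted else 0.0)
        scored.append((s, name, reduces, cost))
    scored.sort(reverse=True)

    target = risk - threshold  # ASSUMPTION: reduction needed to reach the threshold
    chosen, reduced, spent = [], 0.0, 0.0
    for s, name, reduces, cost in scored:
        if reduced >= target:
            break
        if spent + cost > budget:
            continue  # ASSUMPTION: skip actions that do not fit the remaining budget
        chosen.append((name, round(s, 2)))
        reduced += reduces
        spent += cost
    return chosen


# A subset of catalog rows: (action, risk reduction, cost).
candidates = [
    ("mask_direct_id", 0.35, 0.05),
    ("generalize_dob", 0.12, 0.03),
    ("retokenize_text", 0.11, 0.09),
    ("suppress_geo", 0.10, 0.04),
    ("cross_modal_unlink", 0.28, 0.18),
    ("k_anon_table", 0.22, 0.20),
]
picked = select(candidates, risk=0.74, retok_prob=0.68)
print(picked)
# [('mask_direct_id', 10.83), ('generalize_dob', 6.19), ('retokenize_text', 4.47)]
```

With these assumptions the sketch reproduces the three priorities in the usage example below; selection stops after retokenize_text because the cumulative reduction (0.58) exceeds the target (0.54).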

Usage

from dagplanner import ExposureGuardDAGPlanner

planner = ExposureGuardDAGPlanner(risk_threshold=0.20, budget=1.0)

result = planner.plan(
    risk_score=0.74,
    retok_prob=0.68,
    active_modalities=["text", "image", "ehr"],
    patient_id="P-0042",
)

print(result["plan"])
# [
#   {"action": "mask_direct_id",  "priority": 10.83, "cost": 0.05, "risk_delta": 0.35, "deps": []},
#   {"action": "generalize_dob",  "priority": 6.19,  "cost": 0.03, "risk_delta": 0.12, "deps": []},
#   {"action": "retokenize_text", "priority": 4.47,  "cost": 0.09, "risk_delta": 0.11, "deps": ["mask_direct_id"]}
# ]
print(result["estimated_residual_risk"])  # 0.16
print(result["status"])                   # plan_ready

Batch planning across multiple patients, sorted by risk descending:

records = [
    {"patient_id": "P-001", "risk_score": 0.82, "retok_prob": 0.71,
     "active_modalities": ["text", "image", "audio", "ehr"]},
    {"patient_id": "P-002", "risk_score": 0.45, "retok_prob": 0.30,
     "active_modalities": ["text", "ehr"]},
]
results = planner.batch_plan(records)

Plugging in directly from FedCRDT-Distill and DCPG Encoder:

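# Equal-weight average of the two upstream risk scores.
# fedcrdt_output and encoder_output are the outputs of the upstream models.
merged_risk = 0.5 * fedcrdt_output["risk_score"] + 0.5 * encoder_output["risk_score"]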

result = planner.plan(
    risk_score=merged_risk,
    retok_prob=fedcrdt_output["retok_prob"],
    active_modalities=["text", "image", "ehr"],
    patient_id=fedcrdt_output["patient_id"],
)

Custom modality weights when imaging carries higher regulatory risk:

planner = ExposureGuardDAGPlanner(
    risk_threshold=0.20,
    budget=1.2,
    modality_weights={"image": 1.4, "audio": 1.2},
)

Output schema

{
  "patient_id": "P-0042",
  "risk_score": 0.74,
  "retok_prob": 0.68,
  "plan": [
    {
      "action": "mask_direct_id",
      "priority": 10.8329,
      "cost": 0.05,
      "risk_delta": 0.35,
      "deps": [],
      "injected": false
    }
  ],
  "estimated_residual_risk": 0.16,
  "selected_cost": 0.17,
  "total_cost": 0.17,
  "status": "plan_ready"
}

| Field | Type | Description |
|---|---|---|
| `plan` | list | Actions in topological execution order |
| `estimated_residual_risk` | float | Predicted post-remediation risk |
| `total_cost` | float | Sum of action base_cost values |
| `status` | str | One of `below_threshold`, `plan_ready`, `partial_plan`, `no_actions_applicable` |
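
The summary fields in the example are consistent with simple sums over the plan. The sketch below shows that arithmetic; `summarize` is a hypothetical helper, and the additive risk model and the clamp at zero are assumptions inferred from the example numbers, not documented behavior.

```python
def summarize(risk_score, plan):
    """Return (estimated_residual_risk, total_cost) under an additive model."""
    total_cost = round(sum(step["cost"] for step in plan), 2)
    removed = sum(step["risk_delta"] for step in plan)
    residual = round(max(0.0, risk_score - removed), 2)  # ASSUMPTION: clamp at zero
    return residual, total_cost


# Plan entries from the usage example (only the fields used here).
plan = [
    {"action": "mask_direct_id", "cost": 0.05, "risk_delta": 0.35},
    {"action": "generalize_dob", "cost": 0.03, "risk_delta": 0.12},
    {"action": "retokenize_text", "cost": 0.09, "risk_delta": 0.11},
]
print(summarize(0.74, plan))  # (0.16, 0.17)
```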

Action catalog

| Action | Modalities | Cost | Risk reduction |
|---|---|---|---|
| `mask_direct_id` | text, image | 0.05 | 0.35 |
| `cross_modal_unlink` | all | 0.18 | 0.28 |
| `k_anon_table` | ehr, text | 0.20 | 0.22 |
| `redact_image_face` | image | 0.12 | 0.20 |
| `strip_audio_voice` | audio | 0.15 | 0.18 |
| `federated_noise` | all | 0.25 | 0.15 |
| `redact_image_tag` | image | 0.08 | 0.14 |
| `anon_audio_content` | audio | 0.10 | 0.12 |
| `generalize_dob` | text, ehr | 0.03 | 0.12 |
| `retokenize_text` | text | 0.09 | 0.11 |
| `suppress_geo` | text, ehr | 0.04 | 0.10 |
| `drop_rare_code` | ehr | 0.06 | 0.09 |
| `perturb_numerics` | ehr | 0.07 | 0.07 |
| `audit_log_purge` | all | 0.02 | 0.03 |
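
One way to read the Modalities column is as a filter on `active_modalities`. The sketch below illustrates that reading; `CATALOG_MODALITIES` and `applicable` are hypothetical names, and treating "all" as "applies whatever modalities are active" is an assumption, since the catalog does not define it.

```python
# Modalities column of the catalog above, in table order.
CATALOG_MODALITIES = {
    "mask_direct_id": {"text", "image"},
    "cross_modal_unlink": "all",
    "k_anon_table": {"ehr", "text"},
    "redact_image_face": {"image"},
    "strip_audio_voice": {"audio"},
    "federated_noise": "all",
    "redact_image_tag": {"image"},
    "anon_audio_content": {"audio"},
    "generalize_dob": {"text", "ehr"},
    "retokenize_text": {"text"},
    "suppress_geo": {"text", "ehr"},
    "drop_rare_code": {"ehr"},
    "perturb_numerics": {"ehr"},
    "audit_log_purge": "all",
}


def applicable(active_modalities):
    """Actions whose modalities overlap the active ones ("all" always matches)."""
    active = set(active_modalities)
    return [
        name
        for name, mods in CATALOG_MODALITIES.items()
        if mods == "all" or active & mods  # ASSUMPTION: "all" means always applicable
    ]


print(applicable(["audio"]))
# ['cross_modal_unlink', 'strip_audio_voice', 'federated_noise',
#  'anon_audio_content', 'audit_log_purge']
```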

Dependency rules

mask_direct_id      -> retokenize_text
mask_direct_id      -> cross_modal_unlink
redact_image_face   -> cross_modal_unlink
redact_image_face   -> redact_image_tag
strip_audio_voice   -> cross_modal_unlink
strip_audio_voice   -> anon_audio_content
generalize_dob      -> k_anon_table
suppress_geo        -> k_anon_table
drop_rare_code      -> k_anon_table
k_anon_table        -> federated_noise
cross_modal_unlink  -> federated_noise
federated_noise     -> audit_log_purge
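
Dependency injection and the depth-first ordering can be sketched from these rules. This is an illustrative reconstruction, not the package's implementation: `inject_deps` and `topo_order` are hypothetical names, injection is assumed to be transitive and not filtered by modality, and ties between equal priorities are broken alphabetically.

```python
# (prerequisite, dependent) pairs copied from the rules above.
DEPS = [
    ("mask_direct_id", "retokenize_text"),
    ("mask_direct_id", "cross_modal_unlink"),
    ("redact_image_face", "cross_modal_unlink"),
    ("redact_image_face", "redact_image_tag"),
    ("strip_audio_voice", "cross_modal_unlink"),
    ("strip_audio_voice", "anon_audio_content"),
    ("generalize_dob", "k_anon_table"),
    ("suppress_geo", "k_anon_table"),
    ("drop_rare_code", "k_anon_table"),
    ("k_anon_table", "federated_noise"),
    ("cross_modal_unlink", "federated_noise"),
    ("federated_noise", "audit_log_purge"),
]

PREREQS = {}
for before, after in DEPS:
    PREREQS.setdefault(after, set()).add(before)


def inject_deps(selected):
    """Return the selected actions plus every transitive prerequisite."""
    needed, stack = set(), list(selected)
    while stack:
        action = stack.pop()
        if action not in needed:
            needed.add(action)
            stack.extend(PREREQS.get(action, ()))
    return needed


def topo_order(actions, priority):
    """Depth-first topological sort; higher priority is visited first among siblings.

    Assumes the rules form a DAG, so no cycle detection is done.
    """
    actions = set(actions)
    order, visited = [], set()

    def by_priority(names):
        return sorted(names, key=lambda n: (-priority.get(n, 0.0), n))

    def visit(action):
        if action in visited:
            return
        visited.add(action)
        for dep in by_priority(PREREQS.get(action, set()) & actions):
            visit(dep)  # prerequisites are emitted before the action itself
        order.append(action)

    for action in by_priority(actions):
        visit(action)
    return order


priority = {"mask_direct_id": 10.83, "generalize_dob": 6.19, "retokenize_text": 4.47}
print(topo_order(inject_deps(priority), priority))
# ['mask_direct_id', 'generalize_dob', 'retokenize_text']
```

Selecting only retokenize_text would pull in mask_direct_id and place it first, which is the ordering the rules require.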

Where it fits

Clinical Record
    |
    v
DCPG Risk Scorer -> DCPG Encoder -> FedCRDT-Distill
                                         |
                                         v
                                    DAGPlanner        <- this model
                                         |
                                         v
                               Remediation Action Plan
                                         |
                                         v
                                   SynthRewrite-T5

phi-exposure-guard: full system with all components
exposureguard-fedcrdt-distill: provides risk_score and retok_prob inputs
exposureguard-dcpg-encoder: provides patient embedding and risk score
exposureguard-policynet: policy enforcement layer
exposureguard-synthrewrite-t5: downstream synthetic rewriting
dag-remediation-traces: 7,500 input/output traces for this planner
multimodal-phi-masking-benchmark: 10,000 records across 5 modalities

Citation

@software{exposureguard_dagplanner,
  title  = {ExposureGuard-DAGPlanner: Dependency-Aware PHI Remediation Planning},
  author = {Ganti, Venkata Krishna Azith Teja},
  doi    = {10.5281/zenodo.18865882},
  url    = {https://huggingface.co/vkatg/exposureguard-dagplanner},
  note   = {US Provisional Patent filed 2025-07-05}
}

MIT License. Research code. Not a production compliance system.
