ExposureGuard-DAGPlanner

Knowing the risk score is not the same as knowing what to do about it. Most systems stop at the score. This one continues: given a patient's current re-identification risk, a set of active modalities, and a compute budget, DAGPlanner outputs an ordered remediation sequence with dependency constraints, priority weights, and estimated residual risk after each action executes.

It sits at the end of the ExposureGuard pipeline. Everything upstream produces a risk number. DAGPlanner turns that number into a plan.


The problem with flat action lists

PHI remediation actions have dependencies. You cannot retokenize text before masking direct identifiers, because retokenization relies on pseudonym assignments that happen during masking. You cannot apply federated noise before cross-modal unlinking, because noise added to a still-linked graph leaks through the linkage. Executing actions in the wrong order wastes compute or, worse, leaves re-identification pathways open that should have been closed.

DAGPlanner enforces 12 precedence constraints across 14 remediation operations and sorts actions topologically before returning them. The plan is executable in the order returned, so no additional dependency resolution is required downstream.
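A downstream consumer can confirm that ordering property cheaply. The sketch below assumes only the plan shape shown in the output schema (a list of dicts with action and deps keys). The helper respects_dependencies is hypothetical and not part of the package.

```python
def respects_dependencies(plan):
    """Return True if every step's deps appear earlier in the plan."""
    seen = set()
    for step in plan:
        # A dependency that has not run yet means the order is invalid.
        if any(dep not in seen for dep in step["deps"]):
            return False
        seen.add(step["action"])
    return True


good = [
    {"action": "mask_direct_id", "deps": []},
    {"action": "retokenize_text", "deps": ["mask_direct_id"]},
]
bad = list(reversed(good))

print(respects_dependencies(good))  # True
print(respects_dependencies(bad))   # False
```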


How it plans

Three steps. First, each applicable action is scored by ROI times urgency:

score = (reduces * modality_weight + retok_boost) / base_cost * log1p(risk * 5)

Here reduces and base_cost are the action's risk reduction and cost from the action catalog, risk is the patient's current risk score, and modality_weight comes from the planner's modality_weights setting. The retok_boost fires when retok_prob > 0.55, lifting retokenize_text priority when the upstream FedCRDT-Distill model has flagged a retokenization trigger.

Second, actions are selected greedily by score until the target risk reduction is reached or the budget is exhausted. Missing dependencies are injected automatically.

Third, a depth-first topological sort produces the final execution order, highest-priority actions first among siblings.
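As a worked check of the scoring formula, the sketch below recomputes the three priorities shown in the usage example (risk_score=0.74). Two values are assumptions, not documented facts: a default modality_weight of 1.0, and a retok_boost of 0.15, which is simply the value that reproduces the 4.47 priority in the example output. The function name action_score is hypothetical.

```python
import math


def action_score(reduces, base_cost, risk, modality_weight=1.0, retok_boost=0.0):
    """ROI times urgency, following the formula above."""
    roi = (reduces * modality_weight + retok_boost) / base_cost
    urgency = math.log1p(risk * 5)
    return roi * urgency


risk = 0.74
# reduces and base_cost are taken from the action catalog.
mask = action_score(0.35, 0.05, risk)
dob = action_score(0.12, 0.03, risk)
# Assumed boost of 0.15, inferred from the example output rather than documented.
retok = action_score(0.11, 0.09, risk, retok_boost=0.15)

print(round(mask, 2))   # 10.83
print(round(dob, 2))    # 6.19
print(round(retok, 2))  # 4.47
```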


Usage

from dagplanner import ExposureGuardDAGPlanner

planner = ExposureGuardDAGPlanner(risk_threshold=0.20, budget=1.0)

result = planner.plan(
    risk_score=0.74,
    retok_prob=0.68,
    active_modalities=["text", "image", "ehr"],
    patient_id="P-0042",
)

print(result["plan"])
# [
#   {"action": "mask_direct_id",  "priority": 10.83, "cost": 0.05, "risk_delta": 0.35, "deps": []},
#   {"action": "generalize_dob",  "priority": 6.19,  "cost": 0.03, "risk_delta": 0.12, "deps": []},
#   {"action": "retokenize_text", "priority": 4.47,  "cost": 0.09, "risk_delta": 0.11, "deps": ["mask_direct_id"]}
# ]
print(result["estimated_residual_risk"])  # 0.16
print(result["status"])                   # plan_ready

Batch planning across multiple patients, sorted by risk descending:

records = [
    {"patient_id": "P-001", "risk_score": 0.82, "retok_prob": 0.71,
     "active_modalities": ["text", "image", "audio", "ehr"]},
    {"patient_id": "P-002", "risk_score": 0.45, "retok_prob": 0.30,
     "active_modalities": ["text", "ehr"]},
]
results = planner.batch_plan(records)

Plugging in directly from FedCRDT-Distill and DCPG Encoder:

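# fedcrdt_output and encoder_output are the result dicts from the upstream models.
# Merge their risk estimates with equal weight.
merged_risk = 0.5 * fedcrdt_output["risk_score"] + 0.5 * encoder_output["risk_score"]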

result = planner.plan(
    risk_score=merged_risk,
    retok_prob=fedcrdt_output["retok_prob"],
    active_modalities=["text", "image", "ehr"],
    patient_id=fedcrdt_output["patient_id"],
)

Custom modality weights when imaging carries higher regulatory risk:

planner = ExposureGuardDAGPlanner(
    risk_threshold=0.20,
    budget=1.2,
    modality_weights={"image": 1.4, "audio": 1.2},
)

Output schema

{
  "patient_id": "P-0042",
  "risk_score": 0.74,
  "retok_prob": 0.68,
  "plan": [
    {
      "action": "mask_direct_id",
      "priority": 10.8329,
      "cost": 0.05,
      "risk_delta": 0.35,
      "deps": [],
      "injected": false
    }
  ],
  "estimated_residual_risk": 0.16,
  "selected_cost": 0.17,
  "total_cost": 0.17,
  "status": "plan_ready"
}

Field                    Type   Description
plan                     list   Actions in topological execution order
estimated_residual_risk  float  Predicted post-remediation risk
total_cost               float  Sum of action base_cost values
status                   str    below_threshold / plan_ready / partial_plan / no_actions_applicable
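One way a caller might branch on the status field is sketched below. The four status strings come from the table above. How each one is interpreted, and the return labels, are assumptions made for illustration. The function next_step is hypothetical.

```python
def next_step(result):
    """Map a planner result to a follow-up decision (illustrative policy only)."""
    status = result["status"]
    if status == "below_threshold":
        return "skip"                # assumed: risk already under the threshold
    if status == "plan_ready":
        return "execute"             # assumed: plan reaches the target
    if status == "partial_plan":
        return "execute_and_review"  # assumed: plan falls short of the target
    if status == "no_actions_applicable":
        return "escalate"            # assumed: nothing in the catalog applies
    raise ValueError(f"unknown status: {status!r}")


print(next_step({"status": "plan_ready"}))  # execute
```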

Action catalog

Action              Modalities   Cost  Risk reduction
mask_direct_id      text, image  0.05  0.35
cross_modal_unlink  all          0.18  0.28
k_anon_table        ehr, text    0.20  0.22
redact_image_face   image        0.12  0.20
strip_audio_voice   audio        0.15  0.18
federated_noise     all          0.25  0.15
redact_image_tag    image        0.08  0.14
anon_audio_content  audio        0.10  0.12
generalize_dob      text, ehr    0.03  0.12
retokenize_text     text         0.09  0.11
suppress_geo        text, ehr    0.04  0.10
drop_rare_code      ehr          0.06  0.09
perturb_numerics    ehr          0.07  0.07
audit_log_purge     all          0.02  0.03
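The greedy selection step described under How it plans can be sketched against this catalog. This is a minimal illustration, not the package's implementation, and it rests on several assumptions: an action applies when it shares a modality with the active set or is marked all, modality_weight is 1.0, the retokenization boost is 0.15 (inferred from the usage example), actions that would exceed the budget are skipped, and residual risk is the starting risk minus the summed reductions. Dependency injection is left out. Under those assumptions the sketch reproduces the selection, residual risk, and cost from the usage example.

```python
import math

# name: (modalities, base_cost, risk_reduction), copied from the catalog above
CATALOG = {
    "mask_direct_id": ({"text", "image"}, 0.05, 0.35),
    "cross_modal_unlink": ({"all"}, 0.18, 0.28),
    "k_anon_table": ({"ehr", "text"}, 0.20, 0.22),
    "redact_image_face": ({"image"}, 0.12, 0.20),
    "strip_audio_voice": ({"audio"}, 0.15, 0.18),
    "federated_noise": ({"all"}, 0.25, 0.15),
    "redact_image_tag": ({"image"}, 0.08, 0.14),
    "anon_audio_content": ({"audio"}, 0.10, 0.12),
    "generalize_dob": ({"text", "ehr"}, 0.03, 0.12),
    "retokenize_text": ({"text"}, 0.09, 0.11),
    "suppress_geo": ({"text", "ehr"}, 0.04, 0.10),
    "drop_rare_code": ({"ehr"}, 0.06, 0.09),
    "perturb_numerics": ({"ehr"}, 0.07, 0.07),
    "audit_log_purge": ({"all"}, 0.02, 0.03),
}


def greedy_select(risk, active, threshold=0.20, budget=1.0,
                  retok_prob=0.0, boost=0.15):
    """Pick actions by descending score until the target reduction is met."""
    urgency = math.log1p(risk * 5)
    scored = []
    for name, (mods, cost, reduces) in CATALOG.items():
        if "all" not in mods and not (mods & set(active)):
            continue  # assumed applicability rule
        bonus = boost if name == "retokenize_text" and retok_prob > 0.55 else 0.0
        scored.append(((reduces + bonus) / cost * urgency, name, cost, reduces))
    scored.sort(reverse=True)

    chosen, spent, reduced = [], 0.0, 0.0
    target = risk - threshold
    for _score, name, cost, reduces in scored:
        if reduced >= target:
            break
        if spent + cost > budget:
            continue  # assumed: skip actions that no longer fit the budget
        chosen.append(name)
        spent += cost
        reduced += reduces
    return chosen, round(risk - reduced, 4), round(spent, 4)


print(greedy_select(0.74, ["text", "image", "ehr"], retok_prob=0.68))
# (['mask_direct_id', 'generalize_dob', 'retokenize_text'], 0.16, 0.17)
```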

Dependency rules

mask_direct_id      -> retokenize_text
mask_direct_id      -> cross_modal_unlink
redact_image_face   -> cross_modal_unlink
redact_image_face   -> redact_image_tag
strip_audio_voice   -> cross_modal_unlink
strip_audio_voice   -> anon_audio_content
generalize_dob      -> k_anon_table
suppress_geo        -> k_anon_table
drop_rare_code      -> k_anon_table
k_anon_table        -> federated_noise
cross_modal_unlink  -> federated_noise
federated_noise     -> audit_log_purge
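The rules above are enough to sketch dependency injection and the depth-first topological sort. The code below is an illustration under stated assumptions, not the package's code: it injects every transitive prerequisite regardless of modality (the source does not say how prerequisites for inactive modalities are handled), and it breaks priority ties by name. The helpers inject_dependencies and topo_order are hypothetical names.

```python
# (prerequisite, dependent) pairs, copied from the rules above
EDGES = [
    ("mask_direct_id", "retokenize_text"),
    ("mask_direct_id", "cross_modal_unlink"),
    ("redact_image_face", "cross_modal_unlink"),
    ("redact_image_face", "redact_image_tag"),
    ("strip_audio_voice", "cross_modal_unlink"),
    ("strip_audio_voice", "anon_audio_content"),
    ("generalize_dob", "k_anon_table"),
    ("suppress_geo", "k_anon_table"),
    ("drop_rare_code", "k_anon_table"),
    ("k_anon_table", "federated_noise"),
    ("cross_modal_unlink", "federated_noise"),
    ("federated_noise", "audit_log_purge"),
]

DEPS = {}
for before, after in EDGES:
    DEPS.setdefault(after, []).append(before)


def inject_dependencies(selected):
    """Return the selected actions plus every transitive prerequisite."""
    needed, stack = set(), list(selected)
    while stack:
        action = stack.pop()
        if action not in needed:
            needed.add(action)
            stack.extend(DEPS.get(action, []))
    return needed


def topo_order(actions, priority):
    """Post-order DFS: prerequisites first, higher priority first among siblings."""
    def rank(a):
        return (-priority.get(a, 0.0), a)

    order, done = [], set()

    def visit(action):
        if action in done:
            return
        done.add(action)
        for dep in sorted(DEPS.get(action, []), key=rank):
            if dep in actions:
                visit(dep)
        order.append(action)

    for action in sorted(actions, key=rank):
        visit(action)
    return order


needed = inject_dependencies(["retokenize_text", "generalize_dob"])
priority = {"mask_direct_id": 10.83, "generalize_dob": 6.19, "retokenize_text": 4.47}
print(topo_order(needed, priority))
# ['mask_direct_id', 'generalize_dob', 'retokenize_text']
```

The rule graph is acyclic, so the sketch does not include cycle detection.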

Where it fits

Clinical Record
    |
    v
DCPG Risk Scorer -> DCPG Encoder -> FedCRDT-Distill
                                         |
                                         v
                                    DAGPlanner        <- this model
                                         |
                                         v
                               Remediation Action Plan
                                         |
                                         v
                                   SynthRewrite-T5

Citation

@software{exposureguard_dagplanner,
  title  = {ExposureGuard-DAGPlanner: Dependency-Aware PHI Remediation Planning},
  author = {Ganti, Venkata Krishna Azith Teja},
  doi    = {10.5281/zenodo.18865882},
  url    = {https://huggingface.co/vkatg/exposureguard-dagplanner},
  note   = {US Provisional Patent filed 2025-07-05}
}

MIT License. Research code. Not a production compliance system.
