ExposureGuard-SynthRewrite-T5

PHI-aware clinical note rewriter with stateful pseudonym versioning.

Most de-identification tools find PHI spans and blank them out. This model rewrites the full note into safe, utility-preserving text while keeping pseudonyms consistent across events. When a cross-modal linkage trigger fires, pseudonyms rotate to a new version so prior and future records cannot be linked.

What it does

Input is a clinical note plus exposure context. Output is a rewritten note where real PHI is replaced with consistent fake identifiers that version-bump on trigger.

Input:  [risk=0.72] [modality=asr] [version=1] [trigger=cross_modal_linkage]
        note: James Smith is a 67-year-old patient...

Output: Robert Taylor is a 67-year-old patient presenting with chest pain.
        DOB [DOB-V1]. Contact: [PHONE-V1]. MRN: [MRN-V1].
        Attending: Dr. Chen at Memorial General.

The version suffix in the placeholders is the key behavior. Version 0 means no trigger has fired yet. Version 1 means pseudonyms rotated once due to cross-modal linkage. Any system joining records across versions will hit a deliberate discontinuity.
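The discontinuity can be checked mechanically on the consumer side. A minimal sketch (the regex and function names here are illustrative, not part of the model's API), assuming placeholders follow the `[KIND-Vn]` pattern shown above:

```python
import re

# Placeholders look like [DOB-V1], [MRN-V0], [PHONE-V2], ...
PLACEHOLDER = re.compile(r"\[(?P<kind>[A-Z]+)-V(?P<version>\d+)\]")

def placeholder_versions(text: str) -> set[int]:
    """Collect every pseudonym version appearing in a rewritten note."""
    return {int(m.group("version")) for m in PLACEHOLDER.finditer(text)}

def linkable(note_a: str, note_b: str) -> bool:
    """Two notes are only joinable if their pseudonym versions overlap."""
    return bool(placeholder_versions(note_a) & placeholder_versions(note_b))

before = "DOB [DOB-V0]. MRN: [MRN-V0]."
after  = "DOB [DOB-V1]. MRN: [MRN-V1]."
print(linkable(before, after))  # False: the rotation broke the link
```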

Usage

from transformers import T5ForConditionalGeneration, T5Tokenizer
import torch

tok   = T5Tokenizer.from_pretrained("vkatg/exposureguard-synthrewrite-t5")
model = T5ForConditionalGeneration.from_pretrained("vkatg/exposureguard-synthrewrite-t5")
model.eval()

note   = "James Smith, MRN482910, admitted with chest pain. DOB 03/22/1955."
prompt = (
    f"rewrite: [risk=0.72] [modality=asr] [version=1] "
    f"[trigger=cross_modal_linkage] note: {note}"
)

# Pass the full encoding so generate() also receives the attention mask
ids = tok(prompt, return_tensors="pt", max_length=256, truncation=True)
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=200, num_beams=4)

print(tok.decode(out[0], skip_special_tokens=True))

Input format

rewrite: [risk=FLOAT] [modality=MODALITY] [version=INT] [trigger=REASON] note: NOTE_TEXT
Token     Values
risk      0.0 to 1.0, current cumulative risk
modality  text, asr, image_proxy, waveform_proxy, audio_proxy
version   0 = no trigger fired, 1+ = pseudonym rotation count
trigger   none, cross_modal_linkage, exposure_accumulation
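Assembling the prefix by hand is error-prone, so a small builder helps. This helper is hypothetical (not shipped with the model); it only enforces the token values listed above:

```python
def build_prompt(note: str, risk: float, modality: str = "text",
                 version: int = 0, trigger: str = "none") -> str:
    """Assemble the control-token prefix described above (illustrative helper)."""
    allowed_modalities = {"text", "asr", "image_proxy",
                          "waveform_proxy", "audio_proxy"}
    allowed_triggers = {"none", "cross_modal_linkage", "exposure_accumulation"}
    assert 0.0 <= risk <= 1.0, "risk is a cumulative score in [0, 1]"
    assert modality in allowed_modalities, f"unknown modality: {modality}"
    assert trigger in allowed_triggers, f"unknown trigger: {trigger}"
    return (f"rewrite: [risk={risk}] [modality={modality}] "
            f"[version={version}] [trigger={trigger}] note: {note}")

print(build_prompt("James Smith, MRN482910.", risk=0.72,
                   modality="asr", version=1, trigger="cross_modal_linkage"))
```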

PHI handling

PHI type      Replacement
Patient name  Consistent pseudonym (version-aware)
DOB           [DOB-V{version}]
MRN           [MRN-V{version}]
Phone         [PHONE-V{version}]
Address       [ADDR-V{version}]
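The released model learns the version-aware consistency end-to-end, but the intended behavior can be sketched as a plain stateful lookup (all names here are hypothetical): one pseudonym per (real name, version) pair, with a rotation that starts a fresh mapping:

```python
class PseudonymStore:
    """One pseudonym per (real name, version); rotation bumps the version."""

    def __init__(self, name_pool):
        self.pool = list(name_pool)
        self.version = 0
        self.table = {}  # (real_name, version) -> pseudonym

    def pseudonym(self, real_name: str) -> str:
        key = (real_name, self.version)
        if key not in self.table:
            # Assign the next unused name from the pool for this version
            self.table[key] = self.pool[len(self.table) % len(self.pool)]
        return self.table[key]

    def rotate(self):
        """A trigger fired: bump the version; prior mappings are never reused."""
        self.version += 1

store = PseudonymStore(["Robert Taylor", "Maria Lopez"])
a = store.pseudonym("James Smith")  # same name, same version -> same pseudonym
b = store.pseudonym("James Smith")
store.rotate()                      # e.g. cross_modal_linkage fired
c = store.pseudonym("James Smith")  # version bumped -> fresh pseudonym
print(a == b, a == c)  # True False
```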

Training

Fine-tuned from t5-small on 2,000 synthetic clinical note pairs. Each pair was generated with injected fake PHI and a matching rewrite. Trigger scenarios (cross_modal_linkage, exposure_accumulation) were oversampled to ensure the model learns version-bump behavior.

Citation

@misc{exposureguard-synthrewrite-2026,
  title  = {ExposureGuard-SynthRewrite-T5},
  author = {vkatg},
  year   = {2026},
  url    = {https://huggingface.co/vkatg/exposureguard-synthrewrite-t5}
}

Trained on fully synthetic data only. Not validated for clinical use.

Model size: 60.5M params · F32 · safetensors