# PHI Span Detector (BIO NER) - Synthetic
phi-span-detector-deberta-v3 is a DeBERTa v3 token-classification model for detecting Protected Health Information (PHI) spans in clinical-note-like text and log-like text using BIO tagging.
It is designed for privacy tooling workflows such as:
- deterministic redaction pipelines
- pre-log and post-log PHI guardrails
- research prototypes for de-identification
Recommended pipeline:
- detect PHI spans
- apply deterministic redaction
- run a secondary leak-check gate before downstream use
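The three steps above can be sketched as a single gate. This is a minimal control-flow sketch only: `detect_spans` and `leak_check` are hypothetical stand-ins for the span detector and the companion leak checker, replaced here with stubs.

```python
def redact(text, spans):
    """Replace detected spans with deterministic [LABEL] placeholders,
    working right-to-left so earlier character offsets stay valid."""
    out = text
    for sp in sorted(spans, key=lambda s: s["start"], reverse=True):
        out = out[: sp["start"]] + f"[{sp['label']}]" + out[sp["end"] :]
    return out


def guarded_redact(text, detect_spans, leak_check):
    """detect -> redact -> leak-check gate before downstream use."""
    spans = detect_spans(text)          # stage 1: PHI span detection
    redacted = redact(text, spans)      # stage 2: deterministic redaction
    if not leak_check(redacted):        # stage 3: secondary leak gate
        raise ValueError("possible PHI leak after redaction")
    return redacted


# Stubs that illustrate the control flow only.
detect = lambda t: [{"start": 5, "end": 9, "label": "NAME"}]
clean = lambda t: "[NAME]" in t  # pretend gate: passes once redaction happened

print(guarded_redact("Call Jane at the desk.", detect, clean))
```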
Companion model:
bharathjanumpally/phi-leak-checker-deberta-v3
## Model at a glance

- Task: token classification
- Architecture: `DebertaV2ForTokenClassification`
- Base model: `microsoft/deberta-v3-base`
- Max sequence length: 512
- Labeling scheme: BIO
- Training data: synthetic text only
## PHI label set

The model predicts the following entity families:

| Label | Meaning |
|---|---|
| NAME | patient or person names |
| DATE | visit dates, birth dates, service dates |
| AGE | age mentions that may be identifying in context |
| PHONE | phone and callback numbers |
| EMAIL | email addresses |
| ADDRESS | street or mailing addresses |
| ID | MRN, account, encounter, record, or similar identifiers |
| PROVIDER | clinician or provider names |
| FACILITY | hospitals, clinics, centers, departments |
| LOCATION | city, state, and other place references |

Token-level outputs use BIO labels from the model config: O, B-*, and I-* across the ten PHI families above.
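As an illustration of the tag-set size (not the canonical `id2label` ordering from the model config), the full BIO tag set can be generated from the ten families:

```python
FAMILIES = ["NAME", "DATE", "AGE", "PHONE", "EMAIL",
            "ADDRESS", "ID", "PROVIDER", "FACILITY", "LOCATION"]

# O plus a B- and I- tag per family: 1 + 2 * 10 = 21 labels total.
BIO_LABELS = ["O"] + [f"{p}-{fam}" for fam in FAMILIES for p in ("B", "I")]

print(len(BIO_LABELS))  # 21
```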
## How the training data was built
This model was trained on synthetic examples to keep the project openly shareable.
High-level training recipe:
- Generate synthetic clinical notes and log-like text with templates.
- Insert PHI-like fields such as names, dates, IDs, facilities, phone numbers, and addresses.
- Convert gold character spans into BIO token labels for token classification.
This provides clean supervision without exposing real patient data, but it also means real-world formatting and writing styles may differ from training-time distributions.
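A minimal sketch of step 3 above, projecting gold character spans onto token offsets. A toy whitespace tokenizer stands in for the real DeBERTa tokenizer, whose `offset_mapping` would provide the same `(start, end)` pairs:

```python
def char_spans_to_bio(token_offsets, gold_spans):
    """token_offsets: (start, end) character offsets per token.
    gold_spans: (start, end, label) gold character spans.
    Any token overlapping a gold span gets B- (first) or I- (rest)."""
    tags = ["O"] * len(token_offsets)
    for s, e, label in gold_spans:
        first = True
        for i, (ts, te) in enumerate(token_offsets):
            if ts < e and te > s:  # token overlaps the gold span
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return tags


text = "John Smith visited today"
offsets = [(0, 4), (5, 10), (11, 18), (19, 24)]  # whitespace tokens
print(char_spans_to_bio(offsets, [(0, 10, "NAME")]))
```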
## Evaluation

The repository includes a full `seqeval_report.txt`. Key held-out results from that report are summarized below.

### Overall metrics
| Metric | Value |
|---|---|
| Micro precision | 0.6657 |
| Micro recall | 0.6394 |
| Micro F1 | 0.6523 |
| Macro precision | 0.6583 |
| Macro recall | 0.6224 |
| Macro F1 | 0.6362 |
| Weighted F1 | 0.6495 |
### Per-label metrics
| Label | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDRESS | 0.6652 | 0.6481 | 0.6565 | 233 |
| AGE | 0.6758 | 0.3834 | 0.4893 | 386 |
| DATE | 0.6553 | 0.6492 | 0.6522 | 1297 |
| EMAIL | 0.6474 | 0.6455 | 0.6465 | 347 |
| FACILITY | 0.6320 | 0.6494 | 0.6406 | 656 |
| ID | 0.6652 | 0.6519 | 0.6585 | 451 |
| LOCATION | 0.6600 | 0.6600 | 0.6600 | 350 |
| NAME | 0.7810 | 0.7802 | 0.7806 | 1001 |
| PHONE | 0.5358 | 0.5025 | 0.5186 | 595 |
| PROVIDER | 0.6652 | 0.6537 | 0.6594 | 231 |
### Interpretation

- Strongest label in the current report: NAME
- Weakest labels in the current report: PHONE and AGE
- The model is usable as a PHI span detector for research and tooling, but it should be paired with deterministic rules and internal evaluation before higher-stakes deployment.
## Intended use
Appropriate uses:
- PHI span detection in research prototypes
- de-identification pipelines when paired with deterministic redaction
- zero-trust logging guardrails
- preprocessing before a secondary PHI leak checker
Not intended for:
- medical diagnosis or treatment advice
- sole control for HIPAA, GDPR, or other compliance decisions
- unsupervised high-stakes production usage without internal validation
## Limitations and failure modes
- The model was trained on synthetic text, so real clinical documentation may include unseen abbreviations, formatting quirks, shorthand, OCR noise, and edge cases.
- Numeric strings may be over-flagged when they resemble IDs, dates, or phone numbers.
- Some rare PHI patterns may be missed if they were not well represented in the synthetic templates.
- Partial tokens and tokenizer boundary effects can require careful post-processing in downstream systems.
- Label performance is uneven; current metrics suggest extra caution around PHONE and AGE.
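For the tokenizer boundary issue above, one common post-processing step is merging adjacent same-label fragments before redaction. A minimal sketch over the span dicts the pipeline returns:

```python
def merge_spans(spans, max_gap=1):
    """Merge same-label spans separated by at most max_gap characters."""
    merged = []
    for sp in sorted(spans, key=lambda s: s["start"]):
        if (merged
                and sp["label"] == merged[-1]["label"]
                and sp["start"] - merged[-1]["end"] <= max_gap):
            # Extend the previous span instead of emitting a fragment.
            merged[-1]["end"] = max(merged[-1]["end"], sp["end"])
        else:
            merged.append(dict(sp))
    return merged


fragments = [
    {"start": 0, "end": 3, "label": "PHONE"},
    {"start": 4, "end": 12, "label": "PHONE"},  # 1-char gap: merged
    {"start": 20, "end": 24, "label": "DATE"},  # different label: kept
]
print(merge_spans(fragments))
```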
Recommended mitigations:
- add regex backstops for structured entities like email, phone, and date
- apply deterministic placeholder redaction after detection
- run a second PHI leak-check model before downstream release
- evaluate on an internal, policy-approved test set that matches your real document style
- keep a human-review path for ambiguous or high-risk content
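The regex backstop idea for structured entities can be sketched as follows; the patterns below are illustrative, not exhaustive, and the resulting spans would be unioned with model predictions:

```python
import re

# Illustrative patterns only; real deployments need broader coverage.
BACKSTOPS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def regex_backstop(text):
    """Return rule-based spans to union with model-predicted spans."""
    spans = []
    for label, pattern in BACKSTOPS.items():
        for m in pattern.finditer(text):
            spans.append({"start": m.start(), "end": m.end(),
                          "label": label, "score": 1.0})
    return sorted(spans, key=lambda s: s["start"])


print(regex_backstop("Email jane@example.com or call 617-555-0182 by 04/14/2025."))
```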
## Usage

### Transformers pipeline

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="bharathjanumpally/phi-span-detector-deberta-v3",
    aggregation_strategy="simple",
)

text = (
    "Patient John Smith (MRN: 001-23-4567) visited "
    "Boston Medical Center on 12/19/2025."
)

print(ner(text))
```
### AutoModel and AutoTokenizer

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_id = "bharathjanumpally/phi-span-detector-deberta-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

print(ner("Call Jane Doe at 617-555-0182 before 04/14/2025."))
```
### Deterministic redaction example

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="bharathjanumpally/phi-span-detector-deberta-v3",
    aggregation_strategy="simple",
)

text = (
    "Patient John Smith (MRN: 001-23-4567) visited "
    "Boston Medical Center on 12/19/2025."
)

spans = ner(text)

# Replace spans right-to-left so earlier offsets stay valid.
redacted = text
for item in sorted(spans, key=lambda x: x["start"], reverse=True):
    label = item["entity_group"]
    redacted = redacted[: item["start"]] + f"[{label}]" + redacted[item["end"] :]

print(spans)
print(redacted)
```
### Example output schema

For downstream systems, a practical span schema is:

```json
[
  {"start": 8, "end": 18, "label": "NAME", "score": 0.97},
  {"start": 25, "end": 36, "label": "ID", "score": 0.94},
  {"start": 45, "end": 66, "label": "FACILITY", "score": 0.91},
  {"start": 70, "end": 80, "label": "DATE", "score": 0.89}
]
```
## Safety and privacy
This model was trained on synthetic data and is published for research and tooling purposes. Do not send real PHI to public demos or public inference endpoints. Use private infrastructure, access controls, and organization-approved evaluation workflows for real deployments.
## Citation

```bibtex
@misc{janumpally_phi_span_detector_2025,
  title        = {PHI Span Detector (Synthetic)},
  author       = {Bharath Kumar Reddy Janumpally},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {Model on Hugging Face}
}
```
## Contact
If you use this model in a serious workflow, validate it against your own internal test cases and document the operating policy around false positives, false negatives, and escalation paths.