How to use Vrandan/pii-modernbert-large-v2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="Vrandan/pii-modernbert-large-v2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("Vrandan/pii-modernbert-large-v2")
model = AutoModelForTokenClassification.from_pretrained("Vrandan/pii-modernbert-large-v2")

This model is a PII / PHI named-entity recognition fine-tune of answerdotai/ModernBERT-large, trained on the v2 harmonized, synthetic-augmented, English-only corpus at Vrandan/pii-harmonized-corpus-v2.
| Metric | Value |
|---|---|
| F1 | 0.5975 |
| Precision | 0.5341 |
| Recall | 0.6780 |
from transformers import pipeline

pii = pipeline(
    "token-classification",
    model="Vrandan/pii-modernbert-large-v2",
    aggregation_strategy="simple",
)
pii("Patient John Smith, MRN-2024-88432, called 555-FAKE-1234 about Rx refill.")
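With `aggregation_strategy` set, the pipeline returns a list of dicts carrying `entity_group`, `score`, `word`, `start`, and `end`. As a minimal sketch of how those character offsets might be used downstream, the hypothetical helper `redact` below replaces each span with a bracketed label. It assumes the spans do not overlap, and the `NAME` label and the hand-written entity list are illustrative stand-ins, not output from this model.

```python
def redact(text, entities):
    """Replace each detected span with a [LABEL] placeholder.

    Assumes `entities` follows the aggregated pipeline output shape
    (dicts with "entity_group", "start", "end") and that spans do not overlap.
    """
    out = text
    # Work right to left so that earlier character offsets remain valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        out = out[: ent["start"]] + f"[{ent['entity_group']}]" + out[ent["end"] :]
    return out


sample = "Patient John Smith called about Rx refill."
# Hand-written stand-in for pipeline output; "NAME" is a hypothetical label.
name_start = sample.index("John Smith")
fake_entities = [
    {
        "entity_group": "NAME",
        "score": 0.99,
        "word": "John Smith",
        "start": name_start,
        "end": name_start + len("John Smith"),
    }
]
redacted = redact(sample, fake_entities)
print(redacted)
```

In practice the list passed to `redact` would be the return value of the `pii(...)` call above.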
This model is the ML half of a regex+ML hybrid. Merge its inference output with the output of the regex layer, which covers the 3 dropped labels (HTTP_COOKIE, MAC_ADDRESS, BLOOD_TYPE) that the model is trained NOT to fire on.
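The merge step described above could look roughly like the following sketch. The card does not publish the actual regex layer, so the three patterns, the helper names `regex_entities` and `merge_entities`, and the policy of letting a regex hit displace any overlapping ML span are all assumptions made for illustration. The ML entities are hand-written stand-ins in the aggregated pipeline output shape, and `NAME` is a hypothetical label.

```python
import re

# Hypothetical patterns for the three labels the model does not emit.
REGEX_LAYER = {
    "MAC_ADDRESS": re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b"),
    "BLOOD_TYPE": re.compile(r"\b(?:AB|A|B|O)[+-](?!\w)"),
    "HTTP_COOKIE": re.compile(r"\b(?:set-)?cookie:\s*[^\r\n]+", re.IGNORECASE),
}


def regex_entities(text):
    """Run every pattern and return hits in the pipeline's dict shape."""
    found = []
    for label, pattern in REGEX_LAYER.items():
        for m in pattern.finditer(text):
            found.append(
                {
                    "entity_group": label,
                    "score": 1.0,  # regex hits carry no model confidence
                    "word": m.group(0),
                    "start": m.start(),
                    "end": m.end(),
                }
            )
    return found


def merge_entities(ml_entities, rx_entities):
    """Union of both layers; an ML span overlapping a regex hit is dropped."""
    kept = [
        e
        for e in ml_entities
        if not any(e["start"] < r["end"] and r["start"] < e["end"] for r in rx_entities)
    ]
    return sorted(kept + rx_entities, key=lambda e: e["start"])


text = "Device 00:1A:2B:3C:4D:5E, blood type O+, patient John Smith."
name_start = text.index("John Smith")
# Stand-in for the ML layer's output on `text`.
ml_entities = [
    {
        "entity_group": "NAME",
        "score": 0.98,
        "word": "John Smith",
        "start": name_start,
        "end": name_start + len("John Smith"),
    }
]
merged = merge_entities(ml_entities, regex_entities(text))
for ent in merged:
    print(ent["entity_group"], repr(ent["word"]))
```

Letting the regex layer win on overlap is only one possible policy; keeping the higher-scoring span or the longer span would be equally reasonable choices depending on how the real regex layer is tuned.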