OpenMed PPSN v5.1
Token classification model derived from OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1 with B-PPSN / I-PPSN support for Irish PPSN detection.
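A minimal loading sketch, assuming the repository id temsa/OpenMed-PPSN-v5_1 and that the checkpoint works with the standard Hugging Face transformers token-classification pipeline (this card does not confirm either). The helper names load_tagger and ppsn_spans are illustrative and not part of the release. With aggregation_strategy="simple", the pipeline merges B-PPSN / I-PPSN tokens into one entity whose entity_group is PPSN.

```python
MODEL_ID = "temsa/OpenMed-PPSN-v5_1"  # assumed repository id


def load_tagger(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge B-/I- pieces into whole entities
    )


def ppsn_spans(text, entities):
    """Return (surface text, score) pairs for aggregated PPSN entities only."""
    return [
        (text[e["start"]:e["end"]], float(e["score"]))
        for e in entities
        if e["entity_group"] == "PPSN"
    ]


# Usage (requires network access and the transformers package):
# tagger = load_tagger()
# text = "Patient PPSN: 1234567T"
# print(ppsn_spans(text, tagger(text)))
```

Slicing the original text by start/end offsets, rather than reading the pipeline's word field, avoids tokenizer whitespace artifacts in the returned span.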
Why v5.1
This iteration hardens PPSN behavior against known false positives on number-like strings (for example phone numbers and malformed ID-like tokens) while preserving non-PPSN behavior.
What this release contains
- Full model weights (model.safetensors) with original OpenMed labels + PPSN labels.
- label_meta.json with label mapping and provenance.
- Eval artifacts:
  - eval_manual_irish_v5_1_large_v2_raw.json
  - eval_hybrid_v5_1_large_v2_strict.json
  - ab_non_ppsn_v5_1.json
  - qa_ppsn_regression_v6_validated.jsonl
  - eval_manual_qa_regression_v6_validated_v5_1_raw.json
  - eval_hybrid_qa_regression_v6_validated_v5_1_strict.json
Key results
On irish_ppsn_eval_large_v2:
- Raw model-only PPSN performance: P=0.8699 R=0.9956 F1=0.9285
- Recommended strict hybrid mode (--no-plausible-ppsn, --ppsn-min-score 0.6): P=1.0000 R=0.9997 F1=0.9999
Non-PPSN retention vs base OpenMed (ab_non_ppsn_v5_1.json):
- Synthetic non-PPSN F1 delta vs base: +0.00049
- Real-set agreement F1 (candidate vs base): 1.0000
- Real entities per 1k chars delta vs base: 0.0000
Recommended production usage
Use strict hybrid PPSN post-processing (checksum-backed) for production masking. Raw model-only PPSN spans are less reliable on malformed numeric strings.
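The checksum-backed filtering recommended here could look roughly like the sketch below. It is not the release's actual post-processing code, which this card does not show. The function names ppsn_checksum_ok and strict_filter are hypothetical, and the 0.6 threshold simply mirrors the --ppsn-min-score 0.6 setting. The check uses the publicly documented PPSN scheme: the seven digits are weighted 8 down to 2, an optional second letter is weighted 9, and the sum modulo 23 selects the check letter. Accepting only A to I or W as the second letter is an assumption of this sketch.

```python
import re

# Assumed shape: 7 digits, a check letter, and an optional second letter.
PPSN_RE = re.compile(r"^(\d{7})([A-W])([A-IW]?)$")
# Position in this string is the value of (weighted sum mod 23); W stands for 0.
CHECK_LETTERS = "WABCDEFGHIJKLMNOPQRSTUV"


def ppsn_checksum_ok(candidate: str) -> bool:
    """Return True if the candidate has a PPSN shape and a valid mod-23 check letter."""
    text = re.sub(r"[\s-]", "", candidate).upper()
    match = PPSN_RE.match(text)
    if match is None:
        return False
    digits, check, suffix = match.groups()
    # Weights 8, 7, 6, 5, 4, 3, 2 for the seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    if suffix:
        # Second letter contributes 9 times its value (A=1, B=2, ..., W=0).
        total += 9 * CHECK_LETTERS.index(suffix)
    return CHECK_LETTERS[total % 23] == check


def strict_filter(text, spans, min_score=0.6):
    """Keep spans that meet the score threshold and pass the checksum.

    Each span is assumed to be a dict with "start", "end", and "score" keys.
    """
    return [
        s for s in spans
        if s["score"] >= min_score and ppsn_checksum_ok(text[s["start"]:s["end"]])
    ]
```

A phone-like string such as 0871234567 fails the shape check before the checksum is even computed, which is the kind of false positive the strict mode is meant to suppress.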
License and attribution
- This derivative release is distributed under Apache-2.0, consistent with the base model license tag.
- Base model: OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1.
- See NOTICE for attribution details.
Model tree for temsa/OpenMed-PPSN-v5_1
- Base model: microsoft/deberta-v3-large
Evaluation results (self-reported)
- Raw model F1 on irish_ppsn_eval_large_v2: 0.928
- Hybrid strict F1 on irish_ppsn_eval_large_v2: 1.000