OpenMed PPSN v5
Token classification model derived from OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1 with added B-PPSN / I-PPSN support for Irish PPS Number detection.
What this release contains
- Full model weights (model.safetensors) with original OpenMed labels + PPSN labels.
- label_meta.json with label mapping and provenance.
- Repro/eval artifacts:
  - irish_ppsn_regression_v5.jsonl
  - eval_manual_irish_v5_raw.json
  - eval_hybrid_v5_no_plausible.json
  - ab_non_ppsn_v5_baseline.json
Key results
On irish_ppsn_regression_v5:
- Raw model-only PPSN performance: P=0.7778 R=0.8750 F1=0.8235.
- Recommended strict hybrid mode (model + checksum + span normalization, no plausible fallback): P=1.0000 R=1.0000 F1=1.0000.
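As a quick consistency check, the F1 values reported above can be recomputed from the stated precision and recall. The helper name `f1_score` below is illustrative and not part of this release.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures copied from the results reported on irish_ppsn_regression_v5.
raw_f1 = f1_score(0.7778, 0.8750)
hybrid_f1 = f1_score(1.0000, 1.0000)

print(f"raw F1={raw_f1:.4f}  hybrid F1={hybrid_f1:.4f}")
```

Both values round to the figures quoted in the card (0.8235 and 1.0000).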
Non-PPSN retention check vs base OpenMed (A/B):
- Real-text behavior agreement F1: 1.0000.
- Entity density unchanged in the sampled real-text proxy set.
Recommended production usage
For best PPSN reliability, run this model with a strict hybrid post-processor:
- Keep model detections for all labels.
- For PPSN specifically, normalize/expand spans to full PPSN candidates.
- Validate PPSN with checksum.
- Disable broad "plausible" fallback in production (no-plausible-ppsn).
This repository includes only the model artifacts; hybrid scripts are in the companion codebase used to train/evaluate this release.
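Since the hybrid scripts are not shipped here, the following is only a minimal sketch of what a strict PPSN post-processor could look like. It assumes the commonly documented PPSN shape (seven digits, a check letter, an optional second letter) and the mod-23 weighted checksum; this card does not specify the exact rules, and the companion codebase may differ. All function names (`ppsn_checksum_ok`, `normalize_span`, `strict_ppsn_filter`) are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed PPSN shape: 7 digits, check letter A-W, optional second letter (A-I or W).
_SHAPE = re.compile(r"\d{7}[A-W][A-IW]?")
# Same shape, but only when not glued to neighbouring letters or digits.
_TOKEN = re.compile(r"(?<![A-Za-z0-9])\d{7}[A-Wa-w][A-IWa-iw]?(?![A-Za-z0-9])")
# Check letter indexed by (weighted sum mod 23); remainder 0 maps to W.
_CHECK_LETTERS = "WABCDEFGHIJKLMNOPQRSTUV"


def ppsn_checksum_ok(candidate: str) -> bool:
    """Validate a candidate with the assumed mod-23 checksum."""
    text = candidate.strip().upper()
    if not _SHAPE.fullmatch(text):
        return False
    # Digits are weighted 8, 7, 6, 5, 4, 3, 2 from left to right.
    total = sum(int(d) * w for d, w in zip(text[:7], range(8, 1, -1)))
    # A second letter other than W contributes 9 * its alphabet position.
    if len(text) == 9 and text[8] != "W":
        total += 9 * (ord(text[8]) - ord("A") + 1)
    return _CHECK_LETTERS[total % 23] == text[7]


def normalize_span(text: str, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Expand a model span to the full PPSN-shaped token it overlaps, if any."""
    for m in _TOKEN.finditer(text):
        if m.start() < end and start < m.end():
            return m.start(), m.end()
    return None


def strict_ppsn_filter(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep only model PPSN spans that normalize to a checksum-valid candidate.

    There is deliberately no "plausible" fallback: spans that fail are dropped.
    """
    kept: List[Tuple[int, int]] = []
    for start, end in spans:
        norm = normalize_span(text, start, end)
        if norm and norm not in kept and ppsn_checksum_ok(text[norm[0]:norm[1]]):
            kept.append(norm)
    return kept


text = "Patient PPSN: 1234567T, old ref 1234567A."
good = text.index("1234567T")
bad = text.index("1234567A")
# Simulated model output: a partial span on the valid number, a full span on the invalid one.
model_spans = [(good + 1, good + 8), (bad, bad + 8)]
result = strict_ppsn_filter(text, model_spans)
```

In this sketch, detections for every other label would pass through untouched; only PPSN spans are routed through `strict_ppsn_filter`. Candidates containing spaces or other separators are not handled and would need extra normalization.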
Limitations
- The Irish regression set is small and targeted; additional domain-specific validation is required before regulated deployment.
- PPSN detection in noisy OCR/ASR or heavily malformed text may require extra hardening.
License and attribution
- This derivative release is distributed under Apache-2.0, consistent with the base model license tag.
- Base model: OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1.
- Training/evaluation used synthetic and augmented data sources, including nvidia/Nemotron-PII and joelniklaus/mapa.
- See NOTICE for attribution details.
QA quick check
- Load this model as a standard AutoModelForTokenClassification checkpoint.
- Run against irish_ppsn_regression_v5.jsonl.
- Confirm results in eval_manual_irish_v5_raw.json (raw) and eval_hybrid_v5_no_plausible.json (strict hybrid).
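The loading step might look like the sketch below, which uses the standard `transformers` token-classification pipeline. The repo id is taken from this card's model tree; the helper names `load_ner` and `ppsn_spans` are hypothetical, and the assumption that aggregated PPSN entities carry the group name "PPSN" follows from the B-PPSN / I-PPSN labels rather than from anything verified here. The schema of the regression JSONL is not documented in this card, so reading it is left out.

```python
from typing import Dict, List, Tuple

MODEL_ID = "temsa/OpenMed-PPSN-v5"  # repo id as shown on this model card


def load_ner(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the span helper below works without transformers installed.
    from transformers import (
        AutoModelForTokenClassification,
        AutoTokenizer,
        pipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    # "simple" aggregation merges B-/I- tagged tokens into one entity per span.
    return pipeline(
        "token-classification",
        model=model,
        tokenizer=tokenizer,
        aggregation_strategy="simple",
    )


def ppsn_spans(entities: List[Dict]) -> List[Tuple[int, int]]:
    """Extract (start, end) character offsets of PPSN entities from pipeline output."""
    return [(e["start"], e["end"]) for e in entities if e["entity_group"] == "PPSN"]


# Example usage (requires network access and the transformers package):
#   ner = load_ner()
#   print(ppsn_spans(ner("My PPS number is 1234567T.")))
```

The raw spans returned this way correspond to the model-only mode; the strict hybrid numbers additionally depend on the post-processing described under recommended production usage.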
Model tree for temsa/OpenMed-PPSN-v5
- Base model: microsoft/deberta-v3-large
Evaluation results (self-reported)
- Raw model F1 on irish_ppsn_regression_v5: 0.824
- Hybrid strict F1 on irish_ppsn_regression_v5: 1.000