OpenMed-PPSN-v4
Model Summary
This model extends OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1 with Irish PPSN detection (B-PPSN / I-PPSN labels) while largely preserving baseline behavior on non-PPSN entities (see Evaluation Snapshot for the measured difference).
Intended Use
- Clinical and administrative de-identification pipelines.
- Hybrid usage is recommended: model detection + deterministic PPSN checksum validation.
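The deterministic half of the hybrid setup can be sketched as follows. This is a minimal illustration, assuming the commonly documented PPSN modulus-23 scheme: seven digits weighted 8 down to 2, an optional second letter weighted 9 (A=1 ... I=9, W=0), and a check letter where a remainder of 0 maps to W and 1-22 map to A-V. The function name `is_valid_ppsn` is hypothetical and not part of this release; verify the rules against the official specification before relying on it.

```python
import re

# Assumed layout: 7 digits, 1 check letter (A-W), optional second letter (A-I or W).
_PPSN_RE = re.compile(r"^(\d{7})([A-W])([A-IW]?)$")


def is_valid_ppsn(candidate: str) -> bool:
    """Return True if `candidate` passes the assumed modulus-23 PPSN check."""
    match = _PPSN_RE.match(candidate.strip().upper())
    if match is None:
        return False
    digits, check_letter, suffix = match.groups()

    # Weights 8, 7, 6, 5, 4, 3, 2 applied to the seven digits in order.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))

    # An optional second letter contributes its alphabet position times 9;
    # a legacy "W" suffix is assumed to contribute 0.
    if suffix and suffix != "W":
        total += (ord(suffix) - ord("A") + 1) * 9

    remainder = total % 23
    expected = "W" if remainder == 0 else chr(ord("A") + remainder - 1)
    return check_letter == expected
```

In a masking service, a span tagged PPSN by the model could be kept only if `is_valid_ppsn(span_text)` returns True, or flagged for review when the two signals disagree. Which policy fits depends on how much over-masking your pipeline tolerates.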
Out-of-Scope
- Not a legal/compliance determination system.
- Not validated for every institution, note format, OCR stream, or jurisdiction.
Training Data
- Synthetic PPSN span-labeled data generated with the official checksum rules.
- Additional benchmarking reference: nvidia/Nemotron-PII (CC-BY-4.0).
Evaluation Snapshot
- Synthetic non-PPSN F1 (base): 0.8701
- Synthetic non-PPSN F1 (candidate): 0.8420
- Candidate-vs-base agreement F1 on real-text proxy set: 0.9814
- Entities/1k chars (base): 11.4124
- Entities/1k chars (candidate): 11.1231
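The card does not say exactly how the agreement and density figures were computed. One plausible reading, sketched here as an assumption only, treats the base model's predicted spans as the reference, scores the candidate against them with exact-match span F1, and normalizes entity counts by text length. The helper names and the `start`/`end`/`entity_group` keys are illustrative choices, not the authors' evaluation code.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]


def span_set(entities: List[Dict]) -> Set[Span]:
    """Reduce predictions to (start, end, label) triples for exact matching."""
    return {(e["start"], e["end"], e["entity_group"]) for e in entities}


def agreement_f1(base: List[Dict], candidate: List[Dict]) -> float:
    """Span-level F1 of the candidate, using base predictions as the reference."""
    base_spans, cand_spans = span_set(base), span_set(candidate)
    if not base_spans and not cand_spans:
        return 1.0  # both models predict nothing: full agreement
    true_pos = len(base_spans & cand_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(cand_spans)
    recall = true_pos / len(base_spans)
    return 2 * precision * recall / (precision + recall)


def entities_per_1k_chars(n_entities: int, n_chars: int) -> float:
    """Entity density, normalized to 1,000 characters of input text."""
    return 1000.0 * n_entities / n_chars
```

Under this reading, a drop in density from base to candidate would suggest the candidate tags slightly fewer spans on the same text, which the agreement F1 then localizes to specific disagreements.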
Usage
Load the model with the transformers token-classification pipeline, or integrate it with a PPSN checksum validator in your masking service.
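A minimal loading-and-masking sketch is shown below. It assumes the repository id `temsa/OpenMed-PPSN-v4`, that the `transformers` library is installed, and that `aggregation_strategy="simple"` merges B-PPSN / I-PPSN tokens into spans whose `entity_group` is `PPSN`. The helpers `load_ner` and `mask_ppsn` and the `[PPSN]` placeholder are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

MODEL_ID = "temsa/OpenMed-PPSN-v4"  # assumed Hub repository id


def load_ner():
    """Build a token-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so the helpers below stay lightweight

    return pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # merge B-/I- tokens into whole spans
    )


def mask_ppsn(text: str, ner: Callable[[str], List[Dict]], placeholder: str = "[PPSN]") -> str:
    """Replace every span the model labels PPSN with a placeholder."""
    spans = [e for e in ner(text) if e["entity_group"] == "PPSN"]
    # Replace from right to left so earlier character offsets stay valid.
    for ent in sorted(spans, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + placeholder + text[ent["end"] :]
    return text


if __name__ == "__main__":
    ner = load_ner()
    print(mask_ppsn("Patient PPSN is 1234567T, seen on Monday.", ner))
```

Because `mask_ppsn` accepts any callable that returns span dictionaries, a checksum filter can be slotted in between detection and replacement without changing the loading code.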
Limitations
- Real-text drift check used a proxy general-text set.
- Production deployment requires local validation on your own data.
Licensing and Attribution
- Base model is tagged Apache-2.0.
- This release is Apache-2.0.
- Dataset attribution for CC-BY-4.0 sources is preserved in NOTICE.
Model tree for temsa/OpenMed-PPSN-v4
Base model
microsoft/deberta-v3-large