---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- pii
- phi
- token-classification
- healthcare
- de-identification
- ppsn
datasets:
- nvidia/Nemotron-PII
base_model:
- OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1
---

# OpenMed-PPSN-v4

## Model Summary

This model extends `OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1` with Irish PPSN detection (`B-PPSN` / `I-PPSN` labels) while preserving baseline behavior on non-PPSN entities.

## Intended Use

- Clinical and administrative de-identification pipelines.
- Hybrid usage is recommended: model detection combined with deterministic PPSN checksum validation.

## Out-of-Scope

- Not a legal or compliance determination system.
- Not validated for every institution, note format, OCR stream, or jurisdiction.

## Training Data

- Synthetic PPSN span-labeled data generated with the official checksum rules.
- Additional benchmarking reference: `nvidia/Nemotron-PII` (CC-BY-4.0).

## Evaluation Snapshot

- Synthetic non-PPSN F1 (base): `0.8701`
- Synthetic non-PPSN F1 (candidate): `0.8420`
- Candidate-vs-base agreement F1 on a real-text proxy set: `0.9814`
- Entities per 1k chars (base): `11.4124`
- Entities per 1k chars (candidate): `11.1231`

## Usage

Load the model with the `transformers` token-classification pipeline, or integrate it with a PPSN checksum validator in your masking service.

## Limitations

- The real-text drift check used a proxy general-text set rather than in-domain clinical notes.
- Production deployment requires local validation on your own data.

## Licensing and Attribution

- The base model is tagged Apache-2.0.
- This release is Apache-2.0.
- Dataset attribution for CC-BY-4.0 sources is preserved in `NOTICE`.
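The deterministic validation stage recommended above can be sketched as follows. This is a minimal implementation of the publicly documented PPSN modulus-23 check character scheme (7 digits, a check letter, and an optional second letter weighted by 9); the function name and the exact set of accepted second letters are assumptions here — verify against the official Department of Social Protection specification before production use.

```python
import re

# Digit weights for positions 1..7 of a PPSN, per the mod-23 scheme.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2)


def _letter_value(ch: str) -> int:
    # 'W' counts as 0; 'A'..'V' count as 1..22.
    return 0 if ch == "W" else ord(ch) - ord("A") + 1


def validate_ppsn(ppsn: str) -> bool:
    """Return True if `ppsn` (7 digits + 1 or 2 letters) passes the mod-23 check.

    Sketch only: the regex below is deliberately permissive about which
    second letters it accepts; tighten it to your local requirements.
    """
    ppsn = ppsn.strip().upper()
    if not re.fullmatch(r"\d{7}[A-W][A-W]?", ppsn):
        return False
    total = sum(int(d) * w for d, w in zip(ppsn[:7], WEIGHTS))
    if len(ppsn) == 9:
        total += _letter_value(ppsn[8]) * 9  # optional 9th char, weight 9
    remainder = total % 23
    expected = "W" if remainder == 0 else chr(ord("A") + remainder - 1)
    return ppsn[7] == expected
```

Used as a second stage, this lets the masking service reject model-detected spans whose check letter does not match, which reduces false positives on PPSN-shaped strings.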
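The hybrid pattern (model detection plus deterministic validation) can be sketched as a post-filter over entity dicts. The dict shape below mirrors the output of the `transformers` token-classification pipeline with `aggregation_strategy="simple"`; the function and parameter names are illustrative, and the validator is injected as a callable so any checksum implementation can be plugged in.

```python
from typing import Callable, Iterable, TypedDict


class Entity(TypedDict):
    # Mirrors one item from a transformers token-classification pipeline
    # run with aggregation_strategy="simple".
    entity_group: str
    word: str
    start: int
    end: int


def mask_validated_ppsns(
    text: str,
    entities: Iterable[Entity],
    validator: Callable[[str], bool],
    mask: str = "[PPSN]",
) -> str:
    """Replace model-detected PPSN spans that also pass the injected
    deterministic validator; spans failing it are left untouched for review."""
    # Apply replacements right-to-left so earlier character offsets stay valid.
    spans = sorted(
        (e for e in entities if e["entity_group"] == "PPSN"),
        key=lambda e: e["start"],
        reverse=True,
    )
    for e in spans:
        if validator(text[e["start"]:e["end"]]):
            text = text[:e["start"]] + mask + text[e["end"]:]
    return text
```

In a real pipeline the `entities` list would come from the model (e.g. `pipeline("token-classification", model=..., aggregation_strategy="simple")(text)`) and `validator` would be the PPSN checksum routine; passing the validator in keeps the masking logic testable without loading the model.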