---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- pii
- phi
- token-classification
- healthcare
- de-identification
- ppsn
datasets:
- nvidia/Nemotron-PII
base_model:
- OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1
---

# OpenMed-PPSN-v4

## Model Summary

This model extends `OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1` with Irish PPSN detection (`B-PPSN` / `I-PPSN`) while preserving baseline behavior on non-PPSN entities.

## Intended Use

- Clinical and administrative de-identification pipelines.
- Hybrid usage is recommended: model detection plus deterministic PPSN checksum validation.
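The hybrid pattern above can be sketched as a post-filter: keep a model-detected span only if its text passes the public modulus-23 PPSN check (seven digits weighted 8 down to 2, an optional second letter weighted 9, remainder mapped to a check character). The helper names (`ppsn_checksum_ok`, `confirmed_ppsn_spans`) are illustrative, not part of this release; verify the checksum rule against your own authoritative reference before production use.

```python
import re

# Format: 7 digits, one check letter (A-W), optional suffix letter (A/H/W).
PPSN_RE = re.compile(r"\d{7}[A-W][AHW]?")


def ppsn_checksum_ok(ppsn: str) -> bool:
    """Modulus-23 check: digits weighted 8..2, optional suffix weighted 9."""
    ppsn = ppsn.strip().upper()
    if not PPSN_RE.fullmatch(ppsn):
        return False
    digits, check, suffix = ppsn[:7], ppsn[7], ppsn[8:]
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    if suffix:
        total += (ord(suffix) - 64) * 9  # A=1 .. W=23
    r = total % 23
    return check == ("W" if r == 0 else chr(64 + r))


def confirmed_ppsn_spans(entities):
    """Keep only model spans labelled PPSN whose text passes the checksum."""
    return [e for e in entities
            if e.get("entity_group") == "PPSN" and ppsn_checksum_ok(e["word"])]
```

A span the model flags as `PPSN` but that fails the checksum can then be routed to review instead of being masked blindly.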

## Out-of-Scope

- Not a legal/compliance determination system.
- Not validated for every institution, note format, OCR stream, or jurisdiction.

## Training Data

- Synthetic PPSN span-labeled data generated with the official checksum rules.
- Additional benchmarking reference: `nvidia/Nemotron-PII` (CC-BY-4.0).
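Checksum-valid synthetic spans of the kind described above can be produced by drawing seven random digits and deriving the check character. This is a minimal sketch using the public modulus-23 rule, not this project's actual data generator; the function name and interface are assumptions.

```python
import random


def synthetic_ppsn(rng: random.Random, with_suffix: bool = False) -> str:
    """Seven random digits plus a derived modulus-23 check character."""
    digits = [rng.randrange(10) for _ in range(7)]
    suffix = rng.choice("AH") if with_suffix else ""
    total = sum(d * w for d, w in zip(digits, range(8, 1, -1)))
    if suffix:
        total += (ord(suffix) - 64) * 9  # suffix letter value, weighted 9
    r = total % 23
    check = "W" if r == 0 else chr(64 + r)  # remainder 0 -> W, else A..V
    return "".join(map(str, digits)) + check + suffix
```

Generated numbers can then be templated into sentence contexts and span-labeled for training.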

## Evaluation Snapshot

- Synthetic non-PPSN F1 (base): `0.8701`
- Synthetic non-PPSN F1 (candidate): `0.8420`
- Candidate-vs-base agreement F1 on real-text proxy set: `0.9814`
- Entities/1k chars (base): `11.4124`
- Entities/1k chars (candidate): `11.1231`

## Usage

Load the model with the `transformers` token-classification pipeline, or integrate it with a PPSN checksum validator in your masking service.
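A minimal loading sketch, assuming the checkpoint is published under the repo id `OpenMed/OpenMed-PPSN-v4` (substitute your actual model id or local path). The `"token-classification"` task and `aggregation_strategy="simple"` are standard `transformers` pipeline options that merge subword tokens into word-level spans.

```python
def build_deid_pipeline(model_id: str = "OpenMed/OpenMed-PPSN-v4"):
    """Token-classification pipeline with word-level span aggregation."""
    from transformers import pipeline  # local import: heavy dependency
    return pipeline("token-classification",
                    model=model_id,
                    aggregation_strategy="simple")


def ppsn_spans(entities):
    """Filter pipeline output down to PPSN spans."""
    return [e for e in entities if e.get("entity_group") == "PPSN"]
```

Typical flow: `ner = build_deid_pipeline()` once at service start, then `ppsn_spans(ner(text))` per document, with each surviving span passed through your deterministic checksum validator before masking.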

## Limitations

- The real-text drift check used a proxy general-text set, not clinical notes.
- Production deployment requires local validation on your own data.

## Licensing and Attribution

- The base model is tagged Apache-2.0.
- This release is Apache-2.0.
- Dataset attribution for CC-BY-4.0 sources is preserved in `NOTICE`.