metadata
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - pii
  - phi
  - token-classification
  - healthcare
  - de-identification
  - ppsn
datasets:
  - nvidia/Nemotron-PII
base_model:
  - OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1

OpenMed-PPSN-v4

Model Summary

This model extends OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1 with detection of Irish PPSNs (Personal Public Service Numbers), adding the labels B-PPSN and I-PPSN, while keeping behavior on non-PPSN entities close to the baseline (see Evaluation Snapshot).

Intended Use

  • Clinical and administrative de-identification pipelines.
  • Hybrid usage is recommended: model detection followed by deterministic PPSN checksum validation of each detected span.
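
The deterministic half of the hybrid setup can be sketched as follows. This is a minimal sketch, not code shipped with this release. It assumes the commonly documented PPSN layout (seven digits, a check letter, and an optional second letter) and the modulus-23 check-character scheme. The name `is_valid_ppsn`, the regular expression, and the set of accepted second letters are illustrative assumptions, so verify them against the official rules before relying on them.

```python
import re

# Check letters indexed by (weighted sum mod 23): 0 -> "W", 1 -> "A", ..., 22 -> "V".
CHECK_LETTERS = "WABCDEFGHIJKLMNOPQRSTUV"

# Assumed layout: 7 digits, 1 check letter (A-W), optional second letter (A-I or W).
PPSN_RE = re.compile(r"^(\d{7})([A-W])([A-IW]?)$")


def is_valid_ppsn(candidate: str) -> bool:
    """Return True if `candidate` matches the assumed layout and check letter."""
    match = PPSN_RE.match(candidate.strip().upper())
    if match is None:
        return False
    digits, check_letter, second_letter = match.groups()
    # Weights 8, 7, 6, 5, 4, 3, 2 are applied to the seven digits, left to right.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    # A second letter other than "W" adds its alphabet position (A=1, B=2, ...) times 9.
    if second_letter and second_letter != "W":
        total += (ord(second_letter) - ord("A") + 1) * 9
    return CHECK_LETTERS[total % 23] == check_letter
```

A span that the model labels PPSN but that fails this check is a candidate false positive. Whether to drop it, keep masking it, or route it for review is a policy choice for the deploying pipeline.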

Out-of-Scope

  • Not a legal/compliance determination system.
  • Not validated for every institution, note format, OCR stream, or jurisdiction.

Training Data

  • Synthetic PPSN span-labeled data generated with the official checksum rules.
  • Additional benchmarking reference: nvidia/Nemotron-PII (CC-BY-4.0).

Evaluation Snapshot

  • Synthetic non-PPSN F1 (base): 0.8701
  • Synthetic non-PPSN F1 (candidate): 0.8420
  • Candidate-vs-base agreement F1 on real-text proxy set: 0.9814
  • Entities/1k chars (base): 11.4124
  • Entities/1k chars (candidate): 11.1231
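
The card does not state how the agreement and density figures were computed. One plausible reading is sketched below, assuming exact span-and-label matching with the base model's predictions treated as the reference. The names `agreement_f1` and `entities_per_1k_chars` and the `(start, end, label)` span representation are hypothetical and only illustrate the idea.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def agreement_f1(base: Iterable[Span], candidate: Iterable[Span]) -> float:
    """F1 of candidate spans scored against base spans (exact match assumed)."""
    base_set, cand_set = set(base), set(candidate)
    if not base_set and not cand_set:
        return 1.0  # both models predict nothing: treated as full agreement
    overlap = len(base_set & cand_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_set)
    recall = overlap / len(base_set)
    return 2 * precision * recall / (precision + recall)


def entities_per_1k_chars(n_entities: int, n_chars: int) -> float:
    """Entity density normalized to 1,000 characters of input text."""
    if n_chars <= 0:
        raise ValueError("n_chars must be positive")
    return 1000.0 * n_entities / n_chars
```

Under this reading, a high agreement F1 together with similar entity densities suggests that the candidate tags roughly the same spans as the base model on text without PPSNs.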

Usage

Load the model with the transformers token-classification pipeline. In a masking service, it can be combined with a PPSN checksum validator, as recommended under Intended Use.
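
A minimal loading and masking sketch is shown below. `MODEL_ID` is a placeholder, not the confirmed repository id, and the helper names `load_tagger`, `ppsn_spans`, and `mask`, the `[PPSN]` placeholder, and the 0.5 score threshold are illustrative assumptions. The sketch relies on the pipeline's `aggregation_strategy="simple"` option, which merges B-/I- tagged tokens into spans reported under `entity_group`.

```python
from typing import Dict, List

# Placeholder: replace with the actual Hub repository id or a local path.
MODEL_ID = "your-namespace/OpenMed-PPSN-v4"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline that returns merged entity spans."""
    from transformers import pipeline  # lazy import: helpers below work without it

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def ppsn_spans(entities: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep aggregated entities labeled PPSN whose score clears the threshold."""
    return [
        e for e in entities
        if e.get("entity_group") == "PPSN" and e.get("score", 0.0) >= min_score
    ]


def mask(text: str, spans: List[Dict], placeholder: str = "[PPSN]") -> str:
    """Replace each span with a placeholder, right to left so offsets stay valid."""
    for e in sorted(spans, key=lambda e: e["start"], reverse=True):
        text = text[: e["start"]] + placeholder + text[e["end"]:]
    return text


# Example usage (downloads the model, so it is left commented out):
# tagger = load_tagger()
# note = "PPSN 1234567T on file"
# print(mask(note, ppsn_spans(tagger(note))))
```

Keeping the span filtering and masking separate from model loading makes it straightforward to insert a checksum validation step between detection and redaction.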

Limitations

  • The real-text drift check used a general-text proxy set, so the agreement figure may not transfer to clinical notes.
  • Production deployment requires local validation on your own data.

Licensing and Attribution

  • Base model is tagged Apache-2.0.
  • This release is Apache-2.0.
  • Dataset attribution for CC-BY-4.0 sources is preserved in NOTICE.