OpenMed-PPSN-v4

Model Summary

This model extends OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1 with Irish PPSN detection (B-PPSN / I-PPSN labels) while largely preserving baseline behavior on non-PPSN entities.

Intended Use

  • Clinical and administrative de-identification pipelines.
  • Hybrid usage is recommended: pair model detection with deterministic PPSN checksum validation.
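
The deterministic half of the hybrid setup can be sketched as follows. This is a minimal illustration, assuming the commonly documented modulus-23 PPSN scheme (weights 8 to 2 on the seven digits, weight 9 on an optional second letter, remainder 0 mapping to W). The function name is_valid_ppsn and the accepted letter ranges are assumptions for illustration, not part of this release.

```python
import re

# Assumed format: 7 digits, a check letter A-W, and an optional second letter.
_PPSN_RE = re.compile(r"^(\d{7})([A-W])([A-IW]?)$")


def is_valid_ppsn(candidate: str) -> bool:
    """Return True if the string matches the PPSN format and its check letter."""
    match = _PPSN_RE.match(candidate.strip().upper())
    if match is None:
        return False
    digits, check, suffix = match.groups()
    # Weights 8, 7, 6, 5, 4, 3, 2 are applied to the seven digits in order.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    # An optional second letter contributes 9 * (A=1, B=2, ...); W counts as 0.
    if suffix and suffix != "W":
        total += 9 * (ord(suffix) - ord("A") + 1)
    remainder = total % 23
    expected = "W" if remainder == 0 else chr(ord("A") + remainder - 1)
    return check == expected
```

A span predicted as PPSN that fails this check is a candidate false positive or a mistyped number. Whether to mask it anyway is a policy decision for the pipeline owner.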

Out-of-Scope

  • Not a legal/compliance determination system.
  • Not validated for every institution, note format, OCR stream, or jurisdiction.

Training Data

  • Synthetic PPSN span-labeled data generated with the official checksum rules.
  • Additional benchmarking reference: nvidia/Nemotron-PII (CC-BY-4.0).
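
As a rough illustration of what checksum-consistent synthetic span-labeled data might look like, the sketch below generates a random PPSN with a valid check letter and places it in a sentence with character offsets. The templates, the make_example name, and the output schema are hypothetical; the actual generator used for this release is not described here.

```python
import random

WEIGHTS = (8, 7, 6, 5, 4, 3, 2)

# Hypothetical sentence templates; real training text would be far more varied.
TEMPLATES = (
    "Patient PPSN: {ppsn}.",
    "PPS number {ppsn} was verified at registration.",
)


def check_letter(digits: str) -> str:
    """Compute the modulus-23 check letter for seven digits (remainder 0 -> W)."""
    remainder = sum(int(d) * w for d, w in zip(digits, WEIGHTS)) % 23
    return "W" if remainder == 0 else chr(ord("A") + remainder - 1)


def make_example(rng: random.Random) -> dict:
    """Build one synthetic sentence with a character-level PPSN span."""
    digits = "".join(rng.choice("0123456789") for _ in range(7))
    ppsn = digits + check_letter(digits)
    template = rng.choice(TEMPLATES)
    start = template.index("{ppsn}")  # offset of the placeholder in the template
    text = template.format(ppsn=ppsn)
    span = {"start": start, "end": start + len(ppsn), "label": "PPSN"}
    return {"text": text, "spans": [span]}
```

Character spans like these would still need to be aligned to tokenizer output to produce B-PPSN / I-PPSN token labels.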

Evaluation Snapshot

  • Synthetic non-PPSN F1 (base): 0.8701
  • Synthetic non-PPSN F1 (candidate): 0.8420 (0.0281 below base)
  • Candidate-vs-base agreement F1 on real-text proxy set: 0.9814
  • Entities/1k chars (base): 11.4124
  • Entities/1k chars (candidate): 11.1231
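
For readers who want to reproduce comparable numbers on their own data, here is one plausible way to compute an entity density and a candidate-vs-base agreement F1. It assumes exact matching on (start, end, label) with the base model's output treated as the reference. The exact definitions behind the figures above are not stated, so treat this as an assumption rather than the method used.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def entities_per_1k_chars(num_entities: int, num_chars: int) -> float:
    """Entity density normalized to 1,000 characters of input text."""
    if num_chars <= 0:
        raise ValueError("num_chars must be positive")
    return 1000.0 * num_entities / num_chars


def agreement_f1(base: Set[Span], candidate: Set[Span]) -> float:
    """F1 of candidate spans against base spans, using exact span+label match."""
    if not base and not candidate:
        return 1.0  # both models predicted nothing: full agreement
    true_positives = len(base & candidate)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(candidate)
    recall = true_positives / len(base)
    return 2 * precision * recall / (precision + recall)
```

Because neither model is ground truth here, a high agreement score indicates low drift from the base model, not accuracy.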

Usage

Load the model with the transformers token-classification pipeline. In a masking service, pair its predictions with a PPSN checksum validator, as recommended under Intended Use.
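
A minimal sketch of that flow, assuming the repository id temsa/OpenMed-PPSN-v4 and that aggregated predictions carry the group label PPSN (inferred from the B-PPSN / I-PPSN tags). The helper names load_detector and mask_entities, and the optional ppsn_validator hook, are hypothetical.

```python
from typing import Callable, Dict, List, Optional

MODEL_ID = "temsa/OpenMed-PPSN-v4"  # assumed repository id


def load_detector(model_id: str = MODEL_ID):
    """Build a token-classification pipeline that merges B-/I- tags into spans."""
    # Imported lazily so mask_entities can be used without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification", model=model_id, aggregation_strategy="simple"
    )


def mask_entities(
    text: str,
    entities: List[Dict],
    ppsn_validator: Optional[Callable[[str], bool]] = None,
) -> str:
    """Replace each predicted span with a [LABEL] placeholder.

    If ppsn_validator is given, PPSN spans that fail it are left unmasked.
    Assumes spans do not overlap.
    """
    masked = text
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, end, label = ent["start"], ent["end"], ent["entity_group"]
        if label == "PPSN" and ppsn_validator is not None:
            if not ppsn_validator(text[start:end]):
                continue
        masked = masked[:start] + f"[{label}]" + masked[end:]
    return masked


# Example (downloads the model on first use):
# detector = load_detector()
# note = "PPSN 1234567T recorded at intake."
# print(mask_entities(note, detector(note)))
```

Leaving checksum failures unmasked reduces false positives but can expose mistyped numbers. A stricter service could mask every predicted PPSN span and use the validator only for logging.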

Limitations

  • The real-text drift check used a proxy general-text set, so the agreement figure may not reflect behavior on clinical notes.
  • Production deployment requires local validation on your own data.

Licensing and Attribution

  • Base model is tagged Apache-2.0.
  • This release is Apache-2.0.
  • Dataset attribution for CC-BY-4.0 sources is preserved in NOTICE.