OpenMed PPSN v5

A token classification model derived from OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1, extended with B-PPSN / I-PPSN labels for detecting Irish PPS Numbers (PPSNs).

What this release contains

  • Full model weights (model.safetensors) with original OpenMed labels + PPSN labels.
  • label_meta.json with label mapping and provenance.
  • Repro/eval artifacts:
    • irish_ppsn_regression_v5.jsonl
    • eval_manual_irish_v5_raw.json
    • eval_hybrid_v5_no_plausible.json
    • ab_non_ppsn_v5_baseline.json

Key results

On irish_ppsn_regression_v5:

  • Raw model-only PPSN performance: P=0.7778 R=0.8750 F1=0.8235.
  • Recommended strict hybrid mode (model + checksum + span normalization, no plausible fallback):
    • P=1.0000 R=1.0000 F1=1.0000.
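As a sanity check on the figures above, the reported F1 values can be recomputed from the reported precision and recall. This is a minimal sketch; the helper name f1_score is illustrative and is not taken from the companion codebase.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the results above.
raw_f1 = f1_score(0.7778, 0.8750)     # raw model-only mode
hybrid_f1 = f1_score(1.0000, 1.0000)  # strict hybrid mode

print(f"raw F1    = {raw_f1:.4f}")
print(f"hybrid F1 = {hybrid_f1:.4f}")
```

Both values should agree with the reported F1 scores to four decimal places.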

Non-PPSN retention check vs base OpenMed (A/B):

  • Agreement F1 between this model's and the base model's predictions on real text: 1.0000.
  • Entity density unchanged in the sampled real-text proxy set.

Recommended production usage

For best PPSN reliability, run this model with a strict hybrid post-processor:

  1. Keep model detections for all labels.
  2. For PPSN specifically, normalize and expand model spans to full PPSN candidates.
  3. Validate each PPSN candidate against the checksum.
  4. Disable the broad "plausible" fallback in production (no-plausible-ppsn).

This repository includes only the model artifacts; hybrid scripts are in the companion codebase used to train/evaluate this release.
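Since the hybrid scripts are not part of this repository, the following is only a rough sketch of what a strict post-processor could look like. It assumes the publicly documented PPSN check (seven digits weighted 8 down to 2, an optional second letter weighted 9, sum taken modulo 23). All function names, the candidate regex, and the entity dict keys ("label", "text") are hypothetical and may differ from the companion codebase.

```python
import re

WEIGHTS = (8, 7, 6, 5, 4, 3, 2)
CHECK_LETTERS = "WABCDEFGHIJKLMNOPQRSTUV"  # index = weighted sum mod 23

# Assumption: second letter is A-I or W; other legacy suffixes are rejected here.
PPSN_SHAPE = re.compile(r"(\d{7})([A-W])([A-IW]?)")

# Assumption: candidates may contain single spaces or hyphens between characters.
CANDIDATE = re.compile(
    r"(?<![A-Za-z0-9])\d(?:[ -]?\d){6}[ -]?[A-Za-z]{1,2}(?![A-Za-z0-9])"
)


def normalize_ppsn(raw: str) -> str:
    """Strip whitespace and hyphens, then uppercase."""
    return re.sub(r"[\s\-]", "", raw).upper()


def is_valid_ppsn(raw: str) -> bool:
    """Return True if the normalized string passes the mod-23 check."""
    match = PPSN_SHAPE.fullmatch(normalize_ppsn(raw))
    if match is None:
        return False
    digits, check, second = match.groups()
    total = sum(int(d) * w for d, w in zip(digits, WEIGHTS))
    if second and second != "W":  # W (or no second letter) contributes 0
        total += 9 * (ord(second) - ord("A") + 1)
    return CHECK_LETTERS[total % 23] == check


def expand_to_candidate(text: str, start: int, end: int):
    """Step 2: grow a model span to the full overlapping candidate, if any."""
    for m in CANDIDATE.finditer(text):
        if m.start() < end and start < m.end():
            return m.start(), m.end()
    return None


def filter_entities(entities):
    """Steps 1, 3, 4: keep non-PPSN entities; keep PPSN only if the checksum passes."""
    return [
        e for e in entities
        if e["label"] != "PPSN" or is_valid_ppsn(e["text"])
    ]
```

In this sketch a PPSN span that fails the checksum is dropped rather than kept as a "plausible" match, which mirrors the strict mode described above.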

Limitations

  • The Irish regression set is small and targeted; additional domain-specific validation is required before regulated deployment.
  • PPSN detection in noisy OCR/ASR or heavily malformed text may require extra hardening.

License and attribution

  • This derivative release is distributed under Apache-2.0, consistent with the base model license tag.
  • Base model: OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1.
  • Training/evaluation used synthetic and augmented data sources, including nvidia/Nemotron-PII and joelniklaus/mapa.
  • See NOTICE for attribution details.

QA quick check

  • Load this model as a standard AutoModelForTokenClassification checkpoint.
  • Run against irish_ppsn_regression_v5.jsonl.
  • Confirm results in eval_manual_irish_v5_raw.json (raw) and eval_hybrid_v5_no_plausible.json (strict hybrid).
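The quick check above might be scripted roughly as follows. This is a sketch under several assumptions: the repository id temsa/OpenMed-PPSN-v5 is used as the model id (a local path works as well), spans are scored by exact match, and the schema of irish_ppsn_regression_v5.jsonl is not documented here, so reading gold spans from it is left to the reader. The helper names are illustrative.

```python
MODEL_ID = "temsa/OpenMed-PPSN-v5"  # assumed repo id; replace with a local path if needed


def build_pipeline(model_id: str = MODEL_ID):
    """Load the checkpoint as a standard token-classification pipeline."""
    # Imported lazily so the scoring helpers below work without transformers.
    from transformers import (
        AutoModelForTokenClassification,
        AutoTokenizer,
        pipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    return pipeline(
        "token-classification",
        model=model,
        tokenizer=tokenizer,
        aggregation_strategy="simple",  # merges B-/I- tags into entity groups
    )


def ppsn_spans(entities):
    """Extract (start, end) character spans for PPSN entity groups."""
    return {(e["start"], e["end"]) for e in entities if e["entity_group"] == "PPSN"}


def span_prf(gold: set, pred: set):
    """Exact-match span precision, recall, and F1."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Example usage (downloads the model, so it is not run here):
#   nlp = build_pipeline()
#   pred = ppsn_spans(nlp("My PPS number is 1234567T."))
```

The scores in the eval JSON files may have been computed with a different matching rule, so small differences from this exact-match scoring are possible.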