OpenMed PPSN v5.1

A token classification model derived from OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1, extended with B-PPSN / I-PPSN labels to detect Irish Personal Public Service Numbers (PPSNs).
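As a minimal sketch of how B-PPSN / I-PPSN tags turn into spans, the helper below groups a per-token tag sequence into token-index ranges. The function name `bio_spans` and the choice to ignore an I-PPSN tag that has no preceding B-PPSN are assumptions made for illustration, not something this release documents.

```python
from typing import List, Tuple


def bio_spans(tags: List[str], label: str = "PPSN") -> List[Tuple[int, int]]:
    """Group BIO tags for one label into (start, end) token spans, end exclusive.

    Hypothetical helper: an I- tag with no open span is ignored (an assumption).
    """
    begin_tag = f"B-{label}"
    inside_tag = f"I-{label}"
    spans: List[Tuple[int, int]] = []
    start = None  # index of the currently open span, if any

    for i, tag in enumerate(tags):
        if tag == begin_tag:
            # A new B- tag closes any open span and opens a new one.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == inside_tag and start is not None:
            # Continue the open span.
            continue
        else:
            # Any other tag (or a stray I- tag) closes the open span.
            if start is not None:
                spans.append((start, i))
                start = None

    if start is not None:
        spans.append((start, len(tags)))
    return spans


example_tags = ["O", "B-PPSN", "I-PPSN", "O", "I-PPSN", "B-PPSN"]
example_spans = bio_spans(example_tags)
```

Here the stray I-PPSN at index 4 is dropped, so only the spans opened by a B-PPSN tag survive.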

Why v5.1

This iteration reduces known PPSN false positives on number-like strings (for example, phone numbers and malformed ID-like tokens) while keeping predictions for the original non-PPSN labels essentially unchanged.

What this release contains

  • Full model weights (model.safetensors) with original OpenMed labels + PPSN labels.
  • label_meta.json with label mapping and provenance.
  • Eval artifacts:
    • eval_manual_irish_v5_1_large_v2_raw.json
    • eval_hybrid_v5_1_large_v2_strict.json
    • ab_non_ppsn_v5_1.json
    • qa_ppsn_regression_v6_validated.jsonl
    • eval_manual_qa_regression_v6_validated_v5_1_raw.json
    • eval_hybrid_qa_regression_v6_validated_v5_1_strict.json

Key results

On irish_ppsn_eval_large_v2:

  • Raw model-only PPSN performance:
    • P=0.8699 R=0.9956 F1=0.9285
  • Recommended strict hybrid mode (--no-plausible-ppsn, --ppsn-min-score 0.6):
    • P=1.0000 R=0.9997 F1=0.9999
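The F1 values above are the harmonic mean of precision and recall. The sketch below recomputes them from the rounded P and R shown here; the function name `f1_score` is a hypothetical helper. Because the inputs are already rounded to four decimals, a recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures copied from the results above.
raw_f1 = f1_score(0.8699, 0.9956)
strict_f1 = f1_score(1.0000, 0.9997)

print(f"raw model-only F1: {raw_f1:.4f}")
print(f"strict hybrid F1 (from rounded P and R): {strict_f1:.5f}")
```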

Non-PPSN retention vs base OpenMed (ab_non_ppsn_v5_1.json):

  • Synthetic non-PPSN F1 delta vs base: +0.00049
  • Real-set agreement F1 (candidate vs base): 1.0000
  • Real entities per 1k chars delta vs base: 0.0000

Recommended production usage

Use strict hybrid PPSN post-processing (checksum-backed) for production masking. Raw model-only PPSN spans are less reliable on malformed numeric strings: on irish_ppsn_eval_large_v2, raw precision is 0.8699, versus 1.0000 in strict hybrid mode.
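To illustrate what a checksum-backed filter can look like, here is a minimal sketch that keeps a predicted span only if its score clears a threshold and its text passes the commonly described mod-23 PPSN check (digit weights 8 down to 2, an optional second letter weighted by 9). This is not the post-processor shipped with this release: the names `is_checksum_valid_ppsn` and `keep_ppsn_span` are hypothetical, the 0.6 default only mirrors the --ppsn-min-score value above, and the set of accepted second letters (A to W) is a deliberately permissive assumption.

```python
import re

# Check-letter alphabet for the mod-23 scheme: remainder 0 -> W, 1 -> A, ..., 22 -> V.
CHECK_ALPHABET = "WABCDEFGHIJKLMNOPQRSTUV"

# Seven digits, a check letter, and an optional second letter.
# Accepting any A-W second letter is an assumption made for this sketch.
PPSN_RE = re.compile(r"^(\d{7})([A-W])([A-W]?)$")


def is_checksum_valid_ppsn(text: str) -> bool:
    """Return True if text has PPSN shape and its check letter matches."""
    candidate = re.sub(r"[\s-]", "", text).upper()
    match = PPSN_RE.match(candidate)
    if match is None:
        return False
    digits, check_letter, suffix = match.groups()
    # Weights 8, 7, 6, 5, 4, 3, 2 applied to the seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    if suffix:
        # The second letter contributes its alphabet index times 9 (W counts as 0).
        total += 9 * CHECK_ALPHABET.index(suffix)
    return CHECK_ALPHABET[total % 23] == check_letter


def keep_ppsn_span(text: str, score: float, min_score: float = 0.6) -> bool:
    """Hypothetical strict filter: score threshold plus checksum validation."""
    return score >= min_score and is_checksum_valid_ppsn(text)
```

Under these assumptions a phone-like string such as "087 123 4567" is rejected on shape alone, while "1234567T" passes because 112 mod 23 = 20 maps to T.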

License and attribution

  • This derivative release is distributed under Apache-2.0, consistent with the base model license tag.
  • Base model: OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1.
  • See NOTICE for attribution details.