---
language:
- en
- ga
license: apache-2.0
library_name: transformers
tags:
- pii
- token-classification
- healthcare
- de-identification
- ireland
- ppsn
base_model:
- OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1
datasets:
- nvidia/Nemotron-PII
- joelniklaus/mapa
model-index:
- name: OpenMed-PPSN-v5
  results:
  - task:
      type: token-classification
      name: PPSN detection (Irish regression set)
    dataset:
      name: irish_ppsn_regression_v5
      type: custom
    metrics:
    - type: f1
      value: 0.8235
      name: Raw model F1
    - type: f1
      value: 1.0000
      name: Hybrid strict F1
---

# OpenMed PPSN v5

Token classification model derived from `OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1` with added `B-PPSN` / `I-PPSN` support for Irish PPS Number detection.

## What this release contains

- Full model weights (`model.safetensors`) with original OpenMed labels + PPSN labels.
- `label_meta.json` with label mapping and provenance.
- Repro/eval artifacts:
  - `irish_ppsn_regression_v5.jsonl`
  - `eval_manual_irish_v5_raw.json`
  - `eval_hybrid_v5_no_plausible.json`
  - `ab_non_ppsn_v5_baseline.json`

## Key results

On `irish_ppsn_regression_v5`:

- Raw model-only PPSN performance: `P=0.7778 R=0.8750 F1=0.8235`.
- Recommended strict hybrid mode (model + checksum + span normalization, no plausible fallback):
  - `P=1.0000 R=1.0000 F1=1.0000`.

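As a reading aid for the figures above, the sketch below shows how precision, recall, and F1 follow from span-level counts. The counts used (`tp=7`, `fp=2`, `fn=1`) are hypothetical: they are not stated in this card and were chosen only because they reproduce the reported raw figures. The shipped evaluation JSON files remain the authoritative record.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from span-level counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, picked only because they match the reported raw metrics.
p, r, f1 = prf1(tp=7, fp=2, fn=1)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")  # P=0.7778 R=0.8750 F1=0.8235
```
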
Non-PPSN retention check vs base OpenMed (A/B):

- Real-text behavior agreement F1: `1.0000`.
- Entity density unchanged in the sampled real-text proxy set.

## Recommended production usage

For best PPSN reliability, run this model with a strict hybrid post-processor:

1. Keep model detections for all labels.
2. For PPSN specifically, normalize/expand spans to full PPSN candidates.
3. Validate PPSN with checksum.
4. Disable broad "plausible" fallback in production (`no-plausible-ppsn`).

This repository includes only the model artifacts; hybrid scripts are in the companion codebase used to train/evaluate this release.

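Since the hybrid scripts are not shipped here, the four steps can be approximated with the minimal sketch below. It is not the companion codebase's implementation. Assumptions: entities are dicts shaped like the aggregated output of the `transformers` token-classification pipeline (`entity_group`, `start`, `end`); the candidate regex and the function names `ppsn_checksum_ok` and `strict_hybrid` are illustrative; the checksum is the commonly documented mod-23 scheme (digit weights 8 down to 2, an optional second letter weighted 9, a legacy second-letter `W` contributing 0); and "plausible fallback" is taken to mean accepting PPSN-shaped strings that fail the checksum.

```python
import re

# Assumed PPSN shape: 7 digits, a check letter, an optional second letter.
PPSN_RE = re.compile(r"\b\d{7}[A-W][A-IW]?\b", re.IGNORECASE)
CHECK_CHARS = "WABCDEFGHIJKLMNOPQRSTUV"  # remainder 0 -> W, 1 -> A, ..., 22 -> V


def ppsn_checksum_ok(candidate: str) -> bool:
    """Validate a PPSN candidate with the mod-23 check character."""
    s = candidate.strip().upper()
    if not re.fullmatch(r"\d{7}[A-W][A-IW]?", s):
        return False
    total = sum(int(d) * w for d, w in zip(s[:7], range(8, 1, -1)))
    if len(s) == 9 and s[8] != "W":
        total += (ord(s[8]) - ord("A") + 1) * 9  # second letter, weight 9
    return CHECK_CHARS[total % 23] == s[7]


def strict_hybrid(text: str, entities: list[dict]) -> list[dict]:
    """Apply steps 1-4: keep other labels, expand and validate PPSN spans."""
    valid = [m for m in PPSN_RE.finditer(text) if ppsn_checksum_ok(m.group())]
    out, seen = [], set()
    for ent in entities:
        if ent["entity_group"] != "PPSN":
            out.append(ent)  # step 1: non-PPSN detections pass through
            continue
        for m in valid:
            overlaps = m.start() < ent["end"] and ent["start"] < m.end()
            if overlaps and m.span() not in seen:
                seen.add(m.span())
                # step 2: snap the model span to the full candidate
                out.append({**ent, "start": m.start(), "end": m.end(),
                            "word": m.group()})
        # steps 3-4: a PPSN span with no checksum-valid candidate is dropped
    return out
```

With this sketch, a model span covering only `1234567` inside `1234567FA` is expanded to the full nine characters, while a PPSN-labelled `7654321A` is discarded because its check letter does not match.
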
## Limitations

- The Irish regression set is small and targeted; additional domain-specific validation is required before regulated deployment.
- PPSN detection in noisy OCR/ASR or heavily malformed text may require extra hardening.

## License and attribution

- This derivative release is distributed under Apache-2.0, consistent with the base model license tag.
- Base model: `OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1`.
- Training/evaluation used synthetic and augmented data sources, including `nvidia/Nemotron-PII` and `joelniklaus/mapa`.
- See `NOTICE` for attribution details.

## QA quick check

- Load this model as a standard `AutoModelForTokenClassification` checkpoint.
- Run against `irish_ppsn_regression_v5.jsonl`.
- Confirm results in `eval_manual_irish_v5_raw.json` (raw) and `eval_hybrid_v5_no_plausible.json` (strict hybrid).

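The quick check above could be scripted roughly as follows. This is a sketch under stated assumptions, not the evaluation script used for this release: the model directory passed to `load_tagger` is a placeholder for wherever the checkpoint is stored, the JSONL field name `text` is a guess at the regression file's schema, and the helper names are illustrative. The `transformers` `pipeline` call wraps the same checkpoint that `AutoModelForTokenClassification` would load.

```python
import json
from pathlib import Path


def load_tagger(model_dir: str):
    """Build a token-classification pipeline for a local checkpoint."""
    from transformers import pipeline  # imported lazily; needs transformers installed

    return pipeline(
        "token-classification",
        model=model_dir,  # placeholder: path or repo id of this checkpoint
        aggregation_strategy="simple",  # merge B-/I- pieces into whole spans
    )


def ppsn_spans(predictions: list[dict]) -> list[tuple[int, int]]:
    """Return sorted (start, end) offsets of PPSN predictions only."""
    return sorted(
        (p["start"], p["end"]) for p in predictions if p["entity_group"] == "PPSN"
    )


def run_quick_check(tagger, jsonl_path: str, text_field: str = "text"):
    """Tag every record in a JSONL file and collect raw PPSN spans per record.

    `text_field` is an assumed field name; adjust it to the file's real schema.
    """
    results = []
    for line in Path(jsonl_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            record = json.loads(line)
            results.append(ppsn_spans(tagger(record[text_field])))
    return results
```

A typical call would be `run_quick_check(load_tagger("path/to/checkpoint"), "irish_ppsn_regression_v5.jsonl")`. The collected spans are raw model output, so they should be compared with `eval_manual_irish_v5_raw.json`; the strict hybrid figures additionally require the post-processing steps described under recommended production usage.
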