temsa/OpenMed-PPSN-mLiteClinical-v1

@@ -1,17 +1,26 @@
-OpenMed PPSN Extension
 Copyright 2026 Contributors
 This project includes fine-tuned/derived model artifacts from:
-- OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1 (Hugging Face)
   Declared license: Apache-2.0.
 This project uses evaluation/training data sources including:
 - nvidia/Nemotron-PII (Hugging Face dataset)
   Declared license: CC-BY-4.0.
 Attribution and links:
-- OpenMed base model: https://huggingface.co/OpenMed/OpenMed-PII-SuperClinical-Large-434M-v1
 - Nemotron-PII dataset: https://huggingface.co/datasets/nvidia/Nemotron-PII
 If you redistribute models or checkpoints produced here, keep this NOTICE,
 retain upstream license notices, and provide dataset attribution where required.

+OpenMed mLiteClinical Irish PPSN Extension
 Copyright 2026 Contributors
 This project includes fine-tuned/derived model artifacts from:
+- OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1 (Hugging Face)
   Declared license: Apache-2.0.
 This project uses evaluation/training data sources including:
 - nvidia/Nemotron-PII (Hugging Face dataset)
   Declared license: CC-BY-4.0.
+- joelniklaus/mapa (Hugging Face dataset)
+  Declared license: refer to dataset card.
+- unimelb-nlp/wikiann (Hugging Face dataset)
+  Declared license: refer to dataset card.
+- DataikuNLP/kiji-pii-training-data (Hugging Face dataset)
+  Declared license: refer to dataset card.
 Attribution and links:
+- OpenMed base model: https://huggingface.co/OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
 - Nemotron-PII dataset: https://huggingface.co/datasets/nvidia/Nemotron-PII
+- MAPA dataset: https://huggingface.co/datasets/joelniklaus/mapa
+- WikiANN dataset: https://huggingface.co/datasets/unimelb-nlp/wikiann
+- KIJI PII training data: https://huggingface.co/datasets/DataikuNLP/kiji-pii-training-data
 If you redistribute models or checkpoints produced here, keep this NOTICE,
 retain upstream license notices, and provide dataset attribution where required.

README.md

@@ -12,89 +12,42 @@ tags:
 - de-identification
 - ireland
 - ppsn
-- multilingual
 base_model:
 - OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
-datasets:
-- nvidia/Nemotron-PII
-- joelniklaus/mapa
-- unimelb-nlp/wikiann
-- DataikuNLP/kiji-pii-training-data
-model-index:
-- name: OpenMed-PPSN-mLiteClinical-v1
-  results:
-  - task:
-      type: token-classification
-      name: PPSN detection (Irish large eval)
-    dataset:
-      name: irish_ppsn_eval_large_v2
-      type: custom
-    metrics:
-    - type: f1
-      value: 0.8979
-      name: Irish large F1
-  - task:
-      type: token-classification
-      name: PPSN detection (multilingual gov + citizen + HSE)
-    dataset:
-      name: multilingual_ppsn_v1_all
-      type: custom
-    metrics:
-    - type: f1
-      value: 0.9704
-      name: Multilingual suite F1
 ---
 # OpenMed-PPSN-mLiteClinical-v1
-Full token-classification checkpoint derived from `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1` with `B-PPSN` / `I-PPSN` support for Irish PPSN detection.
-## What This Release Is
-- A full `transformers` checkpoint
-- Intended for PPSN masking with the custom `word_aligned` decoder
-- Tuned for Irish PPSN cases while retaining the base OpenMed multilingual PII labels
-## Recommended Inference Path
-Use `inference_word_aligned.py`:
-```bash
-python3 inference_word_aligned.py \
-  --ppsn-min-score 0.4 \
-  --text "My PPSN is 1234567TW and I need help with my housing grant." \
-  --json
-```
-## Included Artifacts
-- Model files:
-  - `model.safetensors`
-  - `config.json`
-  - `tokenizer.json`
-  - `tokenizer_config.json`
-  - `special_tokens_map.json`
-  - `label_meta.json`
-- QA/inference files:
-  - `inference_word_aligned.py`
-  - `qa_config.json`
-  - `pyproject.toml`
-- Eval artifacts in `eval/`
-## Key Results
-- User raw regression F1: `0.8000`
-- QA regression v6 validated F1: `0.6667`
-- QA regression v8 F1: `0.7385`
-- Irish regression F1: `0.8000`
-- Irish large F1: `0.8979`
-- Multilingual suite F1: `0.9704`
-- Non-PPSN agreement vs base mLiteClinical: `1.0000`
-## Notes
-- The base `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1` model has no PPSN label, so PPSN recall starts at zero until PPSN rows are added.
-- The recommended path for PPSN extraction is `word_aligned`, not the default token-aggregation path.
 ## License and Attribution

 - de-identification
 - ireland
 - ppsn
+- legacy
 base_model:
 - OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
 ---
 # OpenMed-PPSN-mLiteClinical-v1
+This repo remains available as a **legacy alias** for compatibility.
+The canonical release for this model is now:
+- `temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1`
+- https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1
+## Status
+- Same model family and same intended use: Irish PPSN detection and masking
+- The canonical repo has a clearer name, cleaned-up metadata, corrected attribution, and tidier eval packaging
+- Prefer the canonical repo for all new integrations, QA, and benchmarking
+## Recommended Upgrade
+Use the canonical release instead of this legacy alias:
+```bash
+python3 inference_word_aligned.py \
+  --model temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1 \
+  --ppsn-min-score 0.4 \
+  --text "My PPSN is 1234567TW and I need help with my housing grant." \
+  --json
+```
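Review note: the `--ppsn-min-score 0.4` flag in the command above suggests a confidence gate on predicted PPSN spans. The internals of `inference_word_aligned.py` are not part of this diff, so the following is only a rough sketch of what such a gate followed by masking might look like. The helper name `mask_ppsn`, the `[PPSN]` placeholder, and the entity keys (`entity_group`, `score`, `start`, `end`, modeled on the aggregated output of the `transformers` token-classification pipeline) are assumptions, not the release's actual interface.

```python
from typing import Any, Iterable, Mapping


def mask_ppsn(
    text: str,
    entities: Iterable[Mapping[str, Any]],
    min_score: float = 0.4,
    placeholder: str = "[PPSN]",
) -> str:
    """Replace PPSN spans scoring at least min_score with a placeholder.

    Hypothetical helper: each entity is assumed to carry `entity_group`,
    `score`, `start`, and `end` (character offsets into `text`).
    """
    # Keep only PPSN predictions that clear the confidence gate.
    kept = [
        e for e in entities
        if e["entity_group"] == "PPSN" and float(e["score"]) >= min_score
    ]
    # Replace from right to left so earlier character offsets stay valid.
    for e in sorted(kept, key=lambda e: int(e["start"]), reverse=True):
        text = text[: int(e["start"])] + placeholder + text[int(e["end"]):]
    return text


if __name__ == "__main__":
    sample = "My PPSN is 1234567TW and I need help with my housing grant."
    # Hand-written prediction for illustration; not real model output.
    predicted = [{"entity_group": "PPSN", "score": 0.93, "start": 11, "end": 20}]
    print(mask_ppsn(sample, predicted))
```

Predictions below the threshold are left untouched, so a lower `min_score` masks more candidate spans and raises the risk of masking text that is not a PPSN.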
+## Why The New Repo Exists
+The original `OpenMed-PPSN-mLiteClinical-v1` name was serviceable but vague. The canonical repo name makes the scope explicit:
+- `mLiteClinical`
+- `IrishPPSN`
+- `135M`
+- `v1`
 ## License and Attribution

pyproject.toml

@@ -1,7 +1,7 @@
 [project]
-name = "openmed-mliteclinical-ppsn"
-version = "0.1.0"
-description = "mLiteClinical PPSN token-classification release"
 requires-python = ">=3.10"
 readme = "README.md"
 license = { text = "Apache-2.0" }

 [project]
+name = "openmed-ppsn-mliteclinical-v1-legacy"
+version = "1.0.1"
+description = "Legacy alias for the OpenMed mLiteClinical Irish PPSN release; prefer the canonical repo"
 requires-python = ">=3.10"
 readme = "README.md"
 license = { text = "Apache-2.0" }
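
Review note: since this repo is now a legacy alias, integrations that still reference the old repo id may want to redirect to the canonical one before loading anything. The sketch below shows one possible way to do that. The two repo ids come from the README change in this diff; the `LEGACY_TO_CANONICAL` table and the `resolve_repo_id` helper are hypothetical names introduced only for illustration.

```python
# Repo ids are taken from the README change; the mapping itself is illustrative.
LEGACY_TO_CANONICAL = {
    "temsa/OpenMed-PPSN-mLiteClinical-v1": (
        "temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1"
    ),
}


def resolve_repo_id(repo_id: str) -> str:
    """Return the canonical repo id for a known legacy alias.

    Unknown ids pass through unchanged, so callers can apply this
    to every repo id they handle.
    """
    return LEGACY_TO_CANONICAL.get(repo_id, repo_id)


if __name__ == "__main__":
    print(resolve_repo_id("temsa/OpenMed-PPSN-mLiteClinical-v1"))
```

Keeping the redirect in a single lookup table means a later rename only needs one new entry, and callers do not have to change.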