Mark legacy repo as superseded by OpenMed-mLiteClinical-IrishPPSN-135M-v1
- NOTICE +12 -3
- README.md +20 -67
- pyproject.toml +3 -3
NOTICE
CHANGED
|
@@ -1,17 +1,26 @@
|
|
| 1 |
-
OpenMed PPSN Extension
|
| 2 |
Copyright 2026 Contributors
|
| 3 |
|
| 4 |
This project includes fine-tuned/derived model artifacts from:
|
| 5 |
-
- OpenMed/OpenMed-PII-
|
| 6 |
Declared license: Apache-2.0.
|
| 7 |
|
| 8 |
This project uses evaluation/training data sources including:
|
| 9 |
- nvidia/Nemotron-PII (Hugging Face dataset)
|
| 10 |
Declared license: CC-BY-4.0.
|
| 11 |
|
| 12 |
Attribution and links:
|
| 13 |
-
- OpenMed base model: https://huggingface.co/OpenMed/OpenMed-PII-
|
| 14 |
- Nemotron-PII dataset: https://huggingface.co/datasets/nvidia/Nemotron-PII
|
| 15 |
|
| 16 |
If you redistribute models or checkpoints produced here, keep this NOTICE,
|
| 17 |
retain upstream license notices, and provide dataset attribution where required.
|
|
|
|
| 1 |
+
OpenMed mLiteClinical Irish PPSN Extension
|
| 2 |
Copyright 2026 Contributors
|
| 3 |
|
| 4 |
This project includes fine-tuned/derived model artifacts from:
|
| 5 |
+
- OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1 (Hugging Face)
|
| 6 |
Declared license: Apache-2.0.
|
| 7 |
|
| 8 |
This project uses evaluation/training data sources including:
|
| 9 |
- nvidia/Nemotron-PII (Hugging Face dataset)
|
| 10 |
Declared license: CC-BY-4.0.
|
| 11 |
+
- joelniklaus/mapa (Hugging Face dataset)
|
| 12 |
+
Declared license: refer to dataset card.
|
| 13 |
+
- unimelb-nlp/wikiann (Hugging Face dataset)
|
| 14 |
+
Declared license: refer to dataset card.
|
| 15 |
+
- DataikuNLP/kiji-pii-training-data (Hugging Face dataset)
|
| 16 |
+
Declared license: refer to dataset card.
|
| 17 |
|
| 18 |
Attribution and links:
|
| 19 |
+
- OpenMed base model: https://huggingface.co/OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
|
| 20 |
- Nemotron-PII dataset: https://huggingface.co/datasets/nvidia/Nemotron-PII
|
| 21 |
+
- MAPA dataset: https://huggingface.co/datasets/joelniklaus/mapa
|
| 22 |
+
- WikiANN dataset: https://huggingface.co/datasets/unimelb-nlp/wikiann
|
| 23 |
+
- KIJI PII training data: https://huggingface.co/datasets/DataikuNLP/kiji-pii-training-data
|
| 24 |
|
| 25 |
If you redistribute models or checkpoints produced here, keep this NOTICE,
|
| 26 |
retain upstream license notices, and provide dataset attribution where required.
|
README.md
CHANGED
|
@@ -12,89 +12,42 @@ tags:
|
|
| 12 |
- de-identification
|
| 13 |
- ireland
|
| 14 |
- ppsn
|
| 15 |
-
-
|
| 16 |
base_model:
|
| 17 |
- OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
|
| 18 |
-
datasets:
|
| 19 |
-
- nvidia/Nemotron-PII
|
| 20 |
-
- joelniklaus/mapa
|
| 21 |
-
- unimelb-nlp/wikiann
|
| 22 |
-
- DataikuNLP/kiji-pii-training-data
|
| 23 |
-
model-index:
|
| 24 |
-
- name: OpenMed-PPSN-mLiteClinical-v1
|
| 25 |
-
results:
|
| 26 |
-
- task:
|
| 27 |
-
type: token-classification
|
| 28 |
-
name: PPSN detection (Irish large eval)
|
| 29 |
-
dataset:
|
| 30 |
-
name: irish_ppsn_eval_large_v2
|
| 31 |
-
type: custom
|
| 32 |
-
metrics:
|
| 33 |
-
- type: f1
|
| 34 |
-
value: 0.8979
|
| 35 |
-
name: Irish large F1
|
| 36 |
-
- task:
|
| 37 |
-
type: token-classification
|
| 38 |
-
name: PPSN detection (multilingual gov + citizen + HSE)
|
| 39 |
-
dataset:
|
| 40 |
-
name: multilingual_ppsn_v1_all
|
| 41 |
-
type: custom
|
| 42 |
-
metrics:
|
| 43 |
-
- type: f1
|
| 44 |
-
value: 0.9704
|
| 45 |
-
name: Multilingual suite F1
|
| 46 |
---
|
| 47 |
|
| 48 |
# OpenMed-PPSN-mLiteClinical-v1
|
| 49 |
|
| 50 |
-
|
| 51 |
|
| 52 |
-
|
| 53 |
|
| 54 |
-
-
|
| 55 |
-
-
|
| 56 |
-
- Tuned for Irish PPSN cases while retaining the base OpenMed multilingual PII labels
|
| 57 |
|
| 58 |
-
##
|
| 59 |
|
| 60 |
-
|
| 61 |
|
| 62 |
-
|
| 63 |
-
python3 inference_word_aligned.py \
|
| 64 |
-
--ppsn-min-score 0.4 \
|
| 65 |
-
--text "My PPSN is 1234567TW and I need help with my housing grant." \
|
| 66 |
-
--json
|
| 67 |
-
```
|
| 68 |
-
|
| 69 |
-
## Included Artifacts
|
| 70 |
|
| 71 |
-
|
| 72 |
-
- `model.safetensors`
|
| 73 |
-
- `config.json`
|
| 74 |
-
- `tokenizer.json`
|
| 75 |
-
- `tokenizer_config.json`
|
| 76 |
-
- `special_tokens_map.json`
|
| 77 |
-
- `label_meta.json`
|
| 78 |
-
- QA/inference files:
|
| 79 |
-
- `inference_word_aligned.py`
|
| 80 |
-
- `qa_config.json`
|
| 81 |
-
- `pyproject.toml`
|
| 82 |
-
- Eval artifacts in `eval/`
|
| 83 |
|
| 84 |
-
|
| 85 |
|
| 86 |
-
|
| 87 |
-
- QA regression v6 validated F1: `0.6667`
|
| 88 |
-
- QA regression v8 F1: `0.7385`
|
| 89 |
-
- Irish regression F1: `0.8000`
|
| 90 |
-
- Irish large F1: `0.8979`
|
| 91 |
-
- Multilingual suite F1: `0.9704`
|
| 92 |
-
- Non-PPSN agreement vs base mLiteClinical: `1.0000`
|
| 93 |
|
| 94 |
-
|
| 95 |
|
| 96 |
-
-
|
| 97 |
-
-
|
| 98 |
|
| 99 |
## License and Attribution
|
| 100 |
|
|
|
|
| 12 |
- de-identification
|
| 13 |
- ireland
|
| 14 |
- ppsn
|
| 15 |
+
- legacy
|
| 16 |
base_model:
|
| 17 |
- OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
|
| 18 |
---
|
| 19 |
|
| 20 |
# OpenMed-PPSN-mLiteClinical-v1
|
| 21 |
|
| 22 |
+
This repo remains available as a **legacy alias** for compatibility.
|
| 23 |
|
| 24 |
+
The canonical release for this model is now:
|
| 25 |
|
| 26 |
+
- `temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1`
|
| 27 |
+
- https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1
|
| 28 |
|
| 29 |
+
## Status
|
| 30 |
|
| 31 |
+
- Same model family and same intended use: Irish PPSN detection and masking
|
| 32 |
+
- New canonical repo has the clearer name, cleaned metadata, corrected attribution, and cleaner eval packaging
|
| 33 |
+
- Prefer the canonical repo for all new integrations, QA, and benchmarking
|
| 34 |
|
| 35 |
+
## Recommended Upgrade
|
| 36 |
|
| 37 |
+
Use the canonical release instead of this legacy alias:
|
| 38 |
|
| 39 |
+
```bash
|
| 40 |
+
python3 inference_word_aligned.py --model temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1 --ppsn-min-score 0.4 --text "My PPSN is 1234567TW and I need help with my housing grant." --json
|
| 41 |
+
```
|
| 42 |
|
| 43 |
+
## Why The New Repo Exists
|
| 44 |
|
| 45 |
+
The original `OpenMed-PPSN-mLiteClinical-v1` name was serviceable but vague. The canonical repo name makes the scope explicit:
|
| 46 |
|
| 47 |
+
- `mLiteClinical`
|
| 48 |
+
- `IrishPPSN`
|
| 49 |
+
- `135M`
|
| 50 |
+
- `v1`
|
| 51 |
|
| 52 |
## License and Attribution
|
| 53 |
|
pyproject.toml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
[project]
|
| 2 |
-
name = "openmed-mliteclinical-
|
| 3 |
-
version = "
|
| 4 |
-
description = "mLiteClinical PPSN
|
| 5 |
requires-python = ">=3.10"
|
| 6 |
readme = "README.md"
|
| 7 |
license = { text = "Apache-2.0" }
|
|
|
|
| 1 |
[project]
|
| 2 |
+
name = "openmed-ppsn-mliteclinical-v1-legacy"
|
| 3 |
+
version = "1.0.1"
|
| 4 |
+
description = "Legacy alias for the OpenMed mLiteClinical Irish PPSN release; prefer the canonical repo"
|
| 5 |
requires-python = ">=3.10"
|
| 6 |
readme = "README.md"
|
| 7 |
license = { text = "Apache-2.0" }
|
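
For downstream code that still references the legacy alias, the redirect described in the README change could be sketched as a small lookup. This is an illustration only and not part of the commit: `LEGACY_NAME`, `CANONICAL_REPO`, `resolve_repo_id`, and `name_components` are hypothetical names, and matching on the bare repo name while ignoring the owner prefix is an assumption.

```python
# Hypothetical helper for callers migrating off the legacy alias.
# The constants copy the names that appear in the README diff; the
# functions themselves are illustrative and exist in neither repo.

LEGACY_NAME = "OpenMed-PPSN-mLiteClinical-v1"
CANONICAL_REPO = "temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1"


def resolve_repo_id(repo_id: str) -> str:
    """Return the canonical repo id for the legacy alias, else the input unchanged."""
    # Assumption: compare only the repo name, so any owner prefix is ignored.
    name = repo_id.rsplit("/", 1)[-1]
    if name == LEGACY_NAME:
        return CANONICAL_REPO
    return repo_id


def name_components(repo_id: str) -> list[str]:
    """Split the canonical repo name into the scope markers the README lists."""
    name = repo_id.rsplit("/", 1)[-1]
    # Drop the leading "OpenMed" token; the rest are the scope markers.
    return name.split("-")[1:]


if __name__ == "__main__":
    print(resolve_repo_id(LEGACY_NAME))
    print(name_components(CANONICAL_REPO))
```

The resolved id can then be passed wherever the model path is expected, for example as the `--model` argument shown in the README's upgrade command.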