Clean public README and metadata
Files changed:
- README.md +33 -16
- eval/benchmark_summary.md +30 -59
- eval/multilabel_summary.json +1 -6
- eval/ppsn_only_summary.json +1 -9
- label_meta.json +2 -2
- training_sources.json +24 -38

README.md CHANGED
@@ -26,11 +26,18 @@ base_model:
 
 QA release candidate for Irish core PII detection with OpenMed mLiteClinical.
 
-This…
+This repository should be evaluated against the current public release:
 
--…
--…
-…
+- current public release: `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1`
+- this repository: `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1`
+
+The purpose of this RC is specific: improve weak-context PPSN detection without leaving the raw-model-only approach.
+
+In particular, this RC is intended to fix cases like:
+
+- `1234567T - am I eligible for the housing grant?`
+- `I was told to provide my number 1234567T when applying, what do I do next?`
+- `My ppsn is 1234567tw and I need to know about carer's allowance`
 
 ## Coverage
 
@@ -59,26 +66,36 @@ python3 inference_mask.py \
 --json
 ```
 
-##…
+## Comparison To The Current Public Release
+
+PPSN-only comparison:
 
 | Model | User Raw | Core PPSN | Edge PPSN | QA v8 PPSN | Irish Large PPSN |
 |---|---:|---:|---:|---:|---:|
-|…
-|…
-|…
+| `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1` | 0.8000 | 0.0800 | 0.4211 | 0.7385 | 0.8980 |
+| `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1` | 1.0000 | 0.8571 | 0.8571 | 0.7353 | 0.9403 |
+
+Broader Irish-core multilabel view at the recommended thresholds for this RC (`--ppsn-min-score 0.5 --other-min-score 0.4`):
+
+- overall Irish core F1: `0.9487`
+- overall Irish edge F1: `0.8205`
+- `phone_number` core F1: `0.9167`
+- `postcode` core F1: `0.7500`
+- `PPSN` core F1: `0.8571`
+- `PPSN` edge F1: `0.8571`
 
-##…
+## How To Read This RC
 
-…
+Compared with the current public `v1` release, this RC is much stronger on the weak-context PPSN cases that were previously missed.
 
-…
+That is the main reason to test it.
 
-- Irish edge overall F1: `0.8205`
-- phone_number core F1: `0.9167`
-- postcode core F1: `0.7500`
+This RC should still be validated carefully on:
 
-…
+- Irish phone numbers with spaces
+- Irish Eircodes
+- bank/account details
+- names and emails in English and Irish Gaelic
 
 ## Included Files
 
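
As a reading aid for the PPSN-only comparison table in the README diff, here is a minimal Python sketch that computes the per-benchmark F1 change from `v1` to `v2-rc1`. The values are copied from the table, which is already rounded to four decimals. The names `BENCHMARKS`, `f1_deltas` and `regressions` are illustrative only and are not part of the repository.

```python
# PPSN-only F1 values copied from the README comparison table (4-decimal values).
BENCHMARKS = ["User Raw", "Core PPSN", "Edge PPSN", "QA v8 PPSN", "Irish Large PPSN"]
V1 = [0.8000, 0.0800, 0.4211, 0.7385, 0.8980]
V2_RC1 = [1.0000, 0.8571, 0.8571, 0.7353, 0.9403]


def f1_deltas(baseline, candidate):
    """Return {benchmark: candidate - baseline}, rounded back to 4 decimals."""
    return {
        name: round(cand - base, 4)
        for name, base, cand in zip(BENCHMARKS, baseline, candidate)
    }


def regressions(deltas):
    """Benchmarks where the candidate scores lower than the baseline."""
    return [name for name, delta in deltas.items() if delta < 0]


if __name__ == "__main__":
    deltas = f1_deltas(V1, V2_RC1)
    for name, delta in deltas.items():
        print(f"{name:>16}: {delta:+.4f}")
    print("regressions:", regressions(deltas))
```

With the published values, the largest gain is on Core PPSN (+0.7771) and the only negative delta is QA v8 PPSN (-0.0032).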

eval/benchmark_summary.md CHANGED
@@ -1,79 +1,50 @@
-#…
+# Benchmark Summary
 
-…
+This file summarizes the public comparison relevant for QA.
 
-…
-- base: `models/openmed-mliteclinical-irish-core-v14_userboost_cls_s50`
-- training mix: `data/ppsn_recover_v4_mix`
-- setup: LoRA recovery with `v14` as teacher, PPSN classifier rows left mutable, encoder updated through LoRA
-- recommended operating point: `--min-score 0.4 --ppsn-min-score 0.5 --ppsn-decoder word_aligned`
+## Baseline
 
-…
+Current public release:
 
-…
+- `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1`
 
-…
-- `p2`: `I was told to provide my number 1234567T when applying, what do I do next?` -> detected
-- `p3`: `My ppsn is 1234567tw and I need to know about carer's allowance` -> detected
-- `n1`: `123456T ...` -> no PPSN prediction
-- `n2`: `12345678T ...` -> no PPSN prediction
-- `n3`: `0871234567 ...` -> no PPSN prediction
-- `n4`: `2024T ...` -> no PPSN prediction
+Candidate under test:
 
-…
+- `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1`
 
 ## PPSN-Only Comparison
 
-| Model |…
-|---|---:|---:|---:|---:|---:|
-| `…
-| `…
-| `models/openmed-mliteclinical-irish-core-v15_weakctx_lora_s160` | `0.50` | `1.0000` | `0.8571` | `0.8571` | `0.7353` | `0.9403` |
+| Model | User Raw | Core PPSN | Edge PPSN | QA v8 PPSN | Irish Large PPSN |
+|---|---:|---:|---:|---:|---:|
+| `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1` | 0.8000 | 0.0800 | 0.4211 | 0.7385 | 0.8980 |
+| `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1` | 1.0000 | 0.8571 | 0.8571 | 0.7353 | 0.9403 |
 
-…
+## Exact Weak-Context PPSN Cases
 
-…
-- `reports/current_edge_ppsnonly.json`
-- `reports/v14_core_ppsnonly.json`
-- `reports/v14_edge_ppsnonly.json`
-- `reports/benchmark_user_v15_ppsnonly_t050.json`
-- `reports/benchmark_core_ppsn_v15_ppsnonly_t050.json`
-- `reports/benchmark_edge_ppsn_v15_ppsnonly_t050.json`
-- `reports/benchmark_v8_v15_ppsnonly_t050.json`
-- `reports/benchmark_large_v15_ppsnonly_t050.json`
+At `--ppsn-min-score 0.5`, this RC detects:
 
-…
+- `1234567T - am I eligible for the housing grant?`
+- `I was told to provide my number 1234567T when applying, what do I do next?`
+- `My ppsn is 1234567tw and I need to know about carer's allowance`
 
-…
+And does not label these as PPSN:
 
--…
--…
--…
--…
-- phone number on `eval/irish_core_pii_v1.jsonl`: F1 `0.9167`
-- postcode on `eval/irish_core_pii_v1.jsonl`: F1 `0.7500`
-…
-Compared with `v14`, this is the tradeoff:
+- `123456T`
+- `12345678T`
+- `0871234567`
+- `2024T`
 
-…
-- better: edge PPSN F1 (`0.8571` vs `0.5000`)
-- slightly worse: broad Irish-core multilabel F1 (`0.9487` vs `0.9677`)
-- slightly worse: phone/postcode retention in the small Irish core suite
+## Multilabel Snapshot
 
-…
+At `--ppsn-min-score 0.5 --other-min-score 0.4`:
 
-- `…
-- `…
-- `…
-- `…
-- `reports/benchmark_large_v15_m040_p050.json`
-- `reports/tmp_core_v14_035.json`
-- `reports/tmp_edge_v14_035.json`
-…
-## Decision
+- Irish core overall F1: `0.9487`
+- Irish edge overall F1: `0.8205`
+- `phone_number` core F1: `0.9167`
+- `postcode` core F1: `0.7500`
 
-…
+## QA Reading
 
-…
+This RC exists to improve weak-context PPSN reliability relative to the current public `v1` release.
 
-…
+QA should compare it directly against `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1` on production-like Irish traffic.
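
The weak-context cases in the benchmark summary hinge on the surface shape of a PPSN. Assuming the usual written form of seven digits followed by one or two letters, a small Python check like the following could help build or audit QA fixtures, for example to confirm that the listed negatives are not PPSN-shaped at all. `PPSN_SHAPE` and `ppsn_shaped_spans` are hypothetical helpers, not part of this repository. The check letter is deliberately not validated, and the RC itself is evaluated as a raw model, without regex post-processing.

```python
import re

# Assumed PPSN surface shape: 7 digits followed by 1 or 2 letters, not embedded
# in a longer alphanumeric token. The check letter is NOT validated here.
PPSN_SHAPE = re.compile(r"(?<![0-9A-Za-z])\d{7}[A-Za-z]{1,2}(?![0-9A-Za-z])")

# Cases copied from eval/benchmark_summary.md.
POSITIVES = [
    "1234567T - am I eligible for the housing grant?",
    "I was told to provide my number 1234567T when applying, what do I do next?",
    "My ppsn is 1234567tw and I need to know about carer's allowance",
]
NEGATIVES = ["123456T", "12345678T", "0871234567", "2024T"]


def ppsn_shaped_spans(text: str) -> list[str]:
    """Return every substring of `text` that has the assumed PPSN shape."""
    return [m.group(0) for m in PPSN_SHAPE.finditer(text)]


if __name__ == "__main__":
    for text in POSITIVES + NEGATIVES:
        print(f"{ppsn_shaped_spans(text)!s:>14}  {text}")
```

Under this rule, `123456T` and `12345678T` fail on digit count, `0871234567` has no trailing letter, and `2024T` is too short. That makes them good hard negatives: a model that flags them is reacting to something other than the PPSN shape.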

eval/multilabel_summary.json CHANGED
@@ -7,13 +7,8 @@
 "overall_core_f1": 0.515,
 "overall_edge_f1": 0.2326
 },
-"previous_internal_best": {
-"name": "v14",
-"overall_core_f1": 0.9677419355,
-"overall_edge_f1": 0.8823529412
-},
 "this_rc": {
-"name": "…
+"name": "current release candidate",
 "overall_core_f1": 0.9487179487,
 "overall_edge_f1": 0.8205128205,
 "phone_core_f1": 0.9166666667,
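
The multilabel figures were measured at `--ppsn-min-score 0.5 --other-min-score 0.4`, i.e. a separate score cutoff for PPSN and a shared one for every other label. The sketch below shows that idea on a hypothetical `Span` record. It is not how `inference_mask.py` is implemented (that code is not shown in this commit), and it assumes that a score exactly equal to the cutoff is kept.

```python
from dataclasses import dataclass

# Recommended operating point for this RC (from training_sources.json):
# --ppsn-min-score 0.5 --other-min-score 0.4
PPSN_MIN_SCORE = 0.5
OTHER_MIN_SCORE = 0.4


@dataclass(frozen=True)
class Span:
    """Hypothetical detected entity: label, surface text and confidence score."""
    label: str
    text: str
    score: float


def keep_span(span: Span, ppsn_min: float = PPSN_MIN_SCORE,
              other_min: float = OTHER_MIN_SCORE) -> bool:
    # PPSN uses its own cutoff; all other labels share one cutoff.
    # Assumption: a score equal to the cutoff passes (>=).
    cutoff = ppsn_min if span.label == "PPSN" else other_min
    return span.score >= cutoff


def filter_spans(spans, ppsn_min=PPSN_MIN_SCORE, other_min=OTHER_MIN_SCORE):
    """Keep only spans whose score clears the cutoff for their label."""
    return [s for s in spans if keep_span(s, ppsn_min, other_min)]


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    example = [
        Span("PPSN", "1234567T", 0.62),
        Span("PPSN", "1234567T", 0.45),
        Span("phone_number", "087 123 4567", 0.41),
        Span("postcode", "D02 XXXX", 0.35),
    ]
    for span in filter_spans(example):
        print(span)
```

Because the PPSN cutoff is higher, a PPSN span at 0.45 is dropped while a `phone_number` span at 0.41 survives, so the two flags trade PPSN precision against recall independently of the other labels.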

eval/ppsn_only_summary.json CHANGED
@@ -8,16 +8,8 @@
 "v8_ppsn_f1": 0.7384615385,
 "irish_large_ppsn_f1": 0.898
 },
-"previous_internal_best": {
-"name": "v14",
-"user_raw_f1": 0.5,
-"core_ppsn_f1": 0.9090909091,
-"edge_ppsn_f1": 0.5,
-"v8_ppsn_f1": 0.71875,
-"irish_large_ppsn_f1": 0.9383658468
-},
 "this_rc": {
-"name": "…
+"name": "current release candidate",
 "user_raw_f1": 1.0,
 "core_ppsn_f1": 0.8571428571,
 "edge_ppsn_f1": 0.8571428571,
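
The README and benchmark tables print four decimals, while the two JSON summaries keep full precision. A quick Python consistency check between them might look like this. The dictionary keys are illustrative labels; the values are the ones visible in the diffs, including the two baseline fields shown as context in `eval/ppsn_only_summary.json`, which match the `v1` row of the table.

```python
# Full-precision values visible in eval/ppsn_only_summary.json and
# eval/multilabel_summary.json (keys are illustrative labels, not JSON keys).
JSON_VALUES = {
    "v1 QA v8 PPSN": 0.7384615385,
    "v1 Irish Large PPSN": 0.898,
    "rc1 User Raw": 1.0,
    "rc1 Core PPSN": 0.8571428571,
    "rc1 Edge PPSN": 0.8571428571,
    "rc1 overall core": 0.9487179487,
    "rc1 overall edge": 0.8205128205,
    "rc1 phone core": 0.9166666667,
}

# The same figures as printed in README.md / eval/benchmark_summary.md.
PUBLISHED = {
    "v1 QA v8 PPSN": "0.7385",
    "v1 Irish Large PPSN": "0.8980",
    "rc1 User Raw": "1.0000",
    "rc1 Core PPSN": "0.8571",
    "rc1 Edge PPSN": "0.8571",
    "rc1 overall core": "0.9487",
    "rc1 overall edge": "0.8205",
    "rc1 phone core": "0.9167",
}


def to_published(value: float) -> str:
    """Format a full-precision score the way the tables print it."""
    return f"{value:.4f}"


def mismatches(json_values, published):
    """Keys whose 4-decimal rendering differs from the published string."""
    return sorted(k for k, v in json_values.items() if to_published(v) != published[k])


if __name__ == "__main__":
    print("mismatches:", mismatches(JSON_VALUES, PUBLISHED))
```

For the values in this commit the check reports no mismatches, so the tables and JSON summaries agree after rounding.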

label_meta.json CHANGED
@@ -1,5 +1,5 @@
 {
-"base_model": "…
+"base_model": "OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1",
 "label_list": [
 "O",
 "B-account_number",
@@ -117,4 +117,4 @@
 "extra_labels": [
 "PPSN"
 ]
-}
+}
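
`label_meta.json` stores a `label_list` of BIO-style tags (`"O"`, `"B-account_number"`, ...). The sketch below shows how such a list is commonly turned into index/label maps, as in the `id2label` / `label2id` convention of Hugging Face token-classification configs. It assumes that list order equals classifier output order, which this commit does not state. `META_EXCERPT` is a hypothetical excerpt: only the first two labels are visible in the diff, and the rest are elided there. The role of `extra_labels` is not documented here, so the sketch does not interpret it.

```python
import json

# Hypothetical excerpt: only "O" and "B-account_number" are visible in the diff.
META_EXCERPT = """{
  "base_model": "OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1",
  "label_list": ["O", "B-account_number"],
  "extra_labels": ["PPSN"]
}"""


def build_label_maps(meta: dict):
    """Index labels by position (assumed to match the classifier's output order)."""
    id2label = dict(enumerate(meta["label_list"]))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def entity_types(label_list):
    """Strip BIO prefixes ("B-", "I-"), drop "O", keep first-seen order."""
    types = []
    for label in label_list:
        if label == "O":
            continue
        name = label.split("-", 1)[1] if label[:2] in ("B-", "I-") else label
        if name not in types:
            types.append(name)
    return types


if __name__ == "__main__":
    meta = json.loads(META_EXCERPT)
    id2label, label2id = build_label_maps(meta)
    print(id2label)
    print(label2id)
    print(entity_types(meta["label_list"]))
```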

training_sources.json CHANGED
@@ -1,47 +1,33 @@
 {
 "base_model": "OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1",
-"…
-"…
-"recovery_adapter": "v15 weak-context PPSN recovery adapter",
+"current_public_reference": "temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1",
+"release_purpose": "Targeted weak-context PPSN recovery for the IrishCorePII release line.",
 "recommended_thresholds": {
 "ppsn_min_score": 0.5,
 "other_min_score": 0.4
 },
-"…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-…
-},
-{
-"name": "irish_core_release_v2_mix",
-"weight": 5.0,
-"kind": "synthetic_replay_mix"
-},
-{
-"name": "irish_ppsn_eircode_spec_v1",
-"weight": 1.0,
-"kind": "synthetic_spec_dataset"
-}
-]
-},
+"training_mix_summary": [
+{
+"component": "duplicated weak-context PPSN regression cases",
+"weight": 7.0
+},
+{
+"component": "Irish PPSN and phone edge-case replay",
+"weight": 3.0
+},
+{
+"component": "synthetic PPSN focus data with weak-context positives and hard negatives",
+"weight": 4.0
+},
+{
+"component": "broader Irish core PII replay mix",
+"weight": 5.0
+},
+{
+"component": "spec-driven Irish PPSN and Eircode synthetic data",
+"weight": 1.0
+}
+],
 "upstream_attribution": [
 {
 "name": "joelniklaus/mapa",
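
The new `training_mix_summary` lists component weights (7.0, 3.0, 4.0, 5.0, 1.0) without saying how they are applied. If they are read as relative sampling weights, which is an assumption, the share of each component is its weight over the total (20.0). `mix_proportions` and `sample_components` below are hypothetical helpers illustrating that reading with the standard library.

```python
import random

# Component weights copied from "training_mix_summary" in training_sources.json.
TRAINING_MIX = [
    ("duplicated weak-context PPSN regression cases", 7.0),
    ("Irish PPSN and phone edge-case replay", 3.0),
    ("synthetic PPSN focus data with weak-context positives and hard negatives", 4.0),
    ("broader Irish core PII replay mix", 5.0),
    ("spec-driven Irish PPSN and Eircode synthetic data", 1.0),
]


def mix_proportions(mix):
    """Normalize weights to fractions of the total (assumes relative sampling weights)."""
    total = sum(weight for _, weight in mix)
    return {name: weight / total for name, weight in mix}


def sample_components(mix, k, seed=0):
    """Draw k component names with probability proportional to weight."""
    rng = random.Random(seed)
    names = [name for name, _ in mix]
    weights = [weight for _, weight in mix]
    return rng.choices(names, weights=weights, k=k)


if __name__ == "__main__":
    for name, share in mix_proportions(TRAINING_MIX).items():
        print(f"{share:6.1%}  {name}")
```

Under that reading, the duplicated regression cases make up 35% of sampled examples and the spec-driven PPSN/Eircode data 5%. If the weights are instead per-dataset repetition factors applied to datasets of different sizes, the effective shares would differ.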