temsa committed · Commit 74389d1 · verified · 1 Parent(s): f9ecbf3

Clean public README and metadata

README.md CHANGED
@@ -26,11 +26,18 @@ base_model:
26
 
27
  QA release candidate for Irish core PII detection with OpenMed mLiteClinical.
28
 
29
- This RC is a full merged checkpoint built from the `v15` weak-context PPSN recovery adapter. It is the first raw-model candidate in this line that closes the exact reported PPSN weak-context misses:
30
 
31
- - `1234567T` at sentence start
32
- - `... provide my number 1234567T ...`
33
- - lowercase `1234567tw` in weaker English support context
 
 
 
 
 
 
 
34
 
35
  ## Coverage
36
 
@@ -59,26 +66,36 @@ python3 inference_mask.py \
59
  --json
60
  ```
61
 
62
- ## PPSN-Only Comparison
 
 
63
 
  | Model | User Raw | Core PPSN | Edge PPSN | QA v8 PPSN | Irish Large PPSN |
  |---|---:|---:|---:|---:|---:|
- | Current public | 0.8000 | 0.0800 | 0.4211 | 0.7385 | 0.8980 |
- | Previous internal best (`v14`) | 0.5000 | 0.9091 | 0.5000 | 0.7188 | 0.9384 |
- | This RC (`v15`) | 1.0000 | 0.8571 | 0.8571 | 0.7353 | 0.9403 |
69
 
70
- ## Main Tradeoff
71
 
72
- Relative to `v14`, this RC materially improves weak-context PPSN recall, but gives up a small amount of broader Irish-core multilabel quality.
73
 
74
- At the recommended thresholds:
75
 
76
- - Irish core overall F1: `0.9487`
77
- - Irish edge overall F1: `0.8205`
78
- - phone_number core F1: `0.9167`
79
- - postcode core F1: `0.7500`
80
 
81
- So this RC is the right choice if the blocking issue is weak-context PPSN reliability.
 
 
 
82
 
83
  ## Included Files
84
 
 
26
 
27
  QA release candidate for Irish core PII detection with OpenMed mLiteClinical.
28
 
29
+ This repository should be evaluated against the current public release:
30
 
31
+ - current public release: `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1`
32
+ - this repository: `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1`
33
+
34
+ The purpose of this RC is specific: improve weak-context PPSN detection without leaving the raw-model-only approach.
35
+
36
+ In particular, this RC is intended to fix cases like:
37
+
38
+ - `1234567T - am I eligible for the housing grant?`
39
+ - `I was told to provide my number 1234567T when applying, what do I do next?`
40
+ - `My ppsn is 1234567tw and I need to know about carer's allowance`
41
 
42
  ## Coverage
43
 
 
66
  --json
67
  ```
68
 
69
+ ## Comparison To The Current Public Release
70
+
71
+ PPSN-only comparison:
72
 
  | Model | User Raw | Core PPSN | Edge PPSN | QA v8 PPSN | Irish Large PPSN |
  |---|---:|---:|---:|---:|---:|
+ | `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1` | 0.8000 | 0.0800 | 0.4211 | 0.7385 | 0.8980 |
+ | `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1` | 1.0000 | 0.8571 | 0.8571 | 0.7353 | 0.9403 |
77
+
78
+ Broader Irish-core multilabel view at the recommended thresholds for this RC (`--ppsn-min-score 0.5 --other-min-score 0.4`):
79
+
80
+ - overall Irish core F1: `0.9487`
81
+ - overall Irish edge F1: `0.8205`
82
+ - `phone_number` core F1: `0.9167`
83
+ - `postcode` core F1: `0.7500`
84
+ - `PPSN` core F1: `0.8571`
85
+ - `PPSN` edge F1: `0.8571`
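
The two recommended thresholds apply separately: one to PPSN spans, one to every other label. A minimal sketch of that split filter follows. It assumes predictions arrive as dictionaries with `label` and `score` keys. The structure, the function name, and the example scores are hypothetical illustrations, not the actual output format or logic of `inference_mask.py`.

```python
# Hypothetical post-filter: span dictionaries, function name, and scores
# are illustrative assumptions, not the API of inference_mask.py.
PPSN_MIN_SCORE = 0.5   # mirrors --ppsn-min-score
OTHER_MIN_SCORE = 0.4  # mirrors --other-min-score


def filter_spans(spans, ppsn_min=PPSN_MIN_SCORE, other_min=OTHER_MIN_SCORE):
    """Keep spans whose score meets the threshold for their label group."""
    kept = []
    for span in spans:
        # PPSN gets its own, stricter cutoff; all other labels share one.
        threshold = ppsn_min if span["label"] == "PPSN" else other_min
        if span["score"] >= threshold:
            kept.append(span)
    return kept


# Made-up scores, chosen only to show the two cutoffs behaving differently.
example = [
    {"label": "PPSN", "text": "1234567T", "score": 0.62},
    {"label": "PPSN", "text": "2024T", "score": 0.45},
    {"label": "phone_number", "text": "0871234567", "score": 0.45},
]
kept = filter_spans(example)
```

With these invented scores, the `2024T` span is dropped because 0.45 falls below the PPSN cutoff, while the phone number with the same score survives under the lower cutoff for other labels.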
86
 
87
+ ## How To Read This RC
88
 
89
+ Compared with the current public `v1` release, this RC is much stronger on the weak-context PPSN cases that were previously missed.
90
 
91
+ That is the main reason to test it.
92
 
93
+ This RC should still be validated carefully on:
 
 
 
94
 
95
+ - Irish phone numbers with spaces
96
+ - Irish Eircodes
97
+ - bank/account details
98
+ - names and emails in English and Irish Gaelic
99
 
100
  ## Included Files
101
 
eval/benchmark_summary.md CHANGED
@@ -1,79 +1,50 @@
1
- # Irish Core v15 Weak-Context PPSN Recovery
2
 
3
- ## Candidate
4
 
5
- - model: `models/openmed-mliteclinical-irish-core-v15_weakctx_lora_s160`
6
- - base: `models/openmed-mliteclinical-irish-core-v14_userboost_cls_s50`
7
- - training mix: `data/ppsn_recover_v4_mix`
8
- - setup: LoRA recovery with `v14` as teacher, PPSN classifier rows left mutable, encoder updated through LoRA
9
- - recommended operating point: `--min-score 0.4 --ppsn-min-score 0.5 --ppsn-decoder word_aligned`
10
 
11
- ## Exact Weak-Context PPSN Result
12
 
13
- At `--ppsn-min-score 0.5`, the exact previously failing user PPSN cases are fixed:
14
 
15
- - `p1`: `1234567T - am I eligible for the housing grant?` -> detected
16
- - `p2`: `I was told to provide my number 1234567T when applying, what do I do next?` -> detected
17
- - `p3`: `My ppsn is 1234567tw and I need to know about carer's allowance` -> detected
18
- - `n1`: `123456T ...` -> no PPSN prediction
19
- - `n2`: `12345678T ...` -> no PPSN prediction
20
- - `n3`: `0871234567 ...` -> no PPSN prediction
21
- - `n4`: `2024T ...` -> no PPSN prediction
22
 
23
- Reference: `reports/benchmark_user_v15_ppsnonly_t050.json`
24
 
25
  ## PPSN-Only Comparison
26
 
- | Model | Threshold | User Raw | Core PPSN | Edge PPSN | QA v8 PPSN | Irish Large PPSN |
- |---|---:|---:|---:|---:|---:|---:|
- | `release/OpenMed-mLiteClinical-IrishPPSN-135M-v1` | `0.40` | `0.8000` | `0.0800` | `0.4211` | `0.7385` | `0.8980` |
- | `models/openmed-mliteclinical-irish-core-v14_userboost_cls_s50` | `0.35` | `0.5000` | `0.9091` | `0.5000` | `0.7188` | `0.9384` |
- | `models/openmed-mliteclinical-irish-core-v15_weakctx_lora_s160` | `0.50` | `1.0000` | `0.8571` | `0.8571` | `0.7353` | `0.9403` |
32
 
33
- Reference files:
34
 
35
- - `reports/current_core_ppsnonly.json`
36
- - `reports/current_edge_ppsnonly.json`
37
- - `reports/v14_core_ppsnonly.json`
38
- - `reports/v14_edge_ppsnonly.json`
39
- - `reports/benchmark_user_v15_ppsnonly_t050.json`
40
- - `reports/benchmark_core_ppsn_v15_ppsnonly_t050.json`
41
- - `reports/benchmark_edge_ppsn_v15_ppsnonly_t050.json`
42
- - `reports/benchmark_v8_v15_ppsnonly_t050.json`
43
- - `reports/benchmark_large_v15_ppsnonly_t050.json`
44
 
45
- ## Multilabel Tradeoff
 
 
46
 
47
- At the recommended split thresholds (`--min-score 0.4 --ppsn-min-score 0.5`):
48
 
49
- - Irish core overall F1: `0.9487`
50
- - Irish edge overall F1: `0.8205`
51
- - PPSN on `eval/irish_core_pii_v1.jsonl`: precision `0.75`, recall `1.0`, F1 `0.8571`
52
- - PPSN on `eval/irish_ppsn_phone_edge_v1.jsonl`: precision `0.75`, recall `1.0`, F1 `0.8571`
53
- - phone number on `eval/irish_core_pii_v1.jsonl`: F1 `0.9167`
54
- - postcode on `eval/irish_core_pii_v1.jsonl`: F1 `0.7500`
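
The PPSN figures in the list above are internally consistent: precision `0.75` and recall `1.0` give F1 `0.8571` under the standard harmonic-mean definition. A short sketch of that arithmetic (the function name is illustrative):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# PPSN on the core and edge suites: precision 0.75, recall 1.0.
ppsn_f1 = f1_score(0.75, 1.0)
print(round(ppsn_f1, 4))  # 0.8571
```

Recall of `1.0` with precision of `0.75` means every gold PPSN was found, and one in four predicted PPSN spans was a false positive.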
55
-
56
- Compared with `v14`, this is the tradeoff:
57
 
58
- - better: weak-context PPSN recall and the reported `1234567T` / `1234567tw` failures
59
- - better: edge PPSN F1 (`0.8571` vs `0.5000`)
60
- - slightly worse: broad Irish-core multilabel F1 (`0.9487` vs `0.9677`)
61
- - slightly worse: phone/postcode retention in the small Irish core suite
62
 
63
- Reference files:
64
 
65
- - `reports/benchmark_user_v15_m040_p050.json`
66
- - `reports/benchmark_core_v15_m040_p050.json`
67
- - `reports/benchmark_edge_v15_m040_p050.json`
68
- - `reports/benchmark_v8_v15_m040_p050.json`
69
- - `reports/benchmark_large_v15_m040_p050.json`
70
- - `reports/tmp_core_v14_035.json`
71
- - `reports/tmp_edge_v14_035.json`
72
-
73
- ## Decision
74
 
75
- `v15` is the first raw model in this line that cleanly fixes the exact weak-context PPSN misses.
76
 
77
- It is a viable release candidate if weak-context PPSN reliability is now the priority.
78
 
79
- It is not strictly dominant over `v14`, because `v14` still holds a small advantage on the broader Irish-core multilabel suite.
 
1
+ # Benchmark Summary
2
 
3
+ This file summarizes the public comparison relevant for QA.
4
 
5
+ ## Baseline
 
 
 
 
6
 
7
+ Current public release:
8
 
9
+ - `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1`
10
 
11
+ Candidate under test:
 
 
 
 
 
 
12
 
13
+ - `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1`
14
 
15
  ## PPSN-Only Comparison
16
 
+ | Model | User Raw | Core PPSN | Edge PPSN | QA v8 PPSN | Irish Large PPSN |
+ |---|---:|---:|---:|---:|---:|
+ | `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1` | 0.8000 | 0.0800 | 0.4211 | 0.7385 | 0.8980 |
+ | `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1` | 1.0000 | 0.8571 | 0.8571 | 0.7353 | 0.9403 |
 
21
 
22
+ ## Exact Weak-Context PPSN Cases
23
 
24
+ At `--ppsn-min-score 0.5`, this RC detects:
 
 
 
 
 
 
 
 
25
 
26
+ - `1234567T - am I eligible for the housing grant?`
27
+ - `I was told to provide my number 1234567T when applying, what do I do next?`
28
+ - `My ppsn is 1234567tw and I need to know about carer's allowance`
29
 
30
+ And does not label these as PPSN:
31
 
32
+ - `123456T`
33
+ - `12345678T`
34
+ - `0871234567`
35
+ - `2024T`
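
For QA, it can help to cross-check the positive and negative cases above with an independent rule-based test. The sketch below is not how the model decides. It assumes a simplified PPSN shape (seven digits, a check letter, an optional second letter) and the commonly described modulo-23 check character scheme. Both the pattern and the function name are assumptions to verify against the official specification before relying on them.

```python
import re

# Assumed simplified shape: 7 digits, a check letter, an optional second letter.
PPSN_SHAPE = re.compile(r"(\d{7})([A-W])([A-W])?")

# Remainder 0 maps to W, 1 to A, 2 to B, and so on up to 22 = V.
CHECK_LETTERS = "WABCDEFGHIJKLMNOPQRSTUV"


def ppsn_checksum_ok(token):
    """Return True if token has the assumed PPSN shape and check letter."""
    match = PPSN_SHAPE.fullmatch(token.upper())
    if match is None:
        return False
    digits, check, suffix = match.groups()
    # Weights 8 down to 2 over the seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    # A second letter other than W contributes 9 times its alphabet position.
    if suffix and suffix != "W":
        total += 9 * (ord(suffix) - ord("A") + 1)
    return check == CHECK_LETTERS[total % 23]


positives = ["1234567T", "1234567tw"]
negatives = ["123456T", "12345678T", "0871234567", "2024T"]
results = {token: ppsn_checksum_ok(token) for token in positives + negatives}
```

Under these assumptions, both positive cases pass and all four negatives fail on shape alone (too few digits, too many digits, no letter, or far too short), which matches the behavior reported for this RC.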
 
 
 
 
36
 
37
+ ## Multilabel Snapshot
 
 
 
38
 
39
+ At `--ppsn-min-score 0.5 --other-min-score 0.4`:
40
 
41
+ - Irish core overall F1: `0.9487`
42
+ - Irish edge overall F1: `0.8205`
43
+ - `phone_number` core F1: `0.9167`
44
+ - `postcode` core F1: `0.7500`
 
 
 
 
 
45
 
46
+ ## QA Reading
47
 
48
+ This RC exists to improve weak-context PPSN reliability relative to the current public `v1` release.
49
 
50
+ QA should compare it directly against `temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1` on production-like Irish traffic.
eval/multilabel_summary.json CHANGED
@@ -7,13 +7,8 @@
7
  "overall_core_f1": 0.515,
8
  "overall_edge_f1": 0.2326
9
  },
10
- "previous_internal_best": {
11
- "name": "v14",
12
- "overall_core_f1": 0.9677419355,
13
- "overall_edge_f1": 0.8823529412
14
- },
15
  "this_rc": {
16
- "name": "v15",
17
  "overall_core_f1": 0.9487179487,
18
  "overall_edge_f1": 0.8205128205,
19
  "phone_core_f1": 0.9166666667,
 
7
  "overall_core_f1": 0.515,
8
  "overall_edge_f1": 0.2326
9
  },
 
 
 
 
 
10
  "this_rc": {
11
+ "name": "current release candidate",
12
  "overall_core_f1": 0.9487179487,
13
  "overall_edge_f1": 0.8205128205,
14
  "phone_core_f1": 0.9166666667,
eval/ppsn_only_summary.json CHANGED
@@ -8,16 +8,8 @@
8
  "v8_ppsn_f1": 0.7384615385,
9
  "irish_large_ppsn_f1": 0.898
10
  },
11
- "previous_internal_best": {
12
- "name": "v14",
13
- "user_raw_f1": 0.5,
14
- "core_ppsn_f1": 0.9090909091,
15
- "edge_ppsn_f1": 0.5,
16
- "v8_ppsn_f1": 0.71875,
17
- "irish_large_ppsn_f1": 0.9383658468
18
- },
19
  "this_rc": {
20
- "name": "v15",
21
  "user_raw_f1": 1.0,
22
  "core_ppsn_f1": 0.8571428571,
23
  "edge_ppsn_f1": 0.8571428571,
 
8
  "v8_ppsn_f1": 0.7384615385,
9
  "irish_large_ppsn_f1": 0.898
10
  },
 
 
 
 
 
 
 
 
11
  "this_rc": {
12
+ "name": "current release candidate",
13
  "user_raw_f1": 1.0,
14
  "core_ppsn_f1": 0.8571428571,
15
  "edge_ppsn_f1": 0.8571428571,
label_meta.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "base_model": "models/openmed-mliteclinical-irish-core-v14_userboost_cls_s50",
3
  "label_list": [
4
  "O",
5
  "B-account_number",
@@ -117,4 +117,4 @@
117
  "extra_labels": [
118
  "PPSN"
119
  ]
120
- }
 
1
  {
2
+ "base_model": "OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1",
3
  "label_list": [
4
  "O",
5
  "B-account_number",
 
117
  "extra_labels": [
118
  "PPSN"
119
  ]
120
+ }
training_sources.json CHANGED
@@ -1,47 +1,33 @@
1
  {
2
  "base_model": "OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1",
3
- "release_line": "IrishCorePII raw-model recovery candidate",
4
- "recovery_parent": "internal v14 candidate derived from the same base model",
5
- "recovery_adapter": "v15 weak-context PPSN recovery adapter",
6
  "recommended_thresholds": {
7
  "ppsn_min_score": 0.5,
8
  "other_min_score": 0.4
9
  },
10
- "mix": {
11
- "name": "ppsn_recover_v4_mix",
12
- "sources": [
13
- {
14
- "name": "user_raw_boost_v1",
15
- "weight": 7.0,
16
- "kind": "synthetic_eval_duplication"
17
- },
18
- {
19
- "name": "irish_ppsn_phone_edge_v1",
20
- "weight": 3.0,
21
- "kind": "synthetic_manual_suite"
22
- },
23
- {
24
- "name": "ppsns_focus_v7",
25
- "weight": 2.0,
26
- "kind": "synthetic_ppsn_focus"
27
- },
28
- {
29
- "name": "ppsns_mid_context_v6g",
30
- "weight": 2.0,
31
- "kind": "synthetic_ppsn_focus"
32
- },
33
- {
34
- "name": "irish_core_release_v2_mix",
35
- "weight": 5.0,
36
- "kind": "synthetic_replay_mix"
37
- },
38
- {
39
- "name": "irish_ppsn_eircode_spec_v1",
40
- "weight": 1.0,
41
- "kind": "synthetic_spec_dataset"
42
- }
43
- ]
44
- },
45
  "upstream_attribution": [
46
  {
47
  "name": "joelniklaus/mapa",
 
1
  {
2
  "base_model": "OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1",
3
+ "current_public_reference": "temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1",
4
+ "release_purpose": "Targeted weak-context PPSN recovery for the IrishCorePII release line.",
 
5
  "recommended_thresholds": {
6
  "ppsn_min_score": 0.5,
7
  "other_min_score": 0.4
8
  },
9
+ "training_mix_summary": [
10
+ {
11
+ "component": "duplicated weak-context PPSN regression cases",
12
+ "weight": 7.0
13
+ },
14
+ {
15
+ "component": "Irish PPSN and phone edge-case replay",
16
+ "weight": 3.0
17
+ },
18
+ {
19
+ "component": "synthetic PPSN focus data with weak-context positives and hard negatives",
20
+ "weight": 4.0
21
+ },
22
+ {
23
+ "component": "broader Irish core PII replay mix",
24
+ "weight": 5.0
25
+ },
26
+ {
27
+ "component": "spec-driven Irish PPSN and Eircode synthetic data",
28
+ "weight": 1.0
29
+ }
30
+ ],
 
 
 
 
 
 
 
 
 
 
 
 
 
31
  "upstream_attribution": [
32
  {
33
  "name": "joelniklaus/mapa",