temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1

Full transformers checkpoint derived from OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1 and tuned for Irish core PII:

  • PPSN
  • account_number
  • bank_routing_number
  • credit_debit_card
  • PASSPORT_NUMBER
  • postcode
  • phone_number
  • email
  • first_name
  • last_name
  • swift_bic

The main focus is English and Irish (ga) text in Irish administrative, citizen-support, and HSE-style contexts.

Included Artifacts

  • Full transformers model files in the repo root
  • Dynamic int8 ONNX export in onnx/model_quantized.onnx
  • inference_mask.py for the full model
  • inference_mask_onnx.py for the ONNX int8 artifact
  • Clean benchmark summaries in eval/

Recommended Inference

Highest accuracy:

python3 inference_mask.py \
  --text "My PPSN is 1234567TW and call me on 087 123 4567." \
  --json

Fast CPU path:

python3 inference_mask_onnx.py \
  --text "My PPSN is 1234567TW and call me on 087 123 4567." \
  --json

Dedicated Eircode example:

python3 inference_mask.py \
  --text "My Eircode is D02 X285." \
  --json
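
The commands above can also be driven from Python. The following is a minimal sketch, not part of the repo: `build_command` and `run_masker` are hypothetical helper names, and it assumes that with `--json` each script prints a single JSON document to stdout. The output schema is not documented here, so the result is returned as parsed without interpreting any fields.

```python
import json
import subprocess
import sys


def build_command(script, text):
    """Assemble the argv for one of the bundled masking scripts.

    --text and --json are the only flags used, as in the commands above.
    """
    return [sys.executable, script, "--text", text, "--json"]


def run_masker(script, text):
    """Run a masking script and parse its stdout as JSON.

    Assumption: the script writes exactly one JSON document to stdout
    when --json is passed. Raises CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(
        build_command(script, text),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Calling `run_masker("inference_mask.py", "My Eircode is D02 X285.")` from the repo root would use the full checkpoint; passing `"inference_mask_onnx.py"` instead would switch to the ONNX int8 path.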

Benchmarks

Reference comparison on the manual Irish core suite and PPSN regression suites:

| Label               | Base OpenMed | Previous Public Model | This Release | ONNX Q8 |
|---------------------|--------------|-----------------------|--------------|---------|
| PPSN                | 0.0000       | 0.0800                | 0.8000       | 0.7273  |
| account_number      | 0.3333       | 0.3333                | 1.0000       | 1.0000  |
| bank_routing_number | 0.0000       | 0.0000                | 1.0000       | 1.0000  |
| credit_debit_card   | 0.1538       | 0.1818                | 1.0000       | 0.3333  |
| PASSPORT_NUMBER     | 0.0000       | 0.0000                | 1.0000       | 1.0000  |
| postcode            | 0.0000       | 0.0000                | 1.0000       | 1.0000  |
| phone_number        | 0.0000       | 0.0000                | 0.8571       | 0.8571  |
| email               | 0.7059       | 1.0000                | 1.0000       | 1.0000  |
| first_name          | 0.8947       | 0.8947                | 1.0000       | 1.0000  |
| last_name           | 0.8889       | 0.8889                | 1.0000       | 1.0000  |
| swift_bic           | 0.0000       | 0.0000                | 1.0000       | 1.0000  |

Edge and multilingual PPSN checks:

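| Suite             | Base OpenMed | Previous Public Model | This Release | ONNX Q8 |
|-------------------|--------------|-----------------------|--------------|---------|
| edge_ppsn         | 0.0000       | 0.4211                | 0.5000       | 0.4000  |
| edge_phone_number | 0.1429       | 0.1429                | 0.6316       | 0.5000  |
| multilingual_ppsn | 0.0000       | 0.9704                | 0.9940       | 0.9882  |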

Multilingual PPSN throughput on CPU (eval/multilingual_ppsn_v1_all.jsonl):

  • Base OpenMed: 42.30 examples/s
  • Previous public PPSN model: 42.63 examples/s
  • This release: 41.18 examples/s
  • ONNX Q8: 81.99 examples/s
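
To put the throughput figures on a common scale, the short sketch below divides each rate by that of the full checkpoint in this release. The numbers are copied from the list above; `relative_to` is a hypothetical helper name used only for illustration.

```python
# Throughput in examples/s, copied from the list above.
throughput = {
    "Base OpenMed": 42.30,
    "Previous public PPSN model": 42.63,
    "This release": 41.18,
    "ONNX Q8": 81.99,
}


def relative_to(baseline, rates):
    """Return each rate as a multiple of the baseline's rate."""
    return {name: rate / rates[baseline] for name, rate in rates.items()}


ratios = relative_to("This release", throughput)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
# Base OpenMed: 1.03x
# Previous public PPSN model: 1.04x
# This release: 1.00x
# ONNX Q8: 1.99x
```

On this suite the ONNX int8 export runs at roughly twice the rate of the full checkpoint, while the three full-precision models are within a few percent of each other.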

Practical Reading Of The Benchmarks

  • This release is materially better than the previous public PPSN-only model on Irish phone numbers, Eircodes, account details, passport numbers, and names.
  • The bundled ONNX int8 export is useful for CPU speed, but it is not accuracy-identical to the full checkpoint.
  • The largest ONNX drops are on credit_debit_card and some PPSN edge cases. Use the full model when those matter.
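
The reading above can be reproduced from the manual Irish core suite scores. The sketch below lists the labels where the ONNX Q8 score falls below the full checkpoint and picks a script accordingly. The scores are copied from the benchmark table; `onnx_drops` and `choose_script` are hypothetical helpers, and the selection rule is only one possible interpretation of "use the full model when those matter".

```python
# (This Release, ONNX Q8) scores copied from the manual Irish core suite table.
scores = {
    "PPSN": (0.8000, 0.7273),
    "account_number": (1.0000, 1.0000),
    "bank_routing_number": (1.0000, 1.0000),
    "credit_debit_card": (1.0000, 0.3333),
    "PASSPORT_NUMBER": (1.0000, 1.0000),
    "postcode": (1.0000, 1.0000),
    "phone_number": (0.8571, 0.8571),
    "email": (1.0000, 1.0000),
    "first_name": (1.0000, 1.0000),
    "last_name": (1.0000, 1.0000),
    "swift_bic": (1.0000, 1.0000),
}


def onnx_drops(table, tolerance=1e-9):
    """Return labels where ONNX Q8 scores lower, largest drop first."""
    drops = {
        label: full - q8
        for label, (full, q8) in table.items()
        if full - q8 > tolerance
    }
    return dict(sorted(drops.items(), key=lambda item: item[1], reverse=True))


def choose_script(needed_labels, drops):
    """Prefer the full model if any needed label loses accuracy under ONNX Q8."""
    if any(label in drops for label in needed_labels):
        return "inference_mask.py"
    return "inference_mask_onnx.py"


drops = onnx_drops(scores)
for label, drop in drops.items():
    print(f"{label}: -{drop:.4f}")
# credit_debit_card: -0.6667
# PPSN: -0.0727
```

For example, `choose_script(["email", "postcode"], drops)` selects the ONNX script, while any workload that includes `credit_debit_card` or `PPSN` selects the full model.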

License And Attribution

  • Model weights in this repo are distributed under Apache-2.0.
  • Base model: OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
  • Training data included synthetic Irish data plus attributed upstream data from:
    • joelniklaus/mapa (cc-by-4.0)
    • gretelai/synthetic_pii_finance_multilingual (apache-2.0)
  • See NOTICE for attribution details.