
Benchmark Summary

ONNX q8

| Suite | F1 | Examples/s |
|---|---:|---:|
| Irish core | 0.9664 | 29.9676 |
| Edge | 1.0000 | 116.3977 |
| Finance | 1.0000 | 27.2093 |
| Finance boundary | 1.0000 | 64.7358 |
| User PPSN | 0.8571 | 47.0000 |
| GA weak PPSN | 1.0000 | 52.5895 |
| Multilingual PPSN | 0.9591 | 54.2219 |
| Hardening exact | 1.0000 | 64.5117 |
| UAT replay exact | 0.9032 | 31.7159 |

Full checkpoint

| Suite | F1 | Examples/s |
|---|---:|---:|
| Irish core | 0.9664 | 25.9375 |
| Edge | 1.0000 | 35.0893 |
| Multilingual PPSN | 0.9591 | 60.4876 |
| Hardening exact | 1.0000 | 27.8705 |
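The ONNX q8 and full-checkpoint tables share four suites. Dividing q8 throughput by full-checkpoint throughput on those suites shows the speed trade-off directly. The short Python sketch below does that with figures copied from the two tables. The names `Q8`, `FULL` and `compare` are illustrative only and are not part of any published tooling.

```python
# Figures copied from the ONNX q8 and full-checkpoint tables: (F1, examples/s).
# Only the four suites reported for both variants are included.
Q8 = {
    "Irish core": (0.9664, 29.9676),
    "Edge": (1.0000, 116.3977),
    "Multilingual PPSN": (0.9591, 54.2219),
    "Hardening exact": (1.0000, 64.5117),
}
FULL = {
    "Irish core": (0.9664, 25.9375),
    "Edge": (1.0000, 35.0893),
    "Multilingual PPSN": (0.9591, 60.4876),
    "Hardening exact": (1.0000, 27.8705),
}


def compare(q8: dict[str, tuple[float, float]],
            full: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Per shared suite: (F1 of q8 minus F1 of full, q8/full throughput ratio)."""
    out = {}
    for suite in q8.keys() & full.keys():
        (f1_q8, eps_q8), (f1_full, eps_full) = q8[suite], full[suite]
        out[suite] = (f1_q8 - f1_full, eps_q8 / eps_full)
    return out


if __name__ == "__main__":
    for suite, (f1_delta, ratio) in sorted(compare(Q8, FULL).items()):
        print(f"{suite:<18} F1 delta {f1_delta:+.4f}  q8/full throughput x{ratio:.2f}")
```

On these figures, q8 matches the full checkpoint's F1 on all four shared suites. It is faster on three of them, roughly 3.3x on Edge and 2.3x on Hardening exact. On Multilingual PPSN it runs at about 0.9x the full checkpoint's rate.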

UAT Replay Exact Comparison

| Model | F1 | Precision | Recall | Examples/s |
|---|---:|---:|---:|---:|
| IrishCore-DiffMask-135M-v1-rc1 q8 | 0.4545 | 1.0000 | 0.2941 | 238.6524 |
| IrishCore-DiffMask-135M-v1-rc2 q8 | 0.8276 | 1.0000 | 0.7059 | 183.6675 |
| OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8 q8 | 0.3636 | 0.3750 | 0.3529 | 110.7595 |
| IrishCore-DiffMask-135M-v1-rc3 q8 | 0.9032 | 1.0000 | 0.8235 | 31.7159 |
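Each row in the UAT replay comparison is internally consistent with F1 = 2PR / (P + R). The sketch below recomputes the rows from span counts, using short labels for the model names. The counts are an assumption: they are back-solved from the reported precision and recall on the premise that the suite contains 17 gold spans, and this summary does not publish them. Under that assumption, rc3 has three false negatives, which lines up with the three known misses listed in the notes.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Span-level precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# ASSUMED (tp, fp, fn) per model: back-solved from the table, assuming 17 gold
# spans in the UAT replay exact suite. These counts are not published.
ASSUMED_COUNTS = {
    "rc1 q8": (5, 0, 12),
    "rc2 q8": (12, 0, 5),
    "OpenMed rc8 q8": (6, 10, 11),
    "rc3 q8": (14, 0, 3),
}

# Reported values from the table, reordered to (precision, recall, F1).
REPORTED = {
    "rc1 q8": (1.0000, 0.2941, 0.4545),
    "rc2 q8": (1.0000, 0.7059, 0.8276),
    "OpenMed rc8 q8": (0.3750, 0.3529, 0.3636),
    "rc3 q8": (1.0000, 0.8235, 0.9032),
}

if __name__ == "__main__":
    for model, counts in ASSUMED_COUNTS.items():
        p, r, f1 = prf1(*counts)
        print(f"{model:<15} P {p:.4f}  R {r:.4f}  F1 {f1:.4f}  reported {REPORTED[model]}")
```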

Notes

  • rc3 keeps the stronger focusv3 checkpoint and pairs it with a narrower published decoder profile aimed at email continuation and passport recovery in the q8 model.
  • The remaining known misses on the UAT replay suite are the phone number `071 967 2616`, the Eircode `R93 EC57` when it appears inside a longer centre block, and the email address `EPStamp4@enterprise.gov.ie`.
  • `user_raw_regression_cases_v1` is a legacy suite that labels only PPSNs. The one false positive it counts is `0871234567`, which rc3 now masks as `PHONE_NUMBER`.
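The last note turns on the difference between a PPSN and a phone number. The sketch below shows that distinction with simple rules, under two assumptions. The first is the commonly documented PPSN layout: seven digits, a check letter and an optional second letter, with a mod-23 checksum that weights the digits 8 down to 2 and the second letter by 9. The second is a simplified `08[3-9]` plus seven-digit pattern for Irish mobile numbers. The function names are hypothetical. This is a rule-based illustration only and not how rc3's model or decoder profile makes its decision.

```python
import re

# ASSUMED, simplified shapes for illustration (not rc3's decoder):
# PPSN = 7 digits + check letter + optional second letter;
# Irish mobile = 08[3-9] followed by 7 digits.
PPSN_RE = re.compile(r"(\d{7})([A-W])([A-W]?)")
IE_MOBILE_RE = re.compile(r"08[3-9]\d{7}")
CHECK_LETTERS = "WABCDEFGHIJKLMNOPQRSTUV"  # index = weighted sum mod 23


def is_valid_ppsn(text: str) -> bool:
    """Shape check plus mod-23 check-character test for a bare PPSN string."""
    m = PPSN_RE.fullmatch(text.upper())
    if not m:
        return False
    digits, check, suffix = m.groups()
    # Digits are weighted 8, 7, 6, 5, 4, 3, 2.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    if suffix:
        # Second letter weighted by 9 with A=1 ... W=23 (W is 0 mod 23).
        total += 9 * (ord(suffix) - ord("A") + 1)
    return CHECK_LETTERS[total % 23] == check


def looks_like_ie_mobile(text: str) -> bool:
    """True for a 10-digit Irish mobile number once spaces are removed."""
    return IE_MOBILE_RE.fullmatch(text.replace(" ", "")) is not None
```

`0871234567` fails the PPSN shape test outright because it has ten digits and no letters. It does match the mobile pattern, which is consistent with rc3 masking it as `PHONE_NUMBER` rather than as a PPSN.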