
Benchmark Summary

ONNX q8

| Suite | F1 | Examples/s |
| --- | --- | --- |
| Irish core | 0.9664 | 247.0809 |
| Edge | 1.0000 | 247.8374 |
| Finance | 1.0000 | 260.5229 |
| Finance boundary | 1.0000 | 111.3480 |
| User PPSN | 1.0000 | 240.0219 |
| GA weak PPSN | 1.0000 | 121.8613 |
| Multilingual PPSN | 0.9212 | 256.1316 |
| Hardening exact | 0.9744 | 231.8666 |
| UAT replay exact | 0.8276 | 183.6675 |

Full checkpoint

| Suite | F1 | Examples/s |
| --- | --- | --- |
| Irish core | 0.9664 | 47.2794 |
| Edge | 1.0000 | 38.1395 |
| Multilingual PPSN | 0.9212 | 65.8959 |
| Hardening exact | 0.9744 | 31.0518 |
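
The two tables above share four suites, so the quantized and full-precision runs can be compared directly. The following is a minimal sketch that recomputes the throughput ratio and the F1 difference per suite from the reported figures; the variable names are illustrative and are not part of the evaluation code.

```python
# Figures copied from the "ONNX q8" and "Full checkpoint" tables: (F1, examples/s).
onnx_q8 = {
    "Irish core": (0.9664, 247.0809),
    "Edge": (1.0000, 247.8374),
    "Multilingual PPSN": (0.9212, 256.1316),
    "Hardening exact": (0.9744, 231.8666),
}
full_checkpoint = {
    "Irish core": (0.9664, 47.2794),
    "Edge": (1.0000, 38.1395),
    "Multilingual PPSN": (0.9212, 65.8959),
    "Hardening exact": (0.9744, 31.0518),
}

speedup = {}
f1_delta = {}
for suite, (f1_full, rate_full) in full_checkpoint.items():
    f1_q8, rate_q8 = onnx_q8[suite]
    speedup[suite] = rate_q8 / rate_full  # how many times faster q8 runs
    f1_delta[suite] = f1_q8 - f1_full     # quality change after quantization

for suite in full_checkpoint:
    print(f"{suite}: {speedup[suite]:.2f}x faster, F1 delta {f1_delta[suite]:+.4f}")
```

On these four suites the reported F1 is identical for both variants, while the q8 export is roughly 3.9x to 7.5x faster. The summary does not state the hardware or batch size, so the ratios only describe this particular run.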

UAT Replay Exact Comparison

| Model | F1 | Precision | Recall | Examples/s |
| --- | --- | --- | --- | --- |
| IrishCore-DiffMask-135M-v1-rc1 q8 | 0.4545 | 1.0000 | 0.2941 | 238.6524 |
| OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8 q8 | 0.3636 | 0.3750 | 0.3529 | 110.7595 |
| IrishCore-DiffMask-135M-v1-rc2 q8 | 0.8276 | 1.0000 | 0.7059 | 183.6675 |
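
As a sanity check on the comparison above, F1 can be recomputed from the reported precision and recall with the standard harmonic-mean formula F1 = 2PR / (P + R). The sketch below does this for each row; the `f1_score` helper is written here for illustration and is not taken from the evaluation scripts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the UAT replay exact table.
rows = {
    "IrishCore-DiffMask-135M-v1-rc1 q8": (1.0000, 0.2941, 0.4545),
    "OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8 q8": (0.3750, 0.3529, 0.3636),
    "IrishCore-DiffMask-135M-v1-rc2 q8": (1.0000, 0.7059, 0.8276),
}

for model, (precision, recall, reported) in rows.items():
    recomputed = f1_score(precision, recall)
    print(f"{model}: recomputed {recomputed:.4f}, reported {reported:.4f}")
```

All three rows agree with the reported F1 to four decimal places. With precision at 1.0000 for both rc1 and rc2, the F1 gain from rc1 to rc2 comes entirely from higher recall.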

Notes

  • rc2 was selected from an interpolation blend after cleaning label contamination in the v5 training mix.
  • The remaining known misses on the UAT replay suite are 071 967 2616, R93 EC57 inside a longer centre block, EPStamp4@enterprise.gov.ie, and one D02 XY45 form.
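
The notes do not say which checkpoints were blended or at what ratio. As a rough illustration only, and assuming "interpolation blend" means linear interpolation between the parameters of two checkpoints, the idea can be sketched with plain Python lists standing in for tensors. The function name, the `alpha` parameter, and the toy weights are all hypothetical.

```python
def interpolate_weights(state_a, state_b, alpha):
    """Blend two checkpoints as (1 - alpha) * state_a + alpha * state_b.

    Both arguments map parameter names to flat lists of floats
    (a stand-in for real tensors in this sketch).
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    blended = {}
    for name in state_a:
        a_values, b_values = state_a[name], state_b[name]
        if len(a_values) != len(b_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        blended[name] = [
            (1 - alpha) * a + alpha * b for a, b in zip(a_values, b_values)
        ]
    return blended

# Toy example: an even blend of two tiny "checkpoints".
blend = interpolate_weights({"w": [0.0, 2.0]}, {"w": [1.0, 4.0]}, alpha=0.5)
print(blend)
```

In practice one would typically sweep several `alpha` values and keep the blend that scores best on held-out suites, but the summary does not describe the actual selection procedure.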