Benchmark Summary

This file summarizes the public benchmark comparison between the current release and the release candidate (RC), for use by QA.

Baseline

Current public release:

  • temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1

Candidate under test:

  • temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1

PPSN-Only Comparison

| Model | User Raw | Core PPSN | Edge PPSN | QA v8 PPSN | Irish Large PPSN |
|---|---|---|---|---|---|
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1 | 0.8000 | 0.0800 | 0.4211 | 0.7385 | 0.8980 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1 | 1.0000 | 0.8571 | 0.8571 | 0.7353 | 0.9403 |

Exact Weak-Context PPSN Cases

At --ppsn-min-score 0.5, this RC detects the PPSN in each of these inputs:

  • 1234567T - am I eligible for the housing grant?
  • I was told to provide my number 1234567T when applying, what do I do next?
  • My ppsn is 1234567tw and I need to know about carer's allowance

It does not label any of the following as PPSN:

  • 123456T
  • 12345678T
  • 0871234567
  • 2024T
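
The accepted and rejected strings above are consistent with the usual PPSN shape of seven digits followed by one or two letters. As a rough illustration only, the sketch below checks that shape with a regular expression. It is not how the model makes its decision, it does not verify the PPSN check character, and the names `PPSN_SHAPE` and `has_ppsn_shape` are hypothetical.

```python
import re

# Seven digits followed by one or two letters, not embedded in a longer
# alphanumeric run. Shape only: the check character is NOT verified.
# This pattern is an illustrative assumption, not the model's logic.
PPSN_SHAPE = re.compile(r"(?<![0-9A-Za-z])[0-9]{7}[A-Za-z]{1,2}(?![0-9A-Za-z])")


def has_ppsn_shape(text: str) -> bool:
    """Return True if the text contains a PPSN-shaped token."""
    return PPSN_SHAPE.search(text) is not None


# Inputs copied from the weak-context cases listed above.
expected_detected = [
    "1234567T - am I eligible for the housing grant?",
    "I was told to provide my number 1234567T when applying, what do I do next?",
    "My ppsn is 1234567tw and I need to know about carer's allowance",
]
expected_rejected = ["123456T", "12345678T", "0871234567", "2024T"]

shape_results = {s: has_ppsn_shape(s) for s in expected_detected + expected_rejected}
```

A shape check like this can serve as a sanity baseline when reviewing model output, but it cannot replace the model: it has no notion of context and would flag any seven-digit-plus-letter token.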

Multilabel Snapshot

At --ppsn-min-score 0.5 --other-min-score 0.4:

  • Irish core overall F1: 0.9487
  • Irish edge overall F1: 0.8205
  • phone_number core F1: 0.9167
  • postcode core F1: 0.7500
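
The two flags above imply a per-label score cutoff: one threshold for PPSN spans and another for every other entity type. A minimal sketch of that filtering step follows, assuming spans carry a label and a confidence score, that the PPSN label string is `"PPSN"`, and that a score equal to the threshold is kept. The `Span` type, the `filter_spans` function, and the example scores are hypothetical and not taken from the release code.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    label: str
    score: float  # model confidence for this span


def filter_spans(
    spans: List[Span],
    ppsn_min_score: float = 0.5,
    other_min_score: float = 0.4,
    ppsn_label: str = "PPSN",  # assumed label name
) -> List[Span]:
    """Keep spans whose score meets the threshold for their label."""
    kept = []
    for span in spans:
        threshold = ppsn_min_score if span.label == ppsn_label else other_min_score
        if span.score >= threshold:  # inclusive cutoff is an assumption
            kept.append(span)
    return kept


# Illustrative scores only; these are not benchmark outputs.
candidates = [
    Span(11, 19, "PPSN", 0.62),
    Span(0, 5, "PPSN", 0.45),
    Span(30, 40, "phone_number", 0.45),
    Span(50, 57, "postcode", 0.30),
]
kept = filter_spans(candidates)
```

With these made-up scores, the 0.45 PPSN span is dropped while the 0.45 phone_number span is kept, which is the practical effect of using a stricter threshold for PPSN than for other labels.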

ONNX Deployment Benchmark

These runtime numbers compare the previous float ONNX export with the dynamic 8-bit ONNX artifact now published at onnx/model.onnx for this RC.

| Artifact | Quantization | Size (MB) | Avg Latency (ms) | P95 Latency (ms) | Throughput (RPS) | CPU ms / req |
|---|---|---|---|---|---|---|
| previous ONNX export | float32 | 517.19 | 46.44 | 141.74 | 21.53 | 235.22 |
| published onnx/model.onnx | dynamic 8-bit (QUInt8, per-tensor) | 128.94 | 32.10 | 106.13 | 31.14 | 169.75 |
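
To make the table easier to read at a glance, the relative change per metric can be derived directly from the two rows. The snippet below is a small arithmetic sketch using only the figures in the table; the dictionary keys and the `percent_change` helper are hypothetical names.

```python
# Figures copied from the table above.
float32_export = {
    "size_mb": 517.19,
    "avg_ms": 46.44,
    "p95_ms": 141.74,
    "rps": 21.53,
    "cpu_ms_per_req": 235.22,
}
uint8_export = {
    "size_mb": 128.94,
    "avg_ms": 32.10,
    "p95_ms": 106.13,
    "rps": 31.14,
    "cpu_ms_per_req": 169.75,
}


def percent_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


changes = {
    metric: round(percent_change(float32_export[metric], uint8_export[metric]), 1)
    for metric in float32_export
}

for metric, delta in changes.items():
    print(f"{metric}: {delta:+.1f}%")
```

On these numbers the quantized artifact is about 75% smaller, with roughly 31% lower average latency, 25% lower P95 latency, 28% less CPU time per request, and about 45% higher throughput.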

Runtime notes:

  • raw entity spans differ slightly from the float export on the synthetic runtime corpus
  • final endpoint redacted text matched on the smoke sample used for release validation
  • a signed QInt8 dynamic candidate was rejected because it degraded PPSN span fidelity more than the published artifact
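
The notes above distinguish two levels of agreement between exports: raw entity spans and final redacted text. The sketch below shows one way such a comparison could be made, assuming spans are `(start, end, label)` tuples with an exclusive end offset and that redaction replaces each span with a fixed mask. The helper names, the mask string, and the example spans are hypothetical and do not come from the release validation tooling.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def redact(text: str, spans: List[Span], mask: str = "[REDACTED]") -> str:
    """Replace each span with the mask, skipping spans that overlap an earlier one."""
    pieces, cursor = [], 0
    for start, end, _label in sorted(spans):
        if start < cursor:
            continue  # overlaps a span already redacted
        pieces.append(text[cursor:start])
        pieces.append(mask)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def compare_exports(text: str, float_spans: List[Span], quant_spans: List[Span]) -> Dict:
    """Report span-level differences and whether the redacted text agrees."""
    return {
        "only_float": sorted(set(float_spans) - set(quant_spans)),
        "only_quant": sorted(set(quant_spans) - set(float_spans)),
        "redaction_match": redact(text, float_spans) == redact(text, quant_spans),
    }


# Hypothetical spans illustrating a truncated PPSN span in one export.
sample = "My ppsn is 1234567T"
report = compare_exports(sample, [(11, 19, "PPSN")], [(11, 18, "PPSN")])
```

In this made-up case the second export clips the final character of the PPSN, so both the span sets and the redacted text differ. That kind of boundary error is what a span-fidelity check is meant to surface.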

QA Reading

This RC exists to improve weak-context PPSN reliability relative to the current public v1 release.

QA should compare it directly against temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1 on production-like Irish traffic.