Benchmark Summary
This file summarizes the public comparison relevant for QA.
Baseline
Current public release:
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1
Candidate under test:
temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1
PPSN-Only Comparison
| Model | User Raw | Core PPSN | Edge PPSN | QA v8 PPSN | Irish Large PPSN |
|---|---|---|---|---|---|
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1 | 0.8000 | 0.0800 | 0.4211 | 0.7385 | 0.8980 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1 | 1.0000 | 0.8571 | 0.8571 | 0.7353 | 0.9403 |
Exact Weak-Context PPSN Cases
At --ppsn-min-score 0.5, this RC detects:
- 1234567T - am I eligible for the housing grant?
- I was told to provide my number 1234567T when applying, what do I do next?
- My ppsn is 1234567tw and I need to know about carer's allowance
And does not label these as PPSN:
- 123456T
- 12345678T
- 0871234567
- 2024T
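
One way to read the two lists is as a surface-shape check: each detected example contains seven digits followed by one or two letters, while the rejected strings do not. The sketch below encodes that shape as a regular expression. The pattern and the helper name `has_ppsn_shape` are illustrative assumptions, not the model's decision logic, and the rejected strings are taken to be 123456T, 12345678T, 0871234567, and 2024T.

```python
import re

# Assumed PPSN surface shape: seven digits, then one or two letters,
# not embedded in a longer alphanumeric token. Illustration only.
PPSN_SHAPE = re.compile(r"(?<![0-9A-Za-z])\d{7}[A-Za-z]{1,2}(?![0-9A-Za-z])")


def has_ppsn_shape(text: str) -> bool:
    """Return True if any token in `text` has the assumed PPSN shape."""
    return PPSN_SHAPE.search(text) is not None


# Inputs the RC is reported to detect.
positives = [
    "1234567T - am I eligible for the housing grant?",
    "I was told to provide my number 1234567T when applying, what do I do next?",
    "My ppsn is 1234567tw and I need to know about carer's allowance",
]

# Strings the RC is reported not to label as PPSN.
negatives = ["123456T", "12345678T", "0871234567", "2024T"]

for sample in positives + negatives:
    print(has_ppsn_shape(sample), sample)
```

A check like this could serve as a cheap sanity filter when building QA cases, but it says nothing about how the model scores weak-context inputs.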
Multilabel Snapshot
At --ppsn-min-score 0.5 --other-min-score 0.4:
- Irish core overall F1: 0.9487
- Irish edge overall F1: 0.8205
- phone_number core F1: 0.9167
- postcode core F1: 0.7500
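
The two flags suggest a per-label confidence cutoff: one threshold for PPSN predictions and another for every other label. The sketch below shows that reading. The entity dictionary layout, the label string `"PPSN"`, the function names, and the sample scores are all assumptions made for illustration; the actual evaluation script may behave differently.

```python
from typing import Dict, List

# Thresholds mirroring the flag values quoted above.
PPSN_MIN_SCORE = 0.5
OTHER_MIN_SCORE = 0.4


def keep_entity(entity: Dict, ppsn_min: float = PPSN_MIN_SCORE,
                other_min: float = OTHER_MIN_SCORE) -> bool:
    """Keep an entity if its score meets the threshold for its label."""
    # Assumption: PPSN predictions carry the label string "PPSN".
    threshold = ppsn_min if entity["label"] == "PPSN" else other_min
    return entity["score"] >= threshold


def filter_entities(entities: List[Dict]) -> List[Dict]:
    """Drop predictions that fall below their per-label threshold."""
    return [e for e in entities if keep_entity(e)]


# Made-up predictions, used only to show the two thresholds at work.
sample = [
    {"label": "PPSN", "score": 0.45, "text": "1234567T"},
    {"label": "phone_number", "score": 0.45, "text": "0871234567"},
    {"label": "PPSN", "score": 0.50, "text": "1234567TW"},
]

for entity in filter_entities(sample):
    print(entity["label"], entity["score"], entity["text"])
```

With these sample scores, the 0.45 PPSN prediction is dropped while the 0.45 phone_number prediction is kept, which is the asymmetry the two flags would introduce under this reading.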
ONNX Deployment Benchmark
These runtime numbers compare the previous float ONNX export with the dynamic 8-bit ONNX
artifact now published at onnx/model.onnx for this RC.
| Artifact | Quantization | Size (MB) | Avg Latency (ms) | P95 Latency (ms) | Throughput (RPS) | CPU ms / req |
|---|---|---|---|---|---|---|
| previous ONNX export | float32 | 517.19 | 46.44 | 141.74 | 21.53 | 235.22 |
| published onnx/model.onnx | dynamic 8-bit (QUInt8, per-tensor) | 128.94 | 32.10 | 106.13 | 31.14 | 169.75 |
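
To make the table easier to read at a glance, the relative change of each metric can be computed from the two rows. This is a small arithmetic sketch using only the values in the table; the helper name `relative_change` is illustrative.

```python
# Values copied from the table: (previous float32 export, published 8-bit artifact).
metrics = {
    "Size (MB)": (517.19, 128.94),
    "Avg Latency (ms)": (46.44, 32.10),
    "P95 Latency (ms)": (141.74, 106.13),
    "Throughput (RPS)": (21.53, 31.14),
    "CPU ms / req": (235.22, 169.75),
}


def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before * 100.0


for name, (before, after) in metrics.items():
    print(f"{name:<18}{relative_change(before, after):+8.1f}%")
```

On these figures the published artifact is roughly 75% smaller and delivers roughly 45% more throughput than the previous float export.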
Runtime notes:
- raw entity spans differ slightly from the float export on the synthetic runtime corpus
- final endpoint redacted text matched on the smoke sample used for release validation
- a signed QInt8 dynamic candidate was rejected because it degraded PPSN span fidelity more than the published artifact
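
For readers who want to reproduce a comparable artifact, the sketch below shows how a dynamic, per-tensor QUInt8 quantization can be requested through ONNX Runtime's `quantize_dynamic`. It is an assumption that the published file was produced this way; the function name `quantize_dynamic_uint8` and the file paths in the usage comment are hypothetical.

```python
from pathlib import Path


def quantize_dynamic_uint8(fp32_path: str, out_path: str) -> Path:
    """Write a dynamically quantized copy of a float ONNX model.

    Assumes the `onnxruntime` package is installed; it is imported lazily
    so this module can still be loaded where it is not available.
    """
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=fp32_path,
        model_output=out_path,
        weight_type=QuantType.QUInt8,  # unsigned 8-bit weights
        per_channel=False,             # one scale per tensor, not per channel
    )
    return Path(out_path)


# Hypothetical usage (paths are placeholders, not the release layout):
# quantize_dynamic_uint8("model_fp32.onnx", "model_quint8.onnx")
```

Swapping `QuantType.QUInt8` for `QuantType.QInt8` would give a signed variant along the lines of the rejected candidate, which is one way to rerun the span-fidelity comparison.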
QA Reading
This RC exists to improve weak-context PPSN reliability relative to the current public v1 release.
QA should compare it directly against temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1 on production-like Irish traffic.