Benchmark Summary

This release now bundles a dynamically quantized int8 ONNX export at `onnx/model_quantized.onnx`.

All numbers below use the same thresholds as the release recommendation:

Exact QA Suites

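| Model | Numeric v2 | Numeric v2 Passport | Numeric v2 Routing | Numeric v2 Phone | Gap | Gap Passport | Gap Routing | Gap Phone |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `v1` | 0.3000 | 0.5000 | 0.3333 | 0.0000 | 0.2667 | 0.3333 | 0.4000 | 0.0000 |
| `v2-rc1` | 0.2105 | 0.0000 | 0.0000 | 0.5000 | 0.1333 | 0.0000 | 0.0000 | 0.3333 |
| `v2-rc2` | 0.8966 | 0.9091 | 1.0000 | 0.7500 | 0.8696 | 0.8889 | 1.0000 | 0.6667 |
| `v2-rc2 ONNX q8` | 0.8387 | 0.7692 | 1.0000 | 0.7500 | 0.8333 | 0.8000 | 1.0000 | 0.6667 |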

| Model | User PPSN | Core | Edge | Multilingual PPSN | Core PPSN | Edge PPSN |
| --- | --- | --- | --- | --- | --- | --- |
| `Base OpenMed` | 0.0000 | 0.5409 | 0.0513 | 0.0000 | 0.0000 | 0.0000 |
| `v1` | 0.5000 | 0.9530 | 0.5714 | 0.9940 | 0.8000 | 0.5000 |
| `v2-rc1` | 1.0000 | 0.9487 | 0.8205 | 0.7568 | 0.8571 | 0.8571 |
| `v2-rc2` | 1.0000 | 0.9554 | 0.9500 | 0.8038 | 0.8571 | 0.8571 |
| `v2-rc2 ONNX q8` | 1.0000 | 0.9615 | 0.9500 | 0.7887 | 0.8571 | 0.8571 |

| Model | Core ex/s | Edge ex/s | Multilingual PPSN ex/s |
| --- | --- | --- | --- |
| `Base OpenMed` | 33.5507 | 30.5715 | 116.8383 |
| `v1` | 27.1865 | 29.2080 | 127.0334 |
| `v2-rc1` | 27.6197 | 28.9968 | 111.3383 |
| `v2-rc2` | 27.4796 | 30.1890 | 131.0956 |
| `v2-rc2 ONNX q8` | 119.0614 | 69.9669 | 86.9493 |
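
As a rough reading of the throughput table, the minimal sketch below computes the speed ratio of the ONNX q8 artifact relative to the full `v2-rc2` checkpoint on each suite. The values are copied from the table; the dictionary layout and the `speedup` helper are illustrative names rather than release tooling, and `ex/s` is assumed to mean examples per second.

```python
# Throughput values (ex/s) copied from the table above.
# Key names and structure are illustrative, not part of the release.
THROUGHPUT = {
    "v2-rc2": {"core": 27.4796, "edge": 30.1890, "multilingual_ppsn": 131.0956},
    "v2-rc2 ONNX q8": {"core": 119.0614, "edge": 69.9669, "multilingual_ppsn": 86.9493},
}


def speedup(candidate, baseline):
    """Return the candidate/baseline throughput ratio per suite (hypothetical helper)."""
    return {
        suite: THROUGHPUT[candidate][suite] / THROUGHPUT[baseline][suite]
        for suite in THROUGHPUT[baseline]
    }


ratios = speedup("v2-rc2 ONNX q8", "v2-rc2")
for suite, ratio in ratios.items():
    print(f"{suite}: {ratio:.2f}x")
```

A ratio above 1 means ONNX q8 processed more examples per second than the full checkpoint on that suite. With these numbers it is roughly 4.3x faster on the core suite and 2.3x faster on the edge suite, but slower (about 0.66x) on the multilingual PPSN suite.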

The bundled ONNX q8 artifact preserves the release's `user_raw_regression_cases_v1` PPSN score (1.0000) and the edge-suite overall score (0.9500).
On the exact QA suites, ONNX q8 stays strong on bank routing (1.0000) and phone (0.7500 / 0.6667), but it is weaker than the full `v2-rc2` checkpoint on passport boundary cases (0.7692 / 0.8000 vs 0.9091 / 0.8889).
On the broader English/Irish core suite, ONNX q8 is slightly better overall (0.9615 vs 0.9554) at the same thresholds.
The main quality cost of ONNX q8 in this release line is multilingual PPSN precision: 0.7887 vs 0.8038 for the full checkpoint. Both are well below the public `v1` PPSN-focused release (0.9940).
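
To make the comparisons above easy to recheck, here is a minimal sketch that recomputes the score differences (ONNX q8 minus the full `v2-rc2` checkpoint) from the table values. The metric keys and the `score_deltas` helper are illustrative names, and the sketch assumes the two Passport columns of the exact QA table belong to the Numeric v2 and Gap suites respectively.

```python
# Scores copied from the tables above; key names are illustrative.
# Assumption: the first Passport column is the Numeric v2 suite,
# the second Passport column is the Gap suite.
FULL = {
    "passport_numeric_v2": 0.9091,
    "passport_gap": 0.8889,
    "core": 0.9554,
    "edge": 0.9500,
    "multilingual_ppsn": 0.8038,
}
Q8 = {
    "passport_numeric_v2": 0.7692,
    "passport_gap": 0.8000,
    "core": 0.9615,
    "edge": 0.9500,
    "multilingual_ppsn": 0.7887,
}


def score_deltas(candidate, baseline):
    """Return candidate minus baseline per metric, rounded to 4 decimals."""
    return {name: round(candidate[name] - baseline[name], 4) for name in baseline}


deltas = score_deltas(Q8, FULL)
# Sort ascending so the largest drops are listed first.
for name, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name}: {delta:+.4f}")
```

Negative values indicate metrics where ONNX q8 scores below the full checkpoint, and positive values indicate metrics where it scores above.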