Add ONNX dynamic-int8 artifact and benchmark summary
b10bf6a verified

Benchmark Summary

This release now includes a bundled dynamic int8 ONNX export in onnx/model_quantized.onnx.

All numbers below use the same thresholds as the release recommendation:

  • ppsn_min_score = 0.5
  • other_min_score = 0.35
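One way to read these two settings is as per-label minimum confidence cutoffs applied to the model's predictions. The sketch below assumes that reading; the prediction format (dicts with `label` and `score` keys), the `PPSN` label string, the inclusive `>=` comparison, and the function name `filter_by_threshold` are all illustrative assumptions, not part of the release.

```python
# Thresholds quoted in this summary.
PPSN_MIN_SCORE = 0.5
OTHER_MIN_SCORE = 0.35


def filter_by_threshold(predictions, ppsn_min_score=PPSN_MIN_SCORE,
                        other_min_score=OTHER_MIN_SCORE):
    """Keep predictions whose score meets the cutoff for their label.

    Assumes each prediction is a dict with "label" and "score" keys.
    This format is hypothetical and may differ from the release's output.
    """
    kept = []
    for pred in predictions:
        # PPSN spans use the stricter cutoff; every other label shares one.
        cutoff = ppsn_min_score if pred["label"] == "PPSN" else other_min_score
        if pred["score"] >= cutoff:
            kept.append(pred)
    return kept


example = [
    {"label": "PPSN", "score": 0.48},      # below 0.5, dropped
    {"label": "PPSN", "score": 0.91},      # kept
    {"label": "PHONE", "score": 0.40},     # at least 0.35, kept
    {"label": "PASSPORT", "score": 0.20},  # below 0.35, dropped
]
kept = filter_by_threshold(example)
```

Under these assumptions, raising `ppsn_min_score` trades PPSN recall for precision without affecting the other labels.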

Exact QA Suites

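| Model | Numeric v2 | Numeric v2 Passport | Numeric v2 Routing | Numeric v2 Phone | Gap | Gap Passport | Gap Routing | Gap Phone |
|---|---|---|---|---|---|---|---|---|
| v1 | 0.3000 | 0.5000 | 0.3333 | 0.0000 | 0.2667 | 0.3333 | 0.4000 | 0.0000 |
| v2-rc1 | 0.2105 | 0.0000 | 0.0000 | 0.5000 | 0.1333 | 0.0000 | 0.0000 | 0.3333 |
| v2-rc2 | 0.8966 | 0.9091 | 1.0000 | 0.7500 | 0.8696 | 0.8889 | 1.0000 | 0.6667 |
| v2-rc2 ONNX q8 | 0.8387 | 0.7692 | 1.0000 | 0.7500 | 0.8333 | 0.8000 | 1.0000 | 0.6667 |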

Broader CPU Benchmarks

| Model | User PPSN | Core | Edge | Multilingual PPSN | Core PPSN | Edge PPSN |
|---|---|---|---|---|---|---|
| Base OpenMed | 0.0000 | 0.5409 | 0.0513 | 0.0000 | 0.0000 | 0.0000 |
| v1 | 0.5000 | 0.9530 | 0.5714 | 0.9940 | 0.8000 | 0.5000 |
| v2-rc1 | 1.0000 | 0.9487 | 0.8205 | 0.7568 | 0.8571 | 0.8571 |
| v2-rc2 | 1.0000 | 0.9554 | 0.9500 | 0.8038 | 0.8571 | 0.8571 |
| v2-rc2 ONNX q8 | 1.0000 | 0.9615 | 0.9500 | 0.7887 | 0.8571 | 0.8571 |

CPU Throughput

| Model | Core ex/s | Edge ex/s | Multilingual PPSN ex/s |
|---|---|---|---|
| Base OpenMed | 33.5507 | 30.5715 | 116.8383 |
| v1 | 27.1865 | 29.2080 | 127.0334 |
| v2-rc1 | 27.6197 | 28.9968 | 111.3383 |
| v2-rc2 | 27.4796 | 30.1890 | 131.0956 |
| v2-rc2 ONNX q8 | 119.0614 | 69.9669 | 86.9493 |
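To make the throughput comparison concrete, the ratio of ONNX q8 to the full v2-rc2 checkpoint can be computed directly from the table. This is a small arithmetic sketch; the dictionary layout and the helper name `speedup` are illustrative choices, and the figures are copied from the rows above.

```python
# Examples per second, copied from the CPU Throughput table.
THROUGHPUT = {
    "v2-rc2": {
        "core": 27.4796,
        "edge": 30.1890,
        "multilingual_ppsn": 131.0956,
    },
    "v2-rc2 ONNX q8": {
        "core": 119.0614,
        "edge": 69.9669,
        "multilingual_ppsn": 86.9493,
    },
}


def speedup(baseline, candidate):
    """Return candidate/baseline throughput ratio for each suite."""
    return {suite: candidate[suite] / baseline[suite] for suite in baseline}


ratios = speedup(THROUGHPUT["v2-rc2"], THROUGHPUT["v2-rc2 ONNX q8"])
for suite, ratio in ratios.items():
    print(f"{suite}: {ratio:.2f}x")
```

On these figures, ONNX q8 runs roughly 4.3x faster on the core suite and 2.3x faster on the edge suite, but at about 0.66x the full checkpoint's rate on the multilingual PPSN suite.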

Reading These Numbers

  • The bundled ONNX q8 artifact preserves the release's user_raw_regression_cases_v1 PPSN score (1.0000) and the edge-suite overall score (0.9500).
  • On the exact QA suites, ONNX q8 stays strong on bank routing (1.0000) and phone (0.7500 / 0.6667), but it is weaker than the full checkpoint on passport boundary cases (0.7692 / 0.8000 vs 0.9091 / 0.8889).
  • On the broader English/Irish core suite, ONNX q8 is slightly better overall (0.9615 vs 0.9554) at the same thresholds.
  • The main quality cost of ONNX q8 in this release line is multilingual PPSN precision: 0.7887 vs 0.8038 for the full checkpoint. Both remain well below the public v1 PPSN-focused release (0.9940).
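The comparisons in this list can be reproduced as simple differences between the ONNX q8 row and the full v2-rc2 row. The sketch below covers only the four scores where the two rows differ in the bullets above; the key names are illustrative labels, not identifiers from the release.

```python
# Scores copied from the tables in this summary (v2-rc2 vs v2-rc2 ONNX q8).
FULL = {
    "numeric_v2_passport": 0.9091,
    "gap_passport": 0.8889,
    "core": 0.9554,
    "multilingual_ppsn": 0.8038,
}
Q8 = {
    "numeric_v2_passport": 0.7692,
    "gap_passport": 0.8000,
    "core": 0.9615,
    "multilingual_ppsn": 0.7887,
}


def score_deltas(full, quantized):
    """Return quantized minus full for each suite, rounded to 4 places."""
    return {name: round(quantized[name] - full[name], 4) for name in full}


deltas = score_deltas(FULL, Q8)
for name, delta in deltas.items():
    # Negative values mean the quantized model scores lower.
    print(f"{name}: {delta:+.4f}")
```

The largest drops are on the two passport columns (about 0.14 and 0.09), while the core suite moves up by about 0.006 and multilingual PPSN moves down by about 0.015.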