Benchmark Summary
This release now includes a bundled dynamic int8 ONNX export in `onnx/model_quantized.onnx`.
All numbers below use the same thresholds as the release recommendation:
- `ppsn_min_score = 0.5`
- `other_min_score = 0.35`
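As a minimal sketch of how these two thresholds might be applied when filtering model output: the `PPSN` label name, the dictionary shape of each prediction, the inclusive `>=` comparison, and the `filter_predictions` helper are all illustrative assumptions, not the release's actual code.

```python
PPSN_MIN_SCORE = 0.5    # ppsn_min_score from the release recommendation
OTHER_MIN_SCORE = 0.35  # other_min_score from the release recommendation


def filter_predictions(predictions, ppsn_label="PPSN"):
    """Keep predictions whose score meets the threshold for their label.

    `predictions` is assumed to be a list of dicts with "label" and "score"
    keys; this shape is hypothetical and may differ from the real pipeline.
    """
    kept = []
    for pred in predictions:
        # PPSN spans use the stricter threshold; every other label uses the looser one.
        threshold = PPSN_MIN_SCORE if pred["label"] == ppsn_label else OTHER_MIN_SCORE
        if pred["score"] >= threshold:  # inclusive comparison is an assumption
            kept.append(pred)
    return kept


example = [
    {"label": "PPSN", "score": 0.48},      # below 0.5, dropped
    {"label": "PPSN", "score": 0.91},      # kept
    {"label": "PASSPORT", "score": 0.40},  # at or above 0.35, kept
    {"label": "PHONE", "score": 0.20},     # below 0.35, dropped
]
kept = filter_predictions(example)
print([p["label"] for p in kept])
```

Using a separate, higher cut-off for PPSN trades some recall on that label for fewer false positives, while other entity types are accepted at lower confidence.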
Exact QA Suites
| Model | Numeric v2 | Numeric v2 Passport | Numeric v2 Routing | Numeric v2 Phone | Gap | Gap Passport | Gap Routing | Gap Phone |
|---|---|---|---|---|---|---|---|---|
| v1 | 0.3000 | 0.5000 | 0.3333 | 0.0000 | 0.2667 | 0.3333 | 0.4000 | 0.0000 |
| v2-rc1 | 0.2105 | 0.0000 | 0.0000 | 0.5000 | 0.1333 | 0.0000 | 0.0000 | 0.3333 |
| v2-rc2 | 0.8966 | 0.9091 | 1.0000 | 0.7500 | 0.8696 | 0.8889 | 1.0000 | 0.6667 |
| v2-rc2 ONNX q8 | 0.8387 | 0.7692 | 1.0000 | 0.7500 | 0.8333 | 0.8000 | 1.0000 | 0.6667 |
Broader CPU Benchmarks
| Model | User PPSN | Core | Edge | Multilingual PPSN | Core PPSN | Edge PPSN |
|---|---|---|---|---|---|---|
| Base OpenMed | 0.0000 | 0.5409 | 0.0513 | 0.0000 | 0.0000 | 0.0000 |
| v1 | 0.5000 | 0.9530 | 0.5714 | 0.9940 | 0.8000 | 0.5000 |
| v2-rc1 | 1.0000 | 0.9487 | 0.8205 | 0.7568 | 0.8571 | 0.8571 |
| v2-rc2 | 1.0000 | 0.9554 | 0.9500 | 0.8038 | 0.8571 | 0.8571 |
| v2-rc2 ONNX q8 | 1.0000 | 0.9615 | 0.9500 | 0.7887 | 0.8571 | 0.8571 |
CPU Throughput
| Model | Core ex/s | Edge ex/s | Multilingual PPSN ex/s |
|---|---|---|---|
| Base OpenMed | 33.5507 | 30.5715 | 116.8383 |
| v1 | 27.1865 | 29.2080 | 127.0334 |
| v2-rc1 | 27.6197 | 28.9968 | 111.3383 |
| v2-rc2 | 27.4796 | 30.1890 | 131.0956 |
| v2-rc2 ONNX q8 | 119.0614 | 69.9669 | 86.9493 |
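To make the throughput table easier to compare, the following sketch computes the ONNX q8 speedup relative to the full `v2-rc2` checkpoint per suite. The values are copied from the table; the dictionary keys and the `speedup` helper are illustrative names only.

```python
# Examples per second, copied from the CPU Throughput table.
throughput = {
    "v2-rc2": {"core": 27.4796, "edge": 30.1890, "multilingual_ppsn": 131.0956},
    "v2-rc2 ONNX q8": {"core": 119.0614, "edge": 69.9669, "multilingual_ppsn": 86.9493},
}


def speedup(candidate, baseline):
    """Return candidate / baseline throughput for each suite in the baseline."""
    return {suite: candidate[suite] / baseline[suite] for suite in baseline}


ratios = speedup(throughput["v2-rc2 ONNX q8"], throughput["v2-rc2"])
for suite, ratio in ratios.items():
    print(f"{suite}: {ratio:.2f}x")
```

On these numbers, ONNX q8 comes out roughly 4.3x faster on the core suite and 2.3x faster on the edge suite, but at about 0.66x of the full checkpoint's throughput on the multilingual PPSN suite.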
Reading These Numbers
- The bundled ONNX q8 artifact preserves the release's `user_raw_regression_cases_v1` PPSN score (`1.0000`) and the edge-suite overall score (`0.9500`).
- On the exact QA suites, ONNX q8 stays strong on bank routing (`1.0000`) and phone (`0.7500` / `0.6667`), but it is weaker than the full checkpoint on passport boundary cases (`0.7692` / `0.8000` vs `0.9091` / `0.8889`).
- On the broader English/Irish core suite, ONNX q8 is slightly better overall (`0.9615` vs `0.9554`) at the same thresholds.
- The main quality cost of ONNX q8 in this release line is multilingual PPSN precision: `0.7887` vs `0.8038` for the full checkpoint, and well below the public `v1` PPSN-focused release (`0.9940`).
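The comparisons in the notes above can be reproduced with a short sketch that subtracts the full `v2-rc2` score from the ONNX q8 score for each metric mentioned. The scores are copied from the tables; the metric key names are illustrative labels, not identifiers from the release.

```python
# (full v2-rc2 checkpoint, v2-rc2 ONNX q8) scores, copied from the tables above.
scores = {
    "edge_overall": (0.9500, 0.9500),
    "core_overall": (0.9554, 0.9615),
    "multilingual_ppsn": (0.8038, 0.7887),
    "exact_passport_first_column": (0.9091, 0.7692),
    "exact_passport_second_column": (0.8889, 0.8000),
}

# Positive delta: ONNX q8 scores higher; negative: the full checkpoint does.
deltas = {name: round(q8 - full, 4) for name, (full, q8) in scores.items()}

for name, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name}: {delta:+.4f}")

# Metric with the largest drop under quantization.
largest_drop = min(deltas, key=deltas.get)
```

Sorting by delta puts the passport boundary cases first, which matches the note that they are where ONNX q8 loses the most against the full checkpoint.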