Surya CoreML Runtime Status

Generated on 2026-06-19 in studio@100.102.185.54:~/datalab-quants-cairo.

Packages

artifacts/coreml/surya-ocr-2-coreml-8bit/surya_vision_fp16.mlpackage
artifacts/coreml/surya-ocr-2-coreml-8bit/surya_vision_int8.mlpackage
artifacts/coreml/surya-ocr-2-coreml-8bit/surya_prefill_fp16_seq300_cache512.mlpackage
artifacts/coreml/surya-ocr-2-coreml-8bit/surya_decode_step_fp16_cache512.mlpackage

Prefill parity before CoreML export:
- prefill_parity.json
- native/custom first token: 1039
- logits max diff: 2.6702880859375e-05
- logits mean diff: 8.332146421707876e-07
Prefill CoreML smoke:
- prefill_coreml_smoke.json
- Torch/CoreML first token: 1039
- logits max diff: 0.3057253360748291
- logits mean diff: 0.03853870555758476
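
The parity numbers above can be reproduced with a comparison along these lines. This is a minimal sketch, not the script that produced the JSON files: it assumes the "diff" values are element-wise absolute differences over the logits of one position and that the "first token" is the greedy argmax. The name logits_report is hypothetical, and plain lists are used only to keep the example dependency-free.

```python
def logits_report(reference, candidate):
    """Compare two equal-length logit vectors for the same position."""
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must have the same length")
    # Assumption: diffs are element-wise absolute differences.
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]

    def argmax(values):
        return max(range(len(values)), key=values.__getitem__)

    return {
        # Assumption: the first token is chosen greedily.
        "reference_first_token": argmax(reference),
        "candidate_first_token": argmax(candidate),
        "max_diff": max(diffs),
        "mean_diff": sum(diffs) / len(diffs),
    }


# Toy vectors: the logits differ, but the greedy token is the same.
report = logits_report([0.1, 2.0, -1.0], [0.1, 1.75, -0.5])
print(report)
```

As in the CoreML smoke above, a noticeable logits diff does not by itself change the first token, as long as the largest logit stays the largest.
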
Decode CoreML iterative smoke with advancing native cache:
- decode_step_iterative_smoke_fixed_native_cache.json
- tokens match 9/9
- text: <p>Invoice
Combined language runtime smoke:
- combined_prefill_decode_smoke.json
- CoreML prefill cache -> CoreML decode step
- tokens match 9/9
- text: <p>Invoice
Vision-inclusive runtime smoke, FP16 vision:
- vision_fp16_prefill_decode_smoke.json
- CoreML vision -> CoreML prefill -> CoreML decode
- tokens match 9/9
- vision mean diff vs torch: 0.019221976399421692
Vision-inclusive runtime smoke, INT8 vision:
- vision_int8_prefill_decode_smoke.json
- CoreML INT8 vision -> CoreML prefill -> CoreML decode
- tokens match 9/9
- vision mean diff vs torch: 0.021211756393313408
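
The prefill -> decode chain and the "tokens match" count used in the smokes above can be sketched as follows. This is an illustration under stated assumptions, not the smoke scripts themselves: prefill and decode_step are hypothetical callables standing in for the CoreML predict calls, decoding is assumed to be greedy, and stop-token handling is left out.

```python
def greedy_decode(prefill, decode_step, prompt, max_new_tokens):
    """Run prefill once, then feed each new token back through decode_step."""
    logits, cache = prefill(prompt)
    position = len(prompt)  # next cache slot to write
    tokens = []
    for step in range(max_new_tokens):
        token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(token)
        if step + 1 < max_new_tokens:
            # The cache returned by one step is the input of the next one.
            logits, cache = decode_step(token, position, cache)
            position += 1
    return tokens


def count_matches(reference, candidate):
    """Return (matching positions, total positions), e.g. (9, 9)."""
    matched = sum(1 for r, c in zip(reference, candidate) if r == c)
    return matched, max(len(reference), len(candidate))


# Toy stand-ins with a 4-token vocabulary (hypothetical, for illustration).
def one_hot(index, size=4):
    return [1.0 if i == index else 0.0 for i in range(size)]

def toy_prefill(prompt):
    return one_hot(sum(prompt) % 4), list(prompt)

def toy_decode_step(token, position, cache):
    return one_hot((token + 1) % 4), cache + [token]


generated = greedy_decode(toy_prefill, toy_decode_step, [1, 2], 3)
print(generated, count_matches([3, 0, 1], generated))
```

Passing the cache from each step into the next is what the "advancing native cache" smoke exercises. A mismatch count below the total points at the first position where the two runtimes diverge.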

The prefill export passes skip_model_load=True and compute_units=CPU_ONLY to ct.convert so that coremltools does not eagerly compile the large ML Program for the ANE before saving.
The runtime smokes still load the .mlpackage files with CPU_ONLY and run real predictions.
The current prefill package is fixed to the canary prompt's sequence length of 300 and a full-attention cache length of 512.
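
The export and load settings described above might look roughly like this. It is a sketch under stated assumptions, not the actual export script: traced_model and inputs are supplied by the caller (the real input names and shapes are not given here), the helper names are hypothetical, and coremltools is imported lazily so the naming helper works without it.

```python
import os

SEQ_LEN = 300    # fixed prefill sequence length (canary prompt)
CACHE_LEN = 512  # fixed full-attention cache length


def prefill_package_name(seq_len=SEQ_LEN, cache_len=CACHE_LEN):
    """Package name that encodes the fixed shapes, as in the list above."""
    return f"surya_prefill_fp16_seq{seq_len}_cache{cache_len}.mlpackage"


def export_prefill(traced_model, inputs, out_dir):
    """Convert and save without compiling or loading the model first."""
    import coremltools as ct  # requires coremltools to be installed

    mlmodel = ct.convert(
        traced_model,
        inputs=inputs,  # list of ct.TensorType with fixed shapes
        convert_to="mlprogram",
        compute_units=ct.ComputeUnit.CPU_ONLY,
        skip_model_load=True,  # returned model can be saved, not used to predict
    )
    path = os.path.join(out_dir, prefill_package_name())
    mlmodel.save(path)
    return path


def load_cpu_only(path):
    """Load a saved package for real predictions on CPU only."""
    import coremltools as ct

    return ct.models.MLModel(path, compute_units=ct.ComputeUnit.CPU_ONLY)


print(prefill_package_name())
```

Because the shapes are baked into the package, a prompt of any other length needs a separate export or a flexible-shape conversion.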