Surya CoreML Runtime Status

Generated on 2026-06-19 in studio@100.102.185.54:~/datalab-quants-cairo.

Packages

  • artifacts/coreml/surya-ocr-2-coreml-8bit/surya_vision_fp16.mlpackage
  • artifacts/coreml/surya-ocr-2-coreml-8bit/surya_vision_int8.mlpackage
  • artifacts/coreml/surya-ocr-2-coreml-8bit/surya_prefill_fp16_seq300_cache512.mlpackage
  • artifacts/coreml/surya-ocr-2-coreml-8bit/surya_decode_step_fp16_cache512.mlpackage
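The package file names appear to encode the pipeline stage, the weight precision and any fixed input shapes. The sketch below assumes that the pattern surya_<stage>_<precision>[_seq<N>][_cache<M>].mlpackage holds for every package, which would let a runner read the fixed shapes before loading anything. parse_package_name and PackageInfo are hypothetical helpers, not part of Surya or coremltools:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern inferred from the four package names listed above:
# surya_<stage>_<precision>[_seq<N>][_cache<M>].mlpackage
_NAME_RE = re.compile(
    r"^surya_(?P<stage>vision|prefill|decode_step)"
    r"_(?P<precision>fp16|int8)"
    r"(?:_seq(?P<seq>\d+))?"
    r"(?:_cache(?P<cache>\d+))?"
    r"\.mlpackage$"
)


@dataclass(frozen=True)
class PackageInfo:
    stage: str
    precision: str
    seq_len: Optional[int]    # fixed prompt length, if the name encodes one
    cache_len: Optional[int]  # fixed KV cache length, if the name encodes one


def parse_package_name(path: str) -> PackageInfo:
    """Decode stage, precision and fixed shapes from a package path."""
    name = path.rsplit("/", 1)[-1]
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized package name: {name}")
    seq, cache = match.group("seq"), match.group("cache")
    return PackageInfo(
        stage=match.group("stage"),
        precision=match.group("precision"),
        seq_len=int(seq) if seq else None,
        cache_len=int(cache) if cache else None,
    )
```

For the prefill package this yields stage "prefill", precision "fp16", seq_len 300 and cache_len 512; the vision packages carry no fixed sequence or cache length in their names.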

Passing Runtime Gates

  • Prefill parity before CoreML export:
    • prefill_parity.json
    • native/custom first token: 1039
    • logits max diff: 2.6702880859375e-05
    • logits mean diff: 8.332146421707876e-07
  • Prefill CoreML smoke:
    • prefill_coreml_smoke.json
    • Torch/CoreML first token: 1039
    • logits max diff: 0.3057253360748291
    • logits mean diff: 0.03853870555758476
  • Iterative CoreML decode-step smoke with an advancing native cache:
    • decode_step_iterative_smoke_fixed_native_cache.json
    • tokens match 9/9
    • text: <p>Invoice
  • Combined language runtime smoke:
    • combined_prefill_decode_smoke.json
    • CoreML prefill cache -> CoreML decode step
    • tokens match 9/9
    • text: <p>Invoice
  • Vision-inclusive runtime smoke, FP16 vision:
    • vision_fp16_prefill_decode_smoke.json
    • CoreML vision -> CoreML prefill -> CoreML decode
    • tokens match 9/9
    • vision mean diff vs torch: 0.019221976399421692
  • Vision-inclusive runtime smoke, INT8 vision:
    • vision_int8_prefill_decode_smoke.json
    • CoreML INT8 vision -> CoreML prefill -> CoreML decode
    • tokens match 9/9
    • vision mean diff vs torch: 0.021211756393313408
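The report does not state the thresholds behind each gate. The following is a minimal sketch, assuming "max diff" and "mean diff" are absolute elementwise differences over the logit vector and that a gate passes when the greedy tokens agree. That reading is consistent with the CoreML prefill smoke passing with a max logit diff of about 0.31: drift of that size is tolerable as long as the first token stays 1039. All function names are hypothetical:

```python
from typing import Sequence, Tuple


def greedy_token(logits: Sequence[float]) -> int:
    """Index of the largest logit, i.e. the greedy next-token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def logits_diff(ref: Sequence[float], test: Sequence[float]) -> Tuple[float, float]:
    """Max and mean absolute elementwise difference between two logit vectors."""
    if len(ref) != len(test) or not ref:
        raise ValueError("logit vectors must be non-empty and the same length")
    diffs = [abs(a - b) for a, b in zip(ref, test)]
    return max(diffs), sum(diffs) / len(diffs)


def prefill_gate(ref_logits: Sequence[float], test_logits: Sequence[float]) -> dict:
    """Hypothetical gate: pass when both backends pick the same first token."""
    max_diff, mean_diff = logits_diff(ref_logits, test_logits)
    ref_tok, test_tok = greedy_token(ref_logits), greedy_token(test_logits)
    return {
        "first_token_ref": ref_tok,
        "first_token_test": test_tok,
        "max_diff": max_diff,    # recorded as a diagnostic, not gated on
        "mean_diff": mean_diff,  # recorded as a diagnostic, not gated on
        "passed": ref_tok == test_tok,
    }


def token_match(ref_tokens: Sequence[int], test_tokens: Sequence[int]) -> str:
    """Positionwise agreement reported as 'matches/total', e.g. '9/9'."""
    matches = sum(1 for a, b in zip(ref_tokens, test_tokens) if a == b)
    return f"{matches}/{len(ref_tokens)}"
```

In this sketch the logit differences are diagnostics only. A stricter gate could also bound max_diff, but any such threshold would be a new choice rather than something this report specifies.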

Current Host Responsibilities

  • Tokenization.
  • Initial text token embedding lookup.
  • Image placeholder insertion.
  • Rotary position embedding generation.
  • Generated-token embedding lookup.
  • Full-attention KV cache insertion.
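Several of the host responsibilities above can be sketched together as prefill input assembly. The steps are to insert image placeholder ids into the tokenized prompt, look up text embeddings, splice vision features into the placeholder slots, and precompute rotary cos/sin tables. This is a minimal pure-Python sketch under stated assumptions. IMAGE_PLACEHOLDER_ID is a hypothetical sentinel, and tokenization itself is left out. rotary_cos_sin uses the textbook 1-D RoPE formula with base 10000, which may differ from Surya's actual base, dimension layout or multimodal position scheme:

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sentinel; the real image placeholder token id is not given in this report.
IMAGE_PLACEHOLDER_ID = -1


def insert_image_placeholders(text_ids: List[int], num_image_tokens: int, position: int = 0) -> List[int]:
    """Insert a run of placeholder ids where the vision features will be spliced in."""
    return text_ids[:position] + [IMAGE_PLACEHOLDER_ID] * num_image_tokens + text_ids[position:]


def embed_sequence(
    token_ids: Sequence[int],
    embedding_table: Sequence[Sequence[float]],
    vision_features: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Look up text embeddings and fill placeholder slots with vision features, in order."""
    vision_iter = iter(vision_features)
    rows: List[List[float]] = []
    for tok in token_ids:
        if tok == IMAGE_PLACEHOLDER_ID:
            feature = next(vision_iter, None)
            if feature is None:
                raise ValueError("fewer vision features than placeholders")
            rows.append(list(feature))
        else:
            rows.append(list(embedding_table[tok]))
    if next(vision_iter, None) is not None:
        raise ValueError("more vision features than placeholders")
    return rows


def rotary_cos_sin(
    seq_len: int, head_dim: int, base: float = 10000.0, offset: int = 0
) -> Tuple[List[List[float]], List[List[float]]]:
    """Textbook 1-D RoPE tables: angle[p][i] = p * base ** (-2i / head_dim)."""
    if head_dim % 2:
        raise ValueError("head_dim must be even")
    inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    cos_table: List[List[float]] = []
    sin_table: List[List[float]] = []
    for pos in range(offset, offset + seq_len):
        angles = [pos * f for f in inv_freq]
        cos_table.append([math.cos(a) for a in angles])
        sin_table.append([math.sin(a) for a in angles])
    return cos_table, sin_table
```

With offset set to the current position, rotary_cos_sin(1, head_dim, offset=pos) produces the single-row table a decode step would presumably need for each generated token.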

Export Notes

  • Prefill export passes skip_model_load=True and compute_units=ct.ComputeUnit.CPU_ONLY to ct.convert so that coremltools does not eagerly compile the large MLProgram through the ANE before saving.
  • Runtime smoke tests still load the .mlpackage files with CPU_ONLY and run real predictions.
  • The current prefill package is fixed to the canary prompt's sequence length of 300 and a full-attention cache length of 512.
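Because the prefill package is fixed at sequence length 300 and the cache at 512, the decode budget is simple arithmetic. Assuming prefill fills one cache slot per prompt position and each generated token adds one more, at most 512 - 300 = 212 tokens can be generated before the full-attention cache is exhausted. The hypothetical FixedKVCache below sketches host-side insertion into such a fixed-capacity buffer (one layer, plain lists). The real package's tensor layout and mask convention, for example an additive mask rather than 0/1 validity flags, are not stated here:

```python
from typing import List, Sequence

PREFILL_SEQ_LEN = 300  # fixed prompt length of the current prefill package
CACHE_LEN = 512        # fixed full-attention cache length


def decode_budget(prompt_len: int = PREFILL_SEQ_LEN, cache_len: int = CACHE_LEN) -> int:
    """Cache slots left for generated tokens, assuming one slot per position."""
    if prompt_len > cache_len:
        raise ValueError("prompt does not fit in the cache")
    return cache_len - prompt_len


class FixedKVCache:
    """Hypothetical fixed-capacity key/value buffer for a single attention layer."""

    def __init__(self, capacity: int, dim: int) -> None:
        self.capacity = capacity
        # Pre-allocate every slot so the shape handed to the model never changes.
        self.keys: List[List[float]] = [[0.0] * dim for _ in range(capacity)]
        self.values: List[List[float]] = [[0.0] * dim for _ in range(capacity)]
        self.length = 0

    def insert(self, key: Sequence[float], value: Sequence[float]) -> int:
        """Write one position's key/value into the next free slot and return its index."""
        if self.length >= self.capacity:
            raise OverflowError("KV cache is full")
        self.keys[self.length] = list(key)
        self.values[self.length] = list(value)
        self.length += 1
        return self.length - 1

    def valid_mask(self) -> List[float]:
        """1.0 for filled slots, 0.0 for unused slots the model should ignore."""
        return [1.0 if i < self.length else 0.0 for i in range(self.capacity)]
```

For the canary prompt, decode_budget() returns 212, well above the nine tokens checked in the smokes above. Pre-allocating every slot mirrors the fixed cache length baked into the exported packages.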