Nemotron OCR v2 CoreML
CoreML conversion of the English neural stages from NVIDIA Nemotron OCR v2.
SwiftPM package: github.com/mweinbach/OCRCoreML
Included Models
| stage | file | input | outputs |
|---|---|---|---|
| detector | DetectorGPUInt8_768.mlpackage | image: Float32[1, 3, 768, 768] | prob, rboxes, features |
| recognizer | RecognizerFeaturesInt8.mlpackage | regions: Float32[128, 128, 8, 32] | logits, features |
| relational | RelationalInt8.mlpackage | rectified regions, original quads, recognizer features, valid count | words, lines, line_log_var |
The recognizer emits transformer features; those are required by the
relational model, so this bundle covers the full neural OCR pipeline rather
than detector-only inference.
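As an illustration of the detector input layout listed above (Float32[1, 3, 768, 768], channels first), the following sketch packs interleaved 8-bit RGB pixels into a planar buffer. The function name `packCHW` is hypothetical, and scaling to the [0, 1] range is an assumption: the normalization the detector actually expects is not stated here and should be checked against the conversion configs.

```swift
/// Packs interleaved 8-bit RGB pixels (height x width x 3) into a planar
/// channels-first Float array (3 x height x width).
/// ASSUMPTION: values are scaled to [0, 1]; the real model may expect a
/// different normalization (for example mean/std).
func packCHW(rgb: [UInt8], width: Int, height: Int) -> [Float] {
    precondition(rgb.count == width * height * 3, "expected interleaved RGB data")
    let plane = width * height
    var out = [Float](repeating: 0, count: 3 * plane)
    for pixel in 0..<plane {
        for channel in 0..<3 {
            // Source is pixel-major, destination is channel-major.
            out[channel * plane + pixel] = Float(rgb[pixel * 3 + channel]) / 255.0
        }
    }
    return out
}
```

For the detector, `width` and `height` would both be 768, and the resulting 3 x 768 x 768 values would fill the single batch entry of the input tensor.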
Pipeline Boundary
The original Python package uses CUDA/C++ helpers for the non-neural stages:
rotated-box NMS, rboxes to quads, quad rectification, feature-map grid
sampling, sequence decoding, relation-graph decoding, and reading-order
formatting. Those operations are not CoreML models. Apple apps integrating this
bundle must port or replace those post-processing steps.
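To give a sense of what one of these ported steps might look like, here is a minimal sketch of converting a rotated box to a quad. It assumes a hypothetical parameterization (center, size, rotation in radians); the detector's actual `rboxes` layout is not documented in this card and may differ, so treat the `RotatedBox` type and `quad(from:)` function as illustrative names only.

```swift
import Foundation

/// ASSUMPTION: a rotated box is described by its center, size, and a
/// rotation angle in radians. The real `rboxes` output may use another layout.
struct RotatedBox {
    var cx: Double, cy: Double
    var width: Double, height: Double
    var angle: Double
}

struct Point: Equatable {
    var x: Double, y: Double
}

/// Returns the four corners, starting at the unrotated top-left corner and
/// following the order top-left, top-right, bottom-right, bottom-left.
func quad(from box: RotatedBox) -> [Point] {
    let c = cos(box.angle), s = sin(box.angle)
    let hw = box.width / 2, hh = box.height / 2
    let offsets = [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]
    return offsets.map { (dx, dy) in
        // Rotate the corner offset, then translate to the box center.
        Point(x: box.cx + dx * c - dy * s, y: box.cy + dx * s + dy * c)
    }
}
```

Rotated-box NMS, rectification, and grid sampling would need comparable ports; this sketch covers only the geometry of a single box.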
The linked SwiftPM package includes wrappers for all three CoreML models and a greedy recognizer decoder. It exposes raw tensors rather than claiming complete image-to-text OCR until the geometric and graph post-processing is ported.
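For readers unfamiliar with greedy decoding, the sketch below shows one common scheme: take the highest-scoring class at each step, collapse repeats, and drop a blank class (CTC-style). This is not the package's decoder. The blank index, the repeat-collapsing rule, and the mapping from class index to `charset.txt` entries are all assumptions made for illustration, and `greedyDecode` is a hypothetical name.

```swift
/// CTC-style greedy decode over a [steps][classes] logit matrix.
/// ASSUMPTIONS: class `blankIndex` is a blank symbol, consecutive repeats are
/// collapsed, and class k (k > 0) maps to charset[k - 1]. The real recognizer
/// may use a different convention.
func greedyDecode(logits: [[Float]], charset: [Character], blankIndex: Int = 0) -> String {
    var result = ""
    var previous = blankIndex
    for step in logits {
        // Index of the highest-scoring class at this step.
        guard let best = step.indices.max(by: { step[$0] < step[$1] }) else { continue }
        if best != blankIndex && best != previous {
            let charIndex = best - 1
            if charset.indices.contains(charIndex) {
                result.append(charset[charIndex])
            }
        }
        previous = best
    }
    return result
}
```

A blank between two identical classes keeps both characters, which is how this scheme distinguishes a doubled letter from a single letter spread across several steps.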
Files
| file | purpose |
|---|---|
| DetectorGPUInt8_768.mlpackage/ | detector CoreML package |
| RecognizerFeaturesInt8.mlpackage/ | recognizer CoreML package with logits and features |
| RelationalInt8.mlpackage/ | relational CoreML package |
| charset.txt | English checkpoint charset |
| model_config.json | English checkpoint config |
| configs/ | conversion configs used for the three packages |
| benchmarks/ | local CoreML benchmark results |
| parity/ | PyTorch-vs-CoreML parity reports |
| checksums.sha256 | SHA-256 checksums for package files |
| LICENSE, NOTICE | license terms and redistribution notice |
Performance
Local median latencies after warmup:
| stage | GPU/ALL median | CPU+NE median | CPU median |
|---|---|---|---|
| detector | 10.65 ms | 50.46 ms | 157.71 ms |
| recognizer + features | 4.53 ms | 11.04 ms | 47.58 ms |
| relational | 1.72 ms | 6.38 ms | 34.53 ms |
The GPU/ALL configuration gives the lowest single-shot latency on the test
machine. CPU+NE (CPU plus Neural Engine) is useful when GPU time needs to be
reserved for rendering or other workloads.
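To reproduce a comparable figure on another machine, a median-after-warmup statistic can be computed as in the sketch below. It is not the benchmark script used for the table above; `medianLatency` and the `warmup` parameter are hypothetical names, and the number of warmup iterations is left to the caller.

```swift
/// Median of latency samples (in milliseconds) after discarding the first
/// `warmup` samples. Returns nil when no samples remain.
func medianLatency(samplesMs: [Double], warmup: Int) -> Double? {
    precondition(warmup >= 0, "warmup must be non-negative")
    // Drop warmup runs, which typically include compilation and cache effects.
    let measured = samplesMs.dropFirst(warmup).sorted()
    guard !measured.isEmpty else { return nil }
    let mid = measured.count / 2
    return measured.count % 2 == 1
        ? measured[mid]
        : (measured[mid - 1] + measured[mid]) / 2
}
```

The median is less sensitive than the mean to occasional slow runs, which is why it is a common choice for single-shot latency reporting.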
Swift Usage
```swift
import OCRCoreML

let pipeline = try OCRPipeline(computeUnits: .cpuAndGPU)

let detectorPrediction = try pipeline.detect(image: cgImage)
let recognizerPrediction = try pipeline.recognize(regions: regions)

let decoded = try pipeline.recognizer.decode(
    logits: recognizerPrediction.output.logits,
    count: detectedRegionCount
)

let relationalPrediction = try pipeline.relate(
    rectifiedQuads: relationalRegionFeatures,
    originalQuads: originalQuads,
    recognizerFeatures: recognizerPrediction.output.features,
    numValid: detectedRegionCount
)
```
See the SwiftPM docs for exact app integration notes: https://github.com/mweinbach/OCRCoreML
License
The converted model weights inherit the
NVIDIA Open Model License Agreement.
The upstream source code and helper scripts are Apache 2.0. See LICENSE and
NOTICE.