Nemotron OCR v2 – CoreML (Apple Neural Engine)

CoreML-format models for NVIDIA Nemotron OCR v2, exported for native Apple Silicon inference targeting the Apple Neural Engine (ANE).

Model Details

Three-stage OCR pipeline optimized for ANE acceleration:

| Component  | Format        | Precision | Size      |
|------------|---------------|-----------|-----------|
| Detector   | `.mlmodel`    | float32   | ~181 MB   |
| Recognizer | `.mlpackage`  | float16   | ~12–72 MB |
| Relational | `.mlpackage`  | float16   | ~5 MB     |

Variants

| Variant         | Charset          | Detector Resolution | Batch Size |
|-----------------|------------------|---------------------|------------|
| v2_english      | 855 characters   | 1024×1024           | 16 regions |
| v2_multilingual | ~42K characters  | 1024×1024           | 16 regions |
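Both variants feed the recognizer 16 regions per pass, so the host pipeline has to chunk whatever the detector finds into batches of at most 16. A minimal sketch of that chunking (illustrative only, not the SwiftNemotronOCR implementation):

```python
def batch_regions(regions, batch_size=16):
    """Split detected regions into recognizer-sized batches.

    The final batch may be smaller than batch_size; how the real
    pipeline pads it (if at all) is not specified here.
    """
    return [regions[i:i + batch_size]
            for i in range(0, len(regions), batch_size)]

# 38 detected regions -> three recognizer passes
batches = batch_regions(list(range(38)))
print([len(b) for b in batches])  # -> [16, 16, 6]
```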

ANE Performance

On Apple Silicon, the CoreML ANE path provides the lowest latency:

| Backend    | Typical Latency | Device              |
|------------|-----------------|---------------------|
| CoreML ANE | ~350 ms         | Apple Neural Engine |
| MLX GPU    | ~2800 ms        | Metal GPU           |
| MLX CPU    | ~6700 ms        | CPU                 |

Usage with SwiftNemotronOCR

This model is designed for use with SwiftNemotronOCR, an all-Swift OCR pipeline.

Quick Start

# Clone the Swift package
git clone https://github.com/mweinbach/SwiftNemotronOCR.git
cd SwiftNemotronOCR

# Build
swift build -c release

# Download this model
# Place v2_english/ and/or v2_multilingual/ under a model/coreml/ directory

# Run OCR on ANE
.build/release/apple-ocr-runner \
  --runtime coreml \
  --device ANE \
  --model-dir /path/to/model/coreml \
  --image /path/to/image.png \
  --variant en \
  --level paragraph
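The invocation above is easy to drive from a host script. A small sketch that assembles the same flags, assuming a hypothetical `build_ocr_command` helper (the flags and runner path are taken from the Quick Start; nothing else is implied about the CLI):

```python
import subprocess

def build_ocr_command(model_dir, image, runtime="coreml", device="ANE",
                      variant="en", level="paragraph",
                      runner=".build/release/apple-ocr-runner"):
    # Mirrors the Quick Start invocation flag-for-flag.
    return [
        runner,
        "--runtime", runtime,
        "--device", device,
        "--model-dir", model_dir,
        "--image", image,
        "--variant", variant,
        "--level", level,
    ]

cmd = build_ocr_command("/path/to/model/coreml", "/path/to/image.png")
# On a machine with the built binary and models in place:
# subprocess.run(cmd, capture_output=True, text=True)
```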

Supported Devices

--device ANE    # Apple Neural Engine (fastest, default)
--device CPU    # CPU only
--device GPU    # CPU + GPU
--device ALL    # All available compute units

Output

JSON with detected text regions, confidence scores, and bounding quads:

{
  "images": [{
    "region_count": 38,
    "latency_ms": 359.2,
    "regions": [
      {"text": "Council", "confidence": 0.48, "quad": {"points": [...]}},
      {"text": "BANK STATEMENT", "confidence": 0.55, "quad": {"points": [...]}}
    ]
  }]
}
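Since the output is plain JSON, post-processing is straightforward. A sketch that filters regions by confidence, using data shaped like the sample above (the 0.5 threshold is an illustrative choice, not a recommendation):

```python
# Result shaped like the sample output above (quads elided).
result = {
    "images": [{
        "region_count": 2,
        "latency_ms": 359.2,
        "regions": [
            {"text": "Council", "confidence": 0.48, "quad": {"points": []}},
            {"text": "BANK STATEMENT", "confidence": 0.55, "quad": {"points": []}},
        ],
    }]
}

def filter_regions(result, min_confidence=0.5):
    """Flatten all images' regions, keeping those at or above the threshold."""
    return [r for img in result["images"] for r in img["regions"]
            if r["confidence"] >= min_confidence]

print([r["text"] for r in filter_regions(result)])  # -> ['BANK STATEMENT']
```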

File Structure

v2_english/
  manifest.json                        # Model metadata + ANE optimization config
  charset.txt                          # Character vocabulary (JSON array)
  v2_english_detector.mlmodel          # RegNet-X-8GF detector (~181 MB)
  v2_english_recognizer.mlpackage/     # Transformer recognizer
  v2_english_relational.mlpackage/     # GNN relational model

v2_multilingual/
  manifest.json
  charset.txt
  v2_multilingual_detector.mlmodel
  v2_multilingual_recognizer.mlpackage/
  v2_multilingual_relational.mlpackage/

ANE Optimization Details

From manifest.json:

  • Detector: NCHW input layout, 1024×1024 fixed resolution, float32
  • Recognizer: NCHW input layout, batch size 16, float16
  • Relational: Max 256 regions, float16
  • Host pipeline: Preprocessing, NMS, quad rectification, and batching run on CPU
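The NCHW/fixed-resolution constraints above determine the host-side preprocessing. A NumPy sketch of the layout conversion for the detector; the [0, 1] normalization is an assumption for illustration, not taken from the manifest:

```python
import numpy as np

def to_detector_input(image_hwc):
    """Convert an HWC uint8 RGB image to the detector's NCHW float32 layout."""
    assert image_hwc.shape == (1024, 1024, 3), "detector expects a fixed 1024x1024 RGB input"
    x = image_hwc.astype(np.float32) / 255.0  # uint8 -> float32 (assumed [0, 1] scaling)
    x = np.transpose(x, (2, 0, 1))            # HWC -> CHW
    return x[np.newaxis, ...]                 # add batch dim -> NCHW

x = to_detector_input(np.zeros((1024, 1024, 3), dtype=np.uint8))
print(x.shape, x.dtype)  # -> (1, 3, 1024, 1024) float32
```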

License

This model inherits the NVIDIA Open Model License from the original Nemotron OCR v2 release.
