Nemotron OCR v2 – CoreML (Apple Neural Engine)

CoreML-format models for NVIDIA Nemotron OCR v2, exported for native Apple Silicon inference targeting the Apple Neural Engine (ANE).

Model Details

Three-stage OCR pipeline optimized for ANE acceleration:

| Component  | Format        | Precision | Size      |
|------------|---------------|-----------|-----------|
| Detector   | `.mlmodel`    | float32   | ~181 MB   |
| Recognizer | `.mlpackage`  | float16   | ~12–72 MB |
| Relational | `.mlpackage`  | float16   | ~5 MB     |

Variants

| Variant         | Charset          | Detector Resolution | Batch Size |
|-----------------|------------------|---------------------|------------|
| v2_english      | 855 characters   | 1024×1024           | 16 regions |
| v2_multilingual | ~42K characters  | 1024×1024           | 16 regions |
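Both variants feed the recognizer 16 regions per pass, so the host pipeline has to chunk whatever the detector finds into batches of at most 16. A minimal sketch of that chunking (illustrative only, not the SwiftNemotronOCR implementation):

```python
def batch_regions(regions, batch_size=16):
    """Split detected regions into recognizer-sized batches.

    The final batch may be smaller than batch_size; how the real
    pipeline pads it (if at all) is not specified here.
    """
    return [regions[i:i + batch_size]
            for i in range(0, len(regions), batch_size)]

# 38 detected regions -> three recognizer passes
batches = batch_regions(list(range(38)))
print([len(b) for b in batches])  # -> [16, 16, 6]
```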

ANE Performance

On Apple Silicon, the CoreML ANE path provides the lowest latency:

| Backend    | Typical Latency | Device              |
|------------|-----------------|---------------------|
| CoreML ANE | ~350 ms         | Apple Neural Engine |
| MLX GPU    | ~2800 ms        | Metal GPU           |
| MLX CPU    | ~6700 ms        | CPU                 |

Usage with SwiftNemotronOCR

This model is designed for use with SwiftNemotronOCR, an all-Swift OCR pipeline.

Quick Start

# Clone the Swift package
git clone https://github.com/mweinbach/SwiftNemotronOCR.git
cd SwiftNemotronOCR

# Build
swift build -c release

# Download this model
# Place v2_english/ and/or v2_multilingual/ under a model/coreml/ directory

# Run OCR on ANE
.build/release/apple-ocr-runner \
  --runtime coreml \
  --device ANE \
  --model-dir /path/to/model/coreml \
  --image /path/to/image.png \
  --variant en \
  --level paragraph
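The invocation above is easy to drive from a host script. A small sketch that assembles the same flags, assuming a hypothetical `build_ocr_command` helper (the flags and runner path are taken from the Quick Start; nothing else is implied about the CLI):

```python
import subprocess

def build_ocr_command(model_dir, image, runtime="coreml", device="ANE",
                      variant="en", level="paragraph",
                      runner=".build/release/apple-ocr-runner"):
    # Mirrors the Quick Start invocation flag-for-flag.
    return [
        runner,
        "--runtime", runtime,
        "--device", device,
        "--model-dir", model_dir,
        "--image", image,
        "--variant", variant,
        "--level", level,
    ]

cmd = build_ocr_command("/path/to/model/coreml", "/path/to/image.png")
# On a machine with the built binary and models in place:
# subprocess.run(cmd, capture_output=True, text=True)
```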

Supported Devices

--device ANE    # Apple Neural Engine (fastest, default)
--device CPU    # CPU only
--device GPU    # CPU + GPU
--device ALL    # All available compute units

Output

JSON with detected text regions, confidence scores, and bounding quads:

{
  "images": [{
    "region_count": 38,
    "latency_ms": 359.2,
    "regions": [
      {"text": "Council", "confidence": 0.48, "quad": {"points": [...]}},
      {"text": "BANK STATEMENT", "confidence": 0.55, "quad": {"points": [...]}}
    ]
  }]
}
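Since the output is plain JSON, post-processing is straightforward. A sketch that filters regions by confidence, using data shaped like the sample above (the 0.5 threshold is an illustrative choice, not a recommendation):

```python
# Result shaped like the sample output above (quads elided).
result = {
    "images": [{
        "region_count": 2,
        "latency_ms": 359.2,
        "regions": [
            {"text": "Council", "confidence": 0.48, "quad": {"points": []}},
            {"text": "BANK STATEMENT", "confidence": 0.55, "quad": {"points": []}},
        ],
    }]
}

def filter_regions(result, min_confidence=0.5):
    """Flatten all images' regions, keeping those at or above the threshold."""
    return [r for img in result["images"] for r in img["regions"]
            if r["confidence"] >= min_confidence]

print([r["text"] for r in filter_regions(result)])  # -> ['BANK STATEMENT']
```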

File Structure

v2_english/
  manifest.json                        # Model metadata + ANE optimization config
  charset.txt                          # Character vocabulary (JSON array)
  v2_english_detector.mlmodel          # RegNet-X-8GF detector (~181 MB)
  v2_english_recognizer.mlpackage/     # Transformer recognizer
  v2_english_relational.mlpackage/     # GNN relational model

v2_multilingual/
  manifest.json
  charset.txt
  v2_multilingual_detector.mlmodel
  v2_multilingual_recognizer.mlpackage/
  v2_multilingual_relational.mlpackage/

ANE Optimization Details

From manifest.json:

  • Detector: NCHW input layout, 1024×1024 fixed resolution, float32
  • Recognizer: NCHW input layout, batch size 16, float16
  • Relational: Max 256 regions, float16
  • Host pipeline: Preprocessing, NMS, quad rectification, and batching run on CPU
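The NCHW/fixed-resolution constraints above determine the host-side preprocessing. A NumPy sketch of the layout conversion for the detector; the [0, 1] normalization is an assumption for illustration, not taken from the manifest:

```python
import numpy as np

def to_detector_input(image_hwc):
    """Convert an HWC uint8 RGB image to the detector's NCHW float32 layout."""
    assert image_hwc.shape == (1024, 1024, 3), "detector expects a fixed 1024x1024 RGB input"
    x = image_hwc.astype(np.float32) / 255.0  # uint8 -> float32 (assumed [0, 1] scaling)
    x = np.transpose(x, (2, 0, 1))            # HWC -> CHW
    return x[np.newaxis, ...]                 # add batch dim -> NCHW

x = to_detector_input(np.zeros((1024, 1024, 3), dtype=np.uint8))
print(x.shape, x.dtype)  # -> (1, 3, 1024, 1024) float32
```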

License

This model inherits the NVIDIA Open Model License from the original Nemotron OCR v2 release.
