Nemotron OCR v2: CoreML (Apple Neural Engine)
CoreML-format models for NVIDIA Nemotron OCR v2, exported for native Apple Silicon inference targeting the Apple Neural Engine (ANE).
Model Details
Three-stage OCR pipeline optimized for ANE acceleration:
| Component | Format | Precision | Size |
|---|---|---|---|
| Detector | .mlmodel | float32 | ~181 MB |
| Recognizer | .mlpackage | float16 | ~12–72 MB |
| Relational | .mlpackage | float16 | ~5 MB |
Variants
| Variant | Charset | Detector Resolution | Batch Size |
|---|---|---|---|
| v2_english | 855 characters | 1024×1024 | 16 regions |
| v2_multilingual | ~42K characters | 1024×1024 | 16 regions |
ANE Performance
On Apple Silicon, the CoreML ANE path provides the lowest latency:
| Backend | Typical Latency | Device |
|---|---|---|
| CoreML ANE | ~350ms | Apple Neural Engine |
| MLX GPU | ~2800ms | Metal GPU |
| MLX CPU | ~6700ms | CPU |
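Taking the typical latencies above at face value (they will vary with the device and the image), the ANE path works out to roughly:

$$
\frac{2800\ \text{ms}}{350\ \text{ms}} = 8\times \ \text{faster than MLX GPU}, \qquad
\frac{6700\ \text{ms}}{350\ \text{ms}} \approx 19\times \ \text{faster than MLX CPU}
$$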
Usage with SwiftNemotronOCR
This model is designed for use with SwiftNemotronOCR, an all-Swift OCR pipeline.
Quick Start
```bash
# Clone the Swift package
git clone https://github.com/mweinbach/SwiftNemotronOCR.git
cd SwiftNemotronOCR

# Build
swift build -c release

# Download this model
# Place v2_english/ and/or v2_multilingual/ under a model/coreml/ directory

# Run OCR on ANE
.build/release/apple-ocr-runner \
  --runtime coreml \
  --device ANE \
  --model-dir /path/to/model/coreml \
  --image /path/to/image.png \
  --variant en \
  --level paragraph
```
Supported Devices
```text
--device ANE   # Apple Neural Engine (fastest, default)
--device CPU   # CPU only
--device GPU   # CPU + GPU
--device ALL   # All available compute units
```
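These options line up with CoreML's `MLComputeUnits` settings. Below is a minimal sketch of that mapping, assuming the runner configures `MLModelConfiguration.computeUnits` this way. The `OCRDevice` enum and `loadStage` function are hypothetical illustrations, not SwiftNemotronOCR's actual API. `.cpuAndNeuralEngine` requires macOS 13 or iOS 16.

```swift
import CoreML
import Foundation

// Hypothetical mapping from the runner's --device values to CoreML compute units.
enum OCRDevice: String {
    case ane = "ANE", cpu = "CPU", gpu = "GPU", all = "ALL"

    @available(macOS 13.0, iOS 16.0, *)
    var computeUnits: MLComputeUnits {
        switch self {
        case .ane: return .cpuAndNeuralEngine  // ANE preferred; unsupported layers run on the CPU
        case .cpu: return .cpuOnly
        case .gpu: return .cpuAndGPU
        case .all: return .all
        }
    }
}

// Loads one pipeline stage (detector, recognizer, or relational model).
// MLModel(contentsOf:) expects a compiled .mlmodelc, so the .mlmodel or
// .mlpackage source is compiled to a temporary location first.
@available(macOS 13.0, iOS 16.0, *)
func loadStage(at url: URL, device: OCRDevice) throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = device.computeUnits
    let compiledURL = try MLModel.compileModel(at: url)
    return try MLModel(contentsOf: compiledURL, configuration: config)
}
```

Compiling on every load is slow, so a real application would likely cache the compiled `.mlmodelc` and reuse it across launches.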
Output
JSON with detected text regions, confidence scores, and bounding quads:
```json
{
  "images": [{
    "region_count": 38,
    "latency_ms": 359.2,
    "regions": [
      {"text": "Council", "confidence": 0.48, "quad": {"points": [...]}},
      {"text": "BANK STATEMENT", "confidence": 0.55, "quad": {"points": [...]}}
    ]
  }]
}
```
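Reading this output back into Swift needs only `Codable`. Below is a minimal sketch, assuming the fields shown above are the ones you need. The type names and `confidentText` are hypothetical, and `quad` is left out because its point format is elided in the sample (`Decodable` ignores keys a type does not declare).

```swift
import Foundation

// Mirrors the fields in the sample output; "quad" is intentionally omitted.
struct OCRRegion: Decodable {
    let text: String
    let confidence: Double
}

struct OCRImageResult: Decodable {
    let regionCount: Int   // "region_count"
    let latencyMs: Double  // "latency_ms"
    let regions: [OCRRegion]
}

struct OCROutput: Decodable {
    let images: [OCRImageResult]
}

// Returns the text of every region whose confidence meets the threshold.
func confidentText(from data: Data, minConfidence: Double) throws -> [String] {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase  // region_count -> regionCount
    let output = try decoder.decode(OCROutput.self, from: data)
    return output.images
        .flatMap { $0.regions }
        .filter { $0.confidence >= minConfidence }
        .map { $0.text }
}
```

The sample confidences are modest (0.48 and 0.55), so any threshold is better tuned on your own documents than fixed in advance.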
File Structure
```text
v2_english/
  manifest.json                      # Model metadata + ANE optimization config
  charset.txt                        # Character vocabulary (JSON array)
  v2_english_detector.mlmodel        # RegNet-X-8GF detector (~181 MB)
  v2_english_recognizer.mlpackage/   # Transformer recognizer
  v2_english_relational.mlpackage/   # GNN relational model
v2_multilingual/
  manifest.json
  charset.txt
  v2_multilingual_detector.mlmodel
  v2_multilingual_recognizer.mlpackage/
  v2_multilingual_relational.mlpackage/
```
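A small helper can confirm that a variant directory matches this layout before any model is loaded. This sketch rests on two assumptions: the file names follow the pattern above, and `charset.txt` decodes as a JSON array of strings. `VariantFiles` and `loadCharset` are hypothetical names.

```swift
import Foundation

// Hypothetical helper: expected file URLs for one variant directory,
// following the "<variant>_<stage>" naming pattern in the listing.
struct VariantFiles {
    let manifest: URL
    let charset: URL
    let detector: URL
    let recognizer: URL
    let relational: URL

    init(modelDir: URL, variant: String) {  // variant: "v2_english" or "v2_multilingual"
        let dir = modelDir.appendingPathComponent(variant, isDirectory: true)
        manifest = dir.appendingPathComponent("manifest.json")
        charset = dir.appendingPathComponent("charset.txt")
        detector = dir.appendingPathComponent("\(variant)_detector.mlmodel")
        recognizer = dir.appendingPathComponent("\(variant)_recognizer.mlpackage", isDirectory: true)
        relational = dir.appendingPathComponent("\(variant)_relational.mlpackage", isDirectory: true)
    }

    // Files or packages not present on disk, useful for an early, readable error.
    func missing() -> [URL] {
        [manifest, charset, detector, recognizer, relational]
            .filter { !FileManager.default.fileExists(atPath: $0.path) }
    }
}

// Assumes charset.txt holds a JSON array of strings, one entry per character.
func loadCharset(from url: URL) throws -> [String] {
    try JSONDecoder().decode([String].self, from: Data(contentsOf: url))
}
```

For `v2_english`, the decoded array would presumably line up with the 855-character count in the Variants table.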
ANE Optimization Details
From manifest.json:
- Detector: NCHW input layout, 1024×1024 fixed resolution, float32
- Recognizer: NCHW input layout, batch size 16, float16
- Relational: Max 256 regions, float16
- Host pipeline: Preprocessing, NMS, quad rectification, and batching run on CPU
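Because the recognizer takes a fixed batch of 16, the host has to group detected regions into batches and pad the last one. Below is a minimal sketch of that grouping, assuming padding with blank slots. `RecognizerBatch` and `makeBatches` are hypothetical. The sketch does not show how SwiftNemotronOCR actually pads, or how it handles more than 256 regions for the relational model.

```swift
// One fixed-size recognizer batch: real region indices plus padded slots.
struct RecognizerBatch {
    let regionIndices: [Int]  // indices into the detected-region list
    let paddingCount: Int     // slots to fill with blank crops
}

// Splits N regions into ceil(N / batchSize) batches; only the last may be padded.
func makeBatches(regionCount: Int, batchSize: Int = 16) -> [RecognizerBatch] {
    precondition(batchSize > 0 && regionCount >= 0)
    return stride(from: 0, to: regionCount, by: batchSize).map { start in
        let end = min(start + batchSize, regionCount)
        return RecognizerBatch(regionIndices: Array(start..<end),
                               paddingCount: batchSize - (end - start))
    }
}
```

With the 38 regions from the sample output, `makeBatches(regionCount: 38)` yields three batches. The last one holds regions 32 through 37 plus 10 padded slots.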
Related
- Swift Package: mweinbach/SwiftNemotronOCR: full Swift implementation with CoreML + MLX backends
- MLX Weights: mweinbach/nemotron-ocr-v2-mlx: MLX-format weights for CPU/GPU inference
- Original Model: nvidia/nemotron-ocr-v2: NVIDIA's original PyTorch release
License
This model inherits the NVIDIA Open Model License from the original Nemotron OCR v2 release.