metadata
license: other
license_name: nvidia-open-model-license
license_link: >-
https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
library_name: mlx
tags:
- mlx
- ocr
- apple-silicon
- swift
- nemotron
- nvidia
- document-understanding
- text-recognition
pipeline_tag: image-to-text
base_model: nvidia/nemotron-ocr-v2
Nemotron OCR v2 — MLX (Apple Silicon)
MLX-format weights for NVIDIA Nemotron OCR v2, converted from PyTorch for native Apple Silicon inference via mlx-swift.
Model Details
Three-stage OCR pipeline optimized for Apple Silicon CPU and GPU:
| Component | Architecture | Parameters |
|---|---|---|
| Detector | RegNet-X-8GF + ASPP + FPN | ~43M |
| Recognizer | CNN encoder + Transformer decoder | ~6M |
| Relational | Graph neural network + Transformer | ~2M |
Variants
| Variant | Charset | Vocab Size | Recognizer Seq Length |
|---|---|---|---|
| v2_english | 855 characters | 858 tokens | 32 |
| v2_multilingual | ~42K characters | ~42K tokens | 512 |
Device-Tuned Presets
| Device | Detector Resolution | dtype | Notes |
|---|---|---|---|
| GPU | 512×512 | bfloat16 | Best throughput on Apple Silicon GPU via Metal |
| CPU | 256×256 | float32 | Reduced resolution avoids expensive full-res CPU inference |
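To make the trade-off between the two presets concrete, the table can be written as a small lookup. This is only an illustrative sketch: the `PRESETS` dictionary, its keys, and the `pixel_count` helper are hypothetical names and do not reflect the actual schema of config.json, which is not shown here.

```python
# Hypothetical lookup mirroring the preset table above.
# The real config.json layout may differ; these keys are assumptions.
PRESETS = {
    "GPU": {"detector_resolution": (512, 512), "dtype": "bfloat16"},
    "CPU": {"detector_resolution": (256, 256), "dtype": "float32"},
}


def pixel_count(device: str) -> int:
    """Number of input pixels the detector sees under a given preset."""
    width, height = PRESETS[device]["detector_resolution"]
    return width * height


# The CPU preset feeds the detector a quarter of the pixels of the GPU preset.
ratio = pixel_count("GPU") // pixel_count("CPU")
print(ratio)  # prints 4
```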
Usage with SwiftNemotronOCR
This model is designed for use with SwiftNemotronOCR, an all-Swift OCR pipeline.
Quick Start
# Clone the Swift package
git clone https://github.com/mweinbach/SwiftNemotronOCR.git
cd SwiftNemotronOCR
# Build
swift build -c release
# Compile Metal shaders (required for swift build — see repo README)
# ... (see SwiftNemotronOCR README for the metallib build step)
# Download this model
# Place v2_english/ and/or v2_multilingual/ under a model/mlx/ directory
# Run OCR on GPU
.build/release/apple-ocr-runner \
  --runtime mlx \
  --device GPU \
  --model-dir /path/to/model/mlx \
  --image /path/to/image.png \
  --variant en \
  --level paragraph
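For scripted use, the same invocation can be assembled from Python. The sketch below only builds the argument list from the flags shown in the command above; `build_ocr_command` is a hypothetical helper, not part of SwiftNemotronOCR, and the paths are the same placeholders as in the shell example.

```python
def build_ocr_command(model_dir, image, device="GPU", variant="en",
                      level="paragraph",
                      runner=".build/release/apple-ocr-runner"):
    """Return the runner invocation as an argv list (hypothetical helper)."""
    return [
        runner,
        "--runtime", "mlx",
        "--device", device,
        "--model-dir", str(model_dir),
        "--image", str(image),
        "--variant", variant,
        "--level", level,
    ]


cmd = build_ocr_command("/path/to/model/mlx", "/path/to/image.png")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. Whether the runner writes its JSON to stdout is an assumption to verify against the SwiftNemotronOCR README.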
Output
JSON with detected text regions, confidence scores, and bounding quads:
{
  "images": [{
    "region_count": 24,
    "latency_ms": 2841.6,
    "regions": [
      {"text": "Council", "confidence": 0.48, "quad": {"points": [...]}},
      {"text": "RECONCILIATION", "confidence": 0.41, "quad": {"points": [...]}}
    ]
  }]
}
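Downstream code will typically filter regions by confidence. The following is a minimal sketch that assumes the output has exactly the shape shown above; `confident_texts` is a hypothetical helper, and the sample leaves the quad points empty because they are elided in the example output.

```python
import json

# Sample shaped like the output above. Quad points are elided in the source,
# so they are left empty here rather than guessed.
SAMPLE = """
{"images": [{"region_count": 2, "latency_ms": 2841.6, "regions": [
  {"text": "Council", "confidence": 0.48, "quad": {"points": []}},
  {"text": "RECONCILIATION", "confidence": 0.41, "quad": {"points": []}}
]}]}
"""


def confident_texts(payload: str, min_confidence: float) -> list[str]:
    """Collect region texts whose confidence meets the threshold."""
    result = json.loads(payload)
    return [
        region["text"]
        for image in result["images"]
        for region in image["regions"]
        if region["confidence"] >= min_confidence
    ]


print(confident_texts(SAMPLE, 0.45))  # prints ['Council']
```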
File Structure
v2_english/
  config.json              # Model configuration + device presets
  charset.txt              # Character vocabulary (JSON array)
  manifest.json            # Component metadata
  detector.safetensors     # RegNet-X-8GF detector (~181 MB)
  recognizer.safetensors   # Transformer recognizer (~24 MB)
  relational.safetensors   # GNN relational model (~9 MB)
v2_multilingual/
  config.json
  charset.txt
  manifest.json
  detector.safetensors     # Same detector (~181 MB)
  recognizer.safetensors   # Larger recognizer (~144 MB)
  relational.safetensors   # (~9 MB)
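Before running inference it can help to confirm that a downloaded variant directory is complete. The check below is a sketch based only on the file names listed above; `missing_files` is a hypothetical helper and does not validate file contents.

```python
from pathlib import Path

# File names taken from the layout listed above.
EXPECTED_FILES = (
    "config.json",
    "charset.txt",
    "manifest.json",
    "detector.safetensors",
    "recognizer.safetensors",
    "relational.safetensors",
)


def missing_files(model_dir, variant):
    """Return the expected files that are absent from model_dir/variant."""
    variant_dir = Path(model_dir) / variant
    return [name for name in EXPECTED_FILES
            if not (variant_dir / name).is_file()]


# An empty list means the variant directory has every expected file.
print(missing_files("/path/to/model/mlx", "v2_english"))
```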
Related
- Swift Package: mweinbach/SwiftNemotronOCR — Full Swift implementation with CoreML + MLX backends
- CoreML Weights: mweinbach/nemotron-ocr-v2-coreml — ANE-optimized CoreML models
- Original Model: nvidia/nemotron-ocr-v2 — NVIDIA's original PyTorch release
Conversion
Weights were converted from the original PyTorch checkpoints:
- Conv weights transposed from OIHW → OHWI (MLX convention)
- Saved in safetensors format
- Config generated with device-specific detector resolution presets
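The layout change in the first step can be illustrated without any dependencies. This is not the conversion script used for these weights, only a sketch of the axis permutation: `oihw_to_ohwi` is a hypothetical helper that reorders a row-major flat list from (out, in, height, width) to (out, height, width, in).

```python
def oihw_to_ohwi(flat, shape):
    """Permute a row-major flat list from (O, I, H, W) to (O, H, W, I)."""
    o_dim, i_dim, h_dim, w_dim = shape
    out = [None] * len(flat)
    for o in range(o_dim):
        for i in range(i_dim):
            for h in range(h_dim):
                for w in range(w_dim):
                    # Row-major offset in the source (O, I, H, W) layout.
                    src = ((o * i_dim + i) * h_dim + h) * w_dim + w
                    # Row-major offset in the target (O, H, W, I) layout.
                    dst = ((o * h_dim + h) * w_dim + w) * i_dim + i
                    out[dst] = flat[src]
    return out, (o_dim, h_dim, w_dim, i_dim)


# One output channel, two input channels, a 1x2 kernel.
weights, new_shape = oihw_to_ohwi([0, 1, 2, 3], (1, 2, 1, 2))
print(weights, new_shape)  # prints [0, 2, 1, 3] (1, 1, 2, 2)
```

With NumPy arrays the same permutation is `np.transpose(w, (0, 2, 3, 1))`.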
License
This model inherits the NVIDIA Open Model License from the original Nemotron OCR v2 release.