Nemotron OCR v2 – MLX (Apple Silicon)
MLX-format weights for NVIDIA Nemotron OCR v2, converted from PyTorch for native Apple Silicon inference via mlx-swift.
Model Details
Three-stage OCR pipeline optimized for Apple Silicon CPU and GPU:
| Component | Architecture | Parameters |
|---|---|---|
| Detector | RegNet-X-8GF + ASPP + FPN | ~43M |
| Recognizer | CNN encoder + Transformer decoder | ~6M |
| Relational | Graph neural network + Transformer | ~2M |
Variants
| Variant | Charset | Vocab Size | Recognizer Seq Length |
|---|---|---|---|
| v2_english | 855 characters | 858 tokens | 32 |
| v2_multilingual | ~42K characters | ~42K tokens | 512 |
Device-Tuned Presets
| Device | Detector Resolution | dtype | Notes |
|---|---|---|---|
| GPU | 512×512 | bfloat16 | Best throughput on Apple Silicon GPU via Metal |
| CPU | 256×256 | float32 | Reduced resolution avoids expensive full-resolution CPU inference |
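The preset table can be consumed programmatically. A minimal sketch in Python, assuming the presets live under a hypothetical `device_presets` key (the exact `config.json` schema may differ; check the shipped file):

```python
import json

# Hypothetical preset table mirroring the table above; the real
# config.json may name these fields differently.
config = json.loads("""
{
  "device_presets": {
    "GPU": {"detector_resolution": [512, 512], "dtype": "bfloat16"},
    "CPU": {"detector_resolution": [256, 256], "dtype": "float32"}
  }
}
""")

def preset_for(device: str) -> dict:
    """Return the detector preset for a device, falling back to CPU."""
    presets = config["device_presets"]
    return presets.get(device, presets["CPU"])

print(preset_for("GPU"))  # resolution [512, 512], dtype bfloat16
```

Falling back to the CPU preset keeps inference safe on devices without a tuned entry, at the cost of the lower 256×256 detector resolution.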
Usage with SwiftNemotronOCR
This model is designed for use with SwiftNemotronOCR, an all-Swift OCR pipeline.
Quick Start
```bash
# Clone the Swift package
git clone https://github.com/mweinbach/SwiftNemotronOCR.git
cd SwiftNemotronOCR

# Build
swift build -c release

# Compile Metal shaders (required for swift build; see the
# SwiftNemotronOCR README for the metallib build step)

# Download this model and place v2_english/ and/or v2_multilingual/
# under a model/mlx/ directory

# Run OCR on GPU
.build/release/apple-ocr-runner \
  --runtime mlx \
  --device GPU \
  --model-dir /path/to/model/mlx \
  --image /path/to/image.png \
  --variant en \
  --level paragraph
```
Output
JSON with detected text regions, confidence scores, and bounding quads:
```json
{
  "images": [{
    "region_count": 24,
    "latency_ms": 2841.6,
    "regions": [
      {"text": "Council", "confidence": 0.48, "quad": {"points": [...]}},
      {"text": "RECONCILIATION", "confidence": 0.41, "quad": {"points": [...]}}
    ]
  }]
}
```
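Post-processing the runner's output is straightforward. A sketch in Python, using only the field names shown in the example above (quads elided to empty lists here):

```python
import json

# Example output in the shape shown above.
raw = """
{"images": [{"region_count": 2, "latency_ms": 2841.6, "regions": [
  {"text": "Council", "confidence": 0.48, "quad": {"points": []}},
  {"text": "RECONCILIATION", "confidence": 0.41, "quad": {"points": []}}
]}]}
"""

def texts_above(result: dict, threshold: float) -> list[str]:
    """Collect region texts whose confidence meets a threshold."""
    return [
        region["text"]
        for image in result["images"]
        for region in image["regions"]
        if region["confidence"] >= threshold
    ]

result = json.loads(raw)
print(texts_above(result, 0.45))  # ['Council']
```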
File Structure
```
v2_english/
  config.json            # Model configuration + device presets
  charset.txt            # Character vocabulary (JSON array)
  manifest.json          # Component metadata
  detector.safetensors   # RegNet-X-8GF detector (~181 MB)
  recognizer.safetensors # Transformer recognizer (~24 MB)
  relational.safetensors # GNN relational model (~9 MB)

v2_multilingual/
  config.json
  charset.txt
  manifest.json
  detector.safetensors   # Same detector (~181 MB)
  recognizer.safetensors # Larger recognizer (~144 MB)
  relational.safetensors # GNN relational model (~9 MB)
```
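Since `charset.txt` is a JSON array, loading it is one `json.loads` call. A sketch with a toy stand-in charset; note that the English variant reports 855 characters but 858 tokens, which suggests a few special tokens on top of the charset (which specials, and how many, is an assumption here):

```python
import json

# Toy stand-in for charset.txt, which is a JSON array of characters.
charset_json = json.dumps(["a", "b", "c"])
charset = json.loads(charset_json)

# 858 tokens vs. 855 characters implies 3 extra tokens; assuming they
# are special tokens (e.g. padding / start / end -- not confirmed).
NUM_SPECIAL_TOKENS = 3
vocab_size = len(charset) + NUM_SPECIAL_TOKENS
print(vocab_size)  # 6 for this toy charset
```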
Related
- Swift Package: mweinbach/SwiftNemotronOCR – full Swift implementation with CoreML + MLX backends
- CoreML Weights: mweinbach/nemotron-ocr-v2-coreml – ANE-optimized CoreML models
- Original Model: nvidia/nemotron-ocr-v2 – NVIDIA's original PyTorch release
Conversion
Weights were converted from the original PyTorch checkpoints:
- Conv weights transposed from OIHW → OHWI (MLX convention)
- Saved as safetensors format
- Config generated with device-specific detector resolution presets
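The layout change in the first step is a pure axis permutation: PyTorch stores conv weights as (out_ch, in_ch, height, width), while MLX expects (out_ch, height, width, in_ch). A plain-Python sketch of the permutation (the actual conversion would use an array library's transpose):

```python
def oihw_to_ohwi(w):
    """Permute a nested-list conv weight from OIHW to OHWI."""
    o_dim, i_dim = len(w), len(w[0])
    h_dim, w_dim = len(w[0][0]), len(w[0][0][0])
    return [[[[w[o][i][h][x] for i in range(i_dim)]  # innermost: in_ch
              for x in range(w_dim)]
             for h in range(h_dim)]
            for o in range(o_dim)]

# A tiny 2x3x4x5 (OIHW) weight becomes 2x4x5x3 (OHWI).
w = [[[[0.0] * 5 for _ in range(4)] for _ in range(3)] for _ in range(2)]
t = oihw_to_ohwi(w)
print(len(t), len(t[0]), len(t[0][0]), len(t[0][0][0]))  # 2 4 5 3
```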
License
This model inherits the NVIDIA Open Model License from the original Nemotron OCR v2 release.