# Nemotron OCR v2 (MLX, Apple Silicon)

MLX-format weights for NVIDIA Nemotron OCR v2, converted from PyTorch for native Apple Silicon inference via mlx-swift.

## Model Details

Three-stage OCR pipeline optimized for Apple Silicon CPU and GPU:

| Component  | Architecture                       | Parameters |
|------------|------------------------------------|------------|
| Detector   | RegNet-X-8GF + ASPP + FPN          | ~43M       |
| Recognizer | CNN encoder + Transformer decoder  | ~6M        |
| Relational | Graph neural network + Transformer | ~2M        |
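The three stages run in sequence: the detector proposes text regions, the recognizer decodes each region crop into a string, and the relational model groups regions into lines and paragraphs. A minimal Python sketch of that dataflow follows; every function name and the stub return values are hypothetical (the real pipeline is implemented in Swift in SwiftNemotronOCR):

```python
from dataclasses import dataclass

@dataclass
class Region:
    quad: list  # four (x, y) corner points of the text region
    text: str = ""
    confidence: float = 0.0

# Hypothetical stand-ins for the three MLX components.
def detect(image) -> list:
    # Stage 1: RegNet-X-8GF + ASPP + FPN -> candidate quads
    return [Region(quad=[(0, 0), (10, 0), (10, 5), (0, 5)])]

def recognize(image, region: Region) -> Region:
    # Stage 2: CNN encoder + Transformer decoder -> text per crop
    region.text, region.confidence = "Council", 0.48
    return region

def relate(regions: list) -> list:
    # Stage 3: GNN + Transformer -> group regions into paragraphs
    return [regions]

def ocr(image) -> list:
    regions = [recognize(image, r) for r in detect(image)]
    return relate(regions)
```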

## Variants

| Variant         | Charset         | Vocab Size  | Recognizer Seq Length |
|-----------------|-----------------|-------------|-----------------------|
| v2_english      | 855 characters  | 858 tokens  | 32                    |
| v2_multilingual | ~42K characters | ~42K tokens | 512                   |

## Device-Tuned Presets

| Device | Detector Resolution | dtype    | Notes                                                      |
|--------|---------------------|----------|------------------------------------------------------------|
| GPU    | 512×512             | bfloat16 | Best throughput on Apple Silicon GPU via Metal             |
| CPU    | 256×256             | float32  | Reduced resolution avoids expensive full-res CPU inference |
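The CPU preset's saving is easy to quantify: halving each side of the detector input cuts the pixel count, and hence roughly the detector backbone's convolutional work, by a factor of four.

```python
# Detector cost scales with input pixel count; 512x512 -> 256x256
# means 4x fewer pixels for the CPU preset.
gpu_pixels = 512 * 512
cpu_pixels = 256 * 256
ratio = gpu_pixels // cpu_pixels  # 4
```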

## Usage with SwiftNemotronOCR

This model is designed for use with SwiftNemotronOCR, an all-Swift OCR pipeline.

### Quick Start

```bash
# Clone the Swift package
git clone https://github.com/mweinbach/SwiftNemotronOCR.git
cd SwiftNemotronOCR

# Build
swift build -c release

# Compile Metal shaders (required for swift build; see repo README)
# ... (see SwiftNemotronOCR README for the metallib build step)

# Download this model
# Place v2_english/ and/or v2_multilingual/ under a model/mlx/ directory

# Run OCR on GPU
.build/release/apple-ocr-runner \
  --runtime mlx \
  --device GPU \
  --model-dir /path/to/model/mlx \
  --image /path/to/image.png \
  --variant en \
  --level paragraph
```

## Output

JSON with detected text regions, confidence scores, and bounding quads:

```json
{
  "images": [{
    "region_count": 24,
    "latency_ms": 2841.6,
    "regions": [
      {"text": "Council", "confidence": 0.48, "quad": {"points": [...]}},
      {"text": "RECONCILIATION", "confidence": 0.41, "quad": {"points": [...]}}
    ]
  }]
}
```
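The JSON report can be post-processed with ordinary tooling. As one example, a short Python sketch that keeps only regions above a confidence threshold; the field names come from the sample output above, but the threshold value and the function name are arbitrary:

```python
import json

# A trimmed version of the runner's JSON report (fields as shown above).
sample = """
{
  "images": [{
    "region_count": 2,
    "latency_ms": 2841.6,
    "regions": [
      {"text": "Council", "confidence": 0.48, "quad": {"points": []}},
      {"text": "RECONCILIATION", "confidence": 0.41, "quad": {"points": []}}
    ]
  }]
}
"""

def confident_text(report: dict, threshold: float = 0.45) -> list:
    """Collect region texts whose confidence clears the threshold."""
    return [
        region["text"]
        for image in report["images"]
        for region in image["regions"]
        if region["confidence"] >= threshold
    ]

print(confident_text(json.loads(sample)))  # ['Council']
```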

## File Structure

```
v2_english/
  config.json              # Model configuration + device presets
  charset.txt              # Character vocabulary (JSON array)
  manifest.json            # Component metadata
  detector.safetensors     # RegNet-X-8GF detector (~181 MB)
  recognizer.safetensors   # Transformer recognizer (~24 MB)
  relational.safetensors   # GNN relational model (~9 MB)

v2_multilingual/
  config.json
  charset.txt
  manifest.json
  detector.safetensors     # Same detector (~181 MB)
  recognizer.safetensors   # Larger recognizer (~144 MB)
  relational.safetensors   # (~9 MB)
```
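A downloaded variant directory can be sanity-checked against this layout before pointing `--model-dir` at it. A small sketch, assuming only the six file names listed above (the helper name is ours, not part of any tool):

```python
from pathlib import Path

# Expected per-variant files, per the layout above.
EXPECTED = [
    "config.json", "charset.txt", "manifest.json",
    "detector.safetensors", "recognizer.safetensors", "relational.safetensors",
]

def missing_files(variant_dir: str) -> list:
    """Return the expected files absent from a variant directory."""
    root = Path(variant_dir)
    return [name for name in EXPECTED if not (root / name).exists()]
```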

## Conversion

Weights were converted from the original PyTorch checkpoints:

  • Conv weights transposed from OIHW β†’ OHWI (MLX convention)
  • Saved as safetensors format
  • Config generated with device-specific detector resolution presets

## License

This model inherits the NVIDIA Open Model License from the original Nemotron OCR v2 release.
