MoGe-2 ViT-S "normal": Monocular Geometry + Surface Normals (ONNX)

Heliosoph mirror of Ruicheng/moge-2-vits-normal-onnx, the ViT-Small variant of MoGe-2's joint geometry + surface-normal model. Built on a DINOv2 ViT-S backbone, it predicts a per-pixel 3D point map, camera intrinsics, and per-pixel surface normals in a single forward pass.

The "normal" suffix marks this as the joint variant, distinct from the base MoGe-2 ladder that predicts geometry only. Because geometry and normals come from the same network, no separate normal-estimation pass (e.g. DSINE or Omnidata) is needed when feeding a Poisson surface reconstruction pipeline.

The ONNX file is unchanged from upstream. It is re-hosted for distribution stability (the upstream lives on the author's personal HF account) and to ship a proper LICENSE + README alongside the bytes.

Credit: Ruicheng Wang and collaborators, MoGe-2 (Microsoft Research, 2025). The author's personal repos at Ruicheng/moge-2-vits-normal-onnx, Ruicheng/moge-2-vitb-normal-onnx, and Ruicheng/moge-2-vitl-normal-onnx are the authoritative upstream; this is a byte-for-byte mirror of the ViT-S variant.

What this repo contains

model.onnx        # ~141 MB, DINOv2 ViT-S backbone, geometry + normal heads, fp32
LICENSE           # MIT

The ONNX file is self-contained (no external .onnx_data sidecar). The upstream repo ships only model.onnx + .gitattributes; this mirror adds the LICENSE + README.
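The byte-for-byte claim can be checked locally by hashing this mirror's model.onnx and the upstream copy and comparing the digests. The following is a minimal sketch using only the Python standard library; the two paths in the usage comment are hypothetical and assume both files have already been downloaded.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_identical(path_a, path_b):
    """True when both files have the same size and the same SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # different sizes can never be byte-identical
    return sha256_of(a) == sha256_of(b)


# Hypothetical local paths; adjust to wherever the two downloads live.
# is_byte_identical("mirror/model.onnx", "upstream/model.onnx")
```

Comparing sizes first avoids hashing ~141 MB twice when the files obviously differ.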

Variant ladder

| Variant | Backbone | Size | Use when… |
|---|---|---|---|
| ViT-S (this) | DINOv2 ViT-Small (~22M backbone params) | ~141 MB | CPU / edge / fast-iteration workflows; throughput matters more than peak quality |
| ViT-B | DINOv2 ViT-Base (~86M) | ~419 MB | Recommended default: best quality-per-byte for GPU workloads |
| ViT-L | DINOv2 ViT-Large (~300M) | ~1.32 GB | Peak quality; comfortable on a GPU, but large enough to strain consumer VRAM |

All three share the same I/O signature, so switching variants only requires swapping the file.
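Because the variants are interchangeable at the I/O level, choosing one can be reduced to a size budget. The sketch below is illustrative only: `pick_variant` and `fetch_model` are hypothetical helper names, the sizes are the approximate figures listed above, and the `model.onnx` filename is confirmed here only for the ViT-S repo (it is an assumption for the other two). The download step relies on `hf_hub_download` from the optional `huggingface_hub` package.

```python
# (short name, upstream repo id, approximate size in MB)
VARIANTS = [
    ("vits", "Ruicheng/moge-2-vits-normal-onnx", 141),
    ("vitb", "Ruicheng/moge-2-vitb-normal-onnx", 419),
    ("vitl", "Ruicheng/moge-2-vitl-normal-onnx", 1320),
]


def pick_variant(max_size_mb):
    """Return (name, repo_id) of the largest variant that fits the budget."""
    fitting = [v for v in VARIANTS if v[2] <= max_size_mb]
    if not fitting:
        raise ValueError(f"no variant fits within {max_size_mb} MB")
    name, repo_id, _ = max(fitting, key=lambda v: v[2])
    return name, repo_id


def fetch_model(repo_id, filename="model.onnx"):
    """Download one file from the Hub and return its local cache path.

    Assumption: the file is named model.onnx in every variant repo.
    """
    from huggingface_hub import hf_hub_download  # optional dependency
    return hf_hub_download(repo_id=repo_id, filename=filename)


# Example: a 500 MB budget selects the ViT-B variant.
# name, repo_id = pick_variant(500)
# local_path = fetch_model(repo_id)
```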

Input / output

| Field | Spec |
|---|---|
| Input | RGB image, NCHW float32, normalized per DINOv2 convention |
| Outputs | Per-pixel 3D point map (camera-frame), camera intrinsics, per-pixel surface normals |
| Dynamic axes | Batch + spatial; inspect with Netron for exact names and ranges |
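As an illustration of the input layout, the following sketch converts an HxWx3 uint8 RGB array into a 1x3xHxW float32 tensor using NumPy. It assumes that "DINOv2 convention" means scaling to [0, 1] followed by ImageNet mean/std normalization. Whether the exported graph expects pre-normalized input or applies the normalization internally is not stated here, so the `normalize` flag (a hypothetical parameter, like the function name `to_nchw`) lets either case be tried once the PyTorch reference has been checked.

```python
import numpy as np

# ImageNet channel statistics, which DINOv2 uses for input normalization.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_nchw(image_u8, normalize=True):
    """Convert an HxWx3 uint8 RGB array to a 1x3xHxW float32 tensor."""
    if image_u8.ndim != 3 or image_u8.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB array")
    x = image_u8.astype(np.float32) / 255.0  # scale to [0, 1]
    if normalize:
        # Assumption: the graph expects pre-normalized input. Pass
        # normalize=False if normalization turns out to happen in-graph.
        x = (x - IMAGENET_MEAN) / IMAGENET_STD
    x = np.transpose(x, (2, 0, 1))[np.newaxis, ...]  # HWC -> NCHW, batch of 1
    return np.ascontiguousarray(x, dtype=np.float32)
```

Any resizing to a supported spatial size has to happen before this step; the supported sizes are not documented and should be read from the graph.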

The exact input/output tensor names and the supported spatial-dimension multiples aren't documented in the upstream repo, which contains only model.onnx and .gitattributes. Inspect the graph with Netron before integrating, or cross-reference the microsoft/MoGe PyTorch reference for the preprocessing convention.
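The same information Netron shows can also be read programmatically. The sketch below uses ONNX Runtime's `InferenceSession`, `get_inputs()`, and `get_outputs()`; the helper names `load_session` and `describe_io` are hypothetical, and the example assumes model.onnx sits in the working directory. In the reported shapes, dynamic axes appear as symbolic strings or `None` rather than integers.

```python
def load_session(model_path):
    """Open the model on CPU; requires the onnxruntime package."""
    import onnxruntime as ort
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def describe_io(session):
    """Collect name, shape, and element type for each graph input and output."""
    def rows(nodes):
        return [
            {"name": node.name, "shape": list(node.shape), "type": node.type}
            for node in nodes
        ]
    return {
        "inputs": rows(session.get_inputs()),
        "outputs": rows(session.get_outputs()),
    }


# Example (assumes model.onnx is in the working directory):
# for kind, entries in describe_io(load_session("model.onnx")).items():
#     for entry in entries:
#         print(kind, entry)
```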

When to pick MoGe-2 normal vs alternatives

| Need | Pick |
|---|---|
| Geometry + normals from one forward pass | MoGe-2 normal (this family) |
| Relative depth only, broadest hardware support | Depth Anything V2/V3 |
| Metric depth in meters, outdoor scenes | Metric3D V2 |
| Surface normals only, smallest model | DSINE |
| Per-pixel point map only (no normals) | MoGe v1 ViT-L |

MoGe-2 normal is the right pick when you're feeding a Poisson surface reconstruction (which wants both positions AND normals at every point), or when downstream rendering needs per-pixel shading normals "for free" alongside depth.
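To make the Poisson hand-off concrete, the sketch below writes oriented points (position plus normal) to an ASCII PLY file, a format that common reconstruction tools accept. It uses only the standard library and assumes the point map and normal map have already been flattened into equal-length sequences of 3-tuples. The function name `write_oriented_ply` and the optional `mask` argument are hypothetical; the output list in this card does not mention a validity mask.

```python
import math


def write_oriented_ply(path, points, normals, mask=None):
    """Write (x, y, z, nx, ny, nz) vertices to an ASCII PLY file.

    Entries with a False mask value or any non-finite component are
    skipped. Returns the number of vertices written.
    """
    if len(points) != len(normals):
        raise ValueError("points and normals must have the same length")
    if mask is None:
        mask = [True] * len(points)
    elif len(mask) != len(points):
        raise ValueError("mask must have the same length as points")

    kept = [
        (p, n)
        for p, n, keep in zip(points, normals, mask)
        if keep and all(math.isfinite(v) for v in (*p, *n))
    ]
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(kept)}",
        "property float x",
        "property float y",
        "property float z",
        "property float nx",
        "property float ny",
        "property float nz",
        "end_header",
    ]
    with open(path, "w", encoding="ascii") as handle:
        handle.write("\n".join(header) + "\n")
        for (x, y, z), (nx, ny, nz) in kept:
            handle.write(f"{x:.6f} {y:.6f} {z:.6f} {nx:.6f} {ny:.6f} {nz:.6f}\n")
    return len(kept)
```

The resulting file can then be loaded by a Poisson implementation of choice, for example the ones in Open3D or MeshLab.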

License

MIT, assumed from the sibling Ruicheng/moge-2-vitl-normal-onnx, which ships an explicit LICENSE file, and from the upstream microsoft/MoGe code repo being MIT. The upstream ViT-S repo doesn't ship a LICENSE itself; this mirror adds a canonical MIT LICENSE with copyright attributed to Microsoft Research. If the upstream author confirms a different license later, this mirror will follow.
