MoGe-2 ViT-L "normal": Monocular Geometry + Surface Normals (ONNX)

Heliosoph mirror of Ruicheng/moge-2-vitl-normal-onnx, the ViT-Large variant of MoGe-2's joint geometry + surface-normal model. A DINOv2 ViT-L backbone predicts a per-pixel 3D point map, camera intrinsics, and per-pixel surface normals in a single forward pass.

The "normal" suffix marks this as the joint variant, distinct from the base MoGe-2 ladder that predicts geometry only. Predicting geometry and normals with the same network removes the need for a separate normal-estimation pass (for example DSINE or Omnidata) when feeding a Poisson surface reconstruction pipeline.

Top of the three-variant ladder: pick it when reconstruction quality matters more than throughput or disk footprint. ViT-B is the better default for most workloads.

The ONNX file is unchanged from upstream; it is re-hosted for distribution stability (the upstream lives on the author's personal HF account). The upstream LICENSE travels along verbatim.

Credit: Ruicheng Wang and collaborators, MoGe-2 (Microsoft Research, 2025). The author's personal repos at Ruicheng/moge-2-vits-normal-onnx, Ruicheng/moge-2-vitb-normal-onnx, and Ruicheng/moge-2-vitl-normal-onnx are the authoritative upstream; this is a byte-for-byte mirror of the ViT-L variant.

What this repo contains

model.onnx        # ~1.32 GB; DINOv2 ViT-L backbone, geometry + normal heads, fp32
LICENSE           # MIT; copied verbatim from the upstream Ruicheng/moge-2-vitl-normal-onnx repo

The ONNX file is self-contained (no external .onnx_data sidecar): upstream ships it as a single ~1.32 GB protobuf, comfortably under the 2 GB protobuf size limit. The upstream repo includes model.onnx + LICENSE + .gitattributes; this mirror adds a README on top.

Variant ladder

| Variant | Backbone | Size | Use when… |
|---|---|---|---|
| ViT-S | DINOv2 ViT-Small (~22M backbone params) | ~141 MB | CPU / edge / fast-iteration workflows |
| ViT-B | DINOv2 ViT-Base (~86M) | ~419 MB | Recommended default: best quality-per-byte for GPU workloads |
| ViT-L (this) | DINOv2 ViT-Large (~300M) | ~1.32 GB | Peak quality, GPU-comfortable, large enough to push consumer VRAM |

All three share the same I/O signature, so you can switch variants by swapping the file.

Input / output

|  | Spec |
|---|---|
| Input | RGB image, NCHW float32, normalized per DINOv2 convention |
| Outputs | Per-pixel 3D point map (camera-frame), camera intrinsics, per-pixel surface normals |
| Dynamic axes | Batch + spatial; inspect with Netron for exact names and ranges |
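As a rough illustration of the input row above, the following sketch converts an HxWx3 uint8 RGB array into a 1x3xHxW float32 batch. It assumes that "DINOv2 convention" means the usual ImageNet mean/std statistics; whether this export expects them applied by the caller or already bakes them into the graph is not documented, so the `normalize` flag (a name chosen here for illustration) lets you try both. The function name `to_nchw_float32` is likewise hypothetical.

```python
import numpy as np

# ImageNet statistics, the values commonly paired with DINOv2 backbones.
# ASSUMPTION: it is not documented whether this export expects the caller
# to apply them, so normalization is optional below.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_nchw_float32(image_hwc_uint8, normalize=True):
    """Convert an HxWx3 uint8 RGB image to a 1x3xHxW float32 batch."""
    image = np.asarray(image_hwc_uint8)
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB image")
    # Scale 0..255 to 0..1 before any mean/std normalization.
    scaled = image.astype(np.float32) / 255.0
    if normalize:
        # Broadcasts over the trailing channel axis of the HWC layout.
        scaled = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # HWC -> CHW, then prepend the batch axis.
    chw = np.transpose(scaled, (2, 0, 1))
    return np.ascontiguousarray(chw[np.newaxis, ...], dtype=np.float32)
```

Any constraint on height and width (for example, multiples of the backbone's patch size) is left out on purpose, since the supported spatial sizes have to be read from the graph itself.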

The exact input/output tensor names and the supported spatial-dimension multiples aren't documented in the upstream repo (which contains only model.onnx, LICENSE, and .gitattributes). Inspect the graph with Netron before integrating, or cross-reference the microsoft/MoGe PyTorch reference for the preprocessing convention.

When to pick MoGe-2 normal vs alternatives

| Need | Pick |
|---|---|
| Geometry + normals from one forward pass | MoGe-2 normal (this family) |
| Relative depth only, broadest hardware support | Depth Anything V2/V3 |
| Metric depth in meters, outdoor scenes | Metric3D V2 |
| Surface normals only, smallest model | DSINE |
| Per-pixel point map only (no normals) | MoGe v1 ViT-L |

MoGe-2 normal is the right pick when you're feeding a Poisson surface reconstruction (which wants both positions AND normals at every point), or when downstream rendering needs per-pixel shading normals "for free" alongside depth.

License

MIT, copied verbatim from the upstream Ruicheng/moge-2-vitl-normal-onnx/LICENSE. Of the three upstream repos, this ViT-L one is the authoritative source for the license: the upstream ViT-S and ViT-B sibling repos don't ship LICENSE files, but the microsoft/MoGe code repo is also MIT, so the family-wide license terms are consistent.
