MoGe-2 ViT-L "normal": Monocular Geometry + Surface Normals (ONNX)
Heliosoph mirror of Ruicheng/moge-2-vitl-normal-onnx, the ViT-Large variant of MoGe-2's joint geometry + surface-normal model. Built on a DINOv2 ViT-L backbone, it predicts a per-pixel 3D point map, camera intrinsics, and per-pixel surface normals in a single forward pass.
The "normal" suffix marks this as the joint variant, distinct from the base MoGe-2 ladder that predicts geometry only. Pairing geometry + normals from the same network removes the need for a separate normal-estimation pass (e.g. DSINE or Omnidata) when feeding a Poisson surface reconstruction pipeline.
Top of the three-variant ladder: pick it when reconstruction quality matters more than throughput or disk footprint. ViT-B is the better default for most workloads.
The ONNX file is unchanged from upstream; it is re-hosted for distribution stability (the upstream lives on the author's personal HF account). The upstream LICENSE travels along verbatim.
Credit: Ruicheng Wang and collaborators, MoGe-2 (Microsoft Research, 2025). The author's personal repos at Ruicheng/moge-2-vits-normal-onnx, Ruicheng/moge-2-vitb-normal-onnx, and Ruicheng/moge-2-vitl-normal-onnx are the authoritative upstream; this is a byte-for-byte mirror of the ViT-L variant.
What this repo contains
model.onnx # ~1.32 GB, DINOv2 ViT-L backbone, geometry + normal heads, fp32
LICENSE # MIT, copied verbatim from the upstream Ruicheng/moge-2-vitl-normal-onnx repo
The ONNX file is self-contained (no external .onnx_data sidecar): upstream ships it as a single ~1.32 GB protobuf, comfortably under the 2 GB protobuf size limit. The upstream repo includes model.onnx + LICENSE + .gitattributes; this mirror adds a README on top.
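The byte-for-byte claim can be checked locally by hashing both downloads. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `is_same_file` are illustrative and not part of any upstream tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so a ~1.32 GB model never sits in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns the empty bytes sentinel.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_file(mirror_path, upstream_path):
    """Return True when both files have identical size and SHA-256 digest."""
    mirror, upstream = Path(mirror_path), Path(upstream_path)
    # Cheap size check first; hashing is skipped when sizes already differ.
    if mirror.stat().st_size != upstream.stat().st_size:
        return False
    return sha256_of(mirror) == sha256_of(upstream)
```

Calling `is_same_file("mirror/model.onnx", "upstream/model.onnx")` (both paths are placeholders for wherever the two downloads landed) should return `True` if the mirror is intact.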
Variant ladder
| Variant | Backbone | Size | Use when… |
|---|---|---|---|
| ViT-S | DINOv2 ViT-Small (~22M backbone params) | ~141 MB | CPU / edge / fast-iteration workflows |
| ViT-B | DINOv2 ViT-Base (~86M) | ~419 MB | Recommended default: best quality-per-byte for GPU workloads |
| ViT-L (this) | DINOv2 ViT-Large (~300M) | ~1.32 GB | Peak quality, GPU-comfortable, large enough to push consumer VRAM |
All three share the same I/O signature, so you can switch variants by swapping the file.
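Because only the file changes between variants, selection can be reduced to a lookup. This is a rough sketch: the short keys (`vits`, `vitb`, `vitl`) and the helper names are hypothetical, and it assumes `huggingface_hub` is installed and that each upstream repo names its file `model.onnx` (this README only confirms that name for the ViT-L repo).

```python
# Upstream repo ids as listed in the credit line of this README.
VARIANT_REPOS = {
    "vits": "Ruicheng/moge-2-vits-normal-onnx",
    "vitb": "Ruicheng/moge-2-vitb-normal-onnx",
    "vitl": "Ruicheng/moge-2-vitl-normal-onnx",
}


def repo_for(variant):
    """Map a short variant key (hypothetical naming) to its upstream repo id."""
    try:
        return VARIANT_REPOS[variant]
    except KeyError:
        known = sorted(VARIANT_REPOS)
        raise ValueError(f"unknown variant {variant!r}; expected one of {known}") from None


def fetch_model(variant="vitb", filename="model.onnx"):
    """Download the ONNX file for a variant and return its local cache path.

    Assumption: every repo stores the model as `model.onnx`.
    """
    # Imported lazily so the lookup above works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_for(variant), filename=filename)
```

The default of `vitb` mirrors the recommendation in the table above; pass `"vitl"` to fetch this variant instead.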
Input / output
| Spec | Details |
|---|---|
| Input | RGB image, NCHW float32, normalized per DINOv2 convention |
| Outputs | Per-pixel 3D point map (camera-frame), camera intrinsics, per-pixel surface normals |
| Dynamic axes | Batch + spatial; inspect with Netron for exact names and ranges |
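As a rough illustration of the input row, the sketch below converts an 8-bit RGB image to an NCHW float32 tensor. It rests on two assumptions that should be verified against the microsoft/MoGe reference code: that "DINOv2 convention" means scaling to [0, 1] followed by ImageNet mean/std normalization, and that this normalization is not already baked into the exported graph. The function name `to_model_input` is illustrative, and any resizing to a supported spatial size is left out.

```python
import numpy as np

# ImageNet statistics used by DINOv2 preprocessing. Whether this ONNX export
# expects the caller to apply them is an assumption to verify.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_model_input(image_rgb_uint8):
    """Convert an (H, W, 3) uint8 RGB image to a (1, 3, H, W) float32 tensor."""
    image = np.asarray(image_rgb_uint8)
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError(f"expected an (H, W, 3) RGB image, got shape {image.shape}")
    scaled = image.astype(np.float32) / 255.0             # [0, 255] -> [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # per-channel, broadcast over H, W
    nchw = normalized.transpose(2, 0, 1)[np.newaxis]      # HWC -> CHW, then add batch axis
    return np.ascontiguousarray(nchw, dtype=np.float32)
```

If the graph turns out to normalize internally, drop the mean/std step and feed the [0, 1] image directly.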
The exact input/output tensor names and supported spatial-dimension multiples aren't documented in the upstream repo, which ships only model.onnx, LICENSE, and .gitattributes. Inspect the graph with Netron before integrating, or cross-reference the microsoft/MoGe PyTorch reference for the preprocessing convention.
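As a scriptable alternative to Netron, the tensor names and shapes can be read from an ONNX Runtime session. This is a minimal sketch assuming `onnxruntime` is installed; `describe_io` and `inspect_model` are illustrative helper names, and no actual tensor names are assumed here.

```python
def describe_io(session):
    """Return (kind, name, shape, type) rows for every graph input and output.

    `session` only needs get_inputs() and get_outputs(), which
    onnxruntime.InferenceSession provides.
    """
    rows = []
    for kind, nodes in (("input", session.get_inputs()), ("output", session.get_outputs())):
        for node in nodes:
            # Dynamic axes show up as strings (or None) inside the shape list.
            rows.append((kind, node.name, list(node.shape), node.type))
    return rows


def inspect_model(path="model.onnx"):
    """Open the model on CPU and print its I/O signature."""
    # Imported lazily; assumes onnxruntime is installed.
    import onnxruntime as ort

    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    for kind, name, shape, dtype in describe_io(session):
        print(f"{kind:6s} {name:24s} {shape} {dtype}")
```

Running `inspect_model("model.onnx")` once and recording the printed names is usually enough to wire up the rest of an integration.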
When to pick MoGe-2 normal vs alternatives
| Need | Pick |
|---|---|
| Geometry + normals from one forward pass | MoGe-2 normal (this family) |
| Relative depth only, broadest hardware support | Depth Anything V2/V3 |
| Metric depth in meters, outdoor scenes | Metric3D V2 |
| Surface normals only, smallest model | DSINE |
| Per-pixel point map only (no normals) | MoGe v1 ViT-L |
MoGe-2 normal is the right pick when you're feeding a Poisson surface reconstruction (which wants both positions AND normals at every point), or when downstream rendering needs per-pixel shading normals "for free" alongside depth.
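To make the Poisson use case concrete, the sketch below flattens a point map and a normal map into the oriented point set that Poisson reconstruction consumes. The (H, W, 3) layout for both arrays is an assumption (the actual output layout should be confirmed by inspecting the graph), and `to_oriented_points` is an illustrative name.

```python
import numpy as np


def to_oriented_points(points_hw3, normals_hw3, eps=1e-6):
    """Flatten (H, W, 3) point and normal maps into matching (N, 3) arrays.

    Drops pixels whose point is non-finite or whose normal is near zero length,
    and rescales the remaining normals to unit length.
    """
    points = np.asarray(points_hw3, dtype=np.float32).reshape(-1, 3)
    normals = np.asarray(normals_hw3, dtype=np.float32).reshape(-1, 3)
    if points.shape != normals.shape:
        raise ValueError("point map and normal map must cover the same pixels")
    lengths = np.linalg.norm(normals, axis=1)
    # Keep only pixels with a usable position and a usable normal.
    valid = np.isfinite(points).all(axis=1) & np.isfinite(lengths) & (lengths > eps)
    return points[valid], normals[valid] / lengths[valid, None]
```

The two returned arrays can then be handed to a Poisson implementation, for example as the points and normals of an Open3D point cloud passed to `TriangleMesh.create_from_point_cloud_poisson`.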
License
MIT, copied verbatim from the upstream Ruicheng/moge-2-vitl-normal-onnx/LICENSE. For licensing, this is the authoritative variant of the three: the upstream ViT-S and ViT-B sibling repos don't ship LICENSE files, but the microsoft/MoGe code repo is also MIT, so the family-wide license terms are consistent.