MoGe-2 ViT-L "normal" — Monocular Geometry + Surface Normals (ONNX)

Heliosoph mirror of Ruicheng/moge-2-vitl-normal-onnx, the ViT-Large variant of MoGe-2's joint geometry + surface-normal model. A DINOv2 ViT-L backbone predicts a per-pixel 3D point map, camera intrinsics, and per-pixel surface normals in a single forward pass.

The "normal" suffix marks this as the joint variant — distinct from the base MoGe-2 ladder that predicts geometry only. Pairing geometry + normals from the same network removes the need for a separate normal-estimation pass (DSINE, omnidata) when feeding a Poisson surface reconstruction pipeline.

This is the top of the three-variant ladder: pick it when reconstruction quality matters more than throughput or disk footprint. For most workloads, ViT-B is the better default.

ONNX file is unchanged from upstream — re-hosted for distribution stability (the upstream lives on the author's personal HF account). The upstream LICENSE travels along verbatim.

Credit: Ruicheng Wang and collaborators — MoGe-2 (Microsoft Research, 2025). The author's personal repos at Ruicheng/moge-2-vits-normal-onnx, Ruicheng/moge-2-vitb-normal-onnx, and Ruicheng/moge-2-vitl-normal-onnx are the authoritative upstream — this is a byte-for-byte mirror of the ViT-L variant.

What this repo contains

model.onnx        # ~1.32 GB — DINOv2 ViT-L backbone, geometry + normal heads, fp32
LICENSE           # MIT — copied verbatim from the upstream Ruicheng/moge-2-vitl-normal-onnx repo

The ONNX file is self-contained (no external .onnx_data sidecar): upstream ships it as a single ~1.32 GB protobuf, comfortably under the 2 GB size limit for a single protobuf file. The upstream repo contains model.onnx, LICENSE, and .gitattributes; this mirror adds a README on top.
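A minimal sketch for checking a downloaded copy against the claims above: it streams the file through SHA-256 (so two copies can be compared byte for byte) and checks the size against the 2 GB single-file protobuf limit. The function names are illustrative, and no reference hash is given here because the source does not publish one; compare against a copy you fetched from upstream yourself.

```python
import hashlib
from pathlib import Path

# 2 GiB: the size ceiling for a single serialized protobuf file.
PROTOBUF_LIMIT_BYTES = 2 * 1024**3


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a ~1.32 GB model never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fits_single_protobuf(path):
    """True if the file is small enough to need no external-data sidecar."""
    return Path(path).stat().st_size < PROTOBUF_LIMIT_BYTES


def same_bytes(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Typical use would be `same_bytes("mirror/model.onnx", "upstream/model.onnx")` after downloading both copies; the paths are placeholders.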

Variant ladder

Variant	Backbone	Size	Use when…
ViT-S	DINOv2 ViT-Small (~22M backbone params)	~141 MB	CPU / edge / fast-iteration workflows
ViT-B	DINOv2 ViT-Base (~86M)	~419 MB	Recommended default — best quality-per-byte for GPU workloads
ViT-L (this)	DINOv2 ViT-Large (~300M)	~1.32 GB	Peak quality on GPU; large enough to strain consumer VRAM

All three share the same I/O signature — switch by swapping the file.
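Because the three variants share an I/O signature, switching can be reduced to choosing a repo id. The sketch below builds the upstream repo names listed in the credit line and downloads the file with `huggingface_hub.hf_hub_download`. It assumes the ViT-S and ViT-B repos also name their file `model.onnx`, which this card only confirms for ViT-L; `upstream_repo` and `fetch_model` are hypothetical helper names.

```python
VARIANTS = ("vits", "vitb", "vitl")


def upstream_repo(variant):
    """Map a variant tag to its upstream Hugging Face repo id."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"Ruicheng/moge-2-{variant}-normal-onnx"


def fetch_model(variant="vitb", filename="model.onnx"):
    """Download the ONNX file for a variant and return its local cache path.

    Assumption: every variant repo stores the graph under `filename`.
    The import is deferred so the mapping above works without the package.
    """
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=upstream_repo(variant), filename=filename)
```

Defaulting to `vitb` follows the recommendation in the table; pass `"vitl"` to get this variant.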

Input / output

Field	Spec
Input	RGB image, NCHW float32, normalized per DINOv2 convention
Outputs	Per-pixel 3D point map (camera-frame), camera intrinsics, per-pixel surface normals
Dynamic axes	Batch + spatial — inspect with Netron for exact names and ranges
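As a rough illustration of the input row, the sketch below converts an HWC uint8 RGB array into an NCHW float32 batch. It assumes "DINOv2 convention" means scaling to [0, 1] followed by ImageNet mean/std normalization; whether the exported graph already normalizes internally is not documented here, so the `normalize` flag is left switchable. Resizing to a supported spatial size is deliberately left out because the supported multiples are not documented.

```python
import numpy as np

# ImageNet statistics, commonly used with DINOv2 backbones (assumed here).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_nchw(image_hwc_uint8, normalize=True):
    """Convert an (H, W, 3) uint8 RGB image to a (1, 3, H, W) float32 tensor."""
    image = np.asarray(image_hwc_uint8)
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError(f"expected an (H, W, 3) RGB image, got shape {image.shape}")
    x = image.astype(np.float32) / 255.0          # scale to [0, 1]
    if normalize:
        x = (x - IMAGENET_MEAN) / IMAGENET_STD    # per-channel normalization
    x = np.transpose(x, (2, 0, 1))[None]          # HWC -> CHW, then add batch axis
    return np.ascontiguousarray(x, dtype=np.float32)
```

If the graph turns out to normalize on its own, call `to_nchw(image, normalize=False)` to avoid normalizing twice.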

The upstream repo ships only model.onnx, LICENSE, and .gitattributes, so the exact input/output tensor names and the supported spatial-dimension multiples are undocumented. Inspect the graph with Netron before integrating, or cross-reference the microsoft/MoGe PyTorch reference for the preprocessing convention.
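As a scriptable alternative to Netron, the tensor names and dynamic axes can also be read from an ONNX Runtime session. This is a sketch, not an upstream-documented procedure: `describe_io` and `load_session` are hypothetical helpers, and the only ONNX Runtime calls used are `InferenceSession`, `get_inputs()`, and `get_outputs()`, whose entries expose `name`, `shape`, and `type`.

```python
def describe_io(session):
    """Summarize a session's inputs and outputs as (name, shape, type) tuples.

    Dynamic axes appear in `shape` as strings or None rather than integers.
    """
    def rows(nodes):
        return [(node.name, list(node.shape), node.type) for node in nodes]

    return {
        "inputs": rows(session.get_inputs()),
        "outputs": rows(session.get_outputs()),
    }


def load_session(model_path="model.onnx"):
    """Open the model on CPU; the import is deferred until it is needed."""
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
```

Printing `describe_io(load_session("model.onnx"))` lists whatever names the graph actually uses, so downstream code does not need to guess them.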

When to pick MoGe-2 normal vs alternatives

Need	Pick
Geometry + normals from one forward pass	MoGe-2 normal (this family)
Relative depth only, broadest hardware support	Depth Anything V2/V3
Metric depth in meters, outdoor scenes	Metric3D V2
Surface normals only, smallest model	DSINE
Per-pixel point map only (no normals)	MoGe v1 ViT-L

MoGe-2 normal is the right pick when you're feeding a Poisson surface reconstruction (which wants both positions AND normals at every point), or when downstream rendering needs per-pixel shading normals "for free" alongside depth.
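To make the Poisson use case concrete, the sketch below flattens a point map and a normal map into the oriented point cloud such a pipeline consumes. It assumes both arrays are laid out as (H, W, 3) after the batch axis is removed, which should be confirmed against the actual graph outputs; `oriented_points` is a hypothetical helper. Invalid pixels (non-finite values or zero-length normals) are dropped, and the remaining normals are rescaled to unit length.

```python
import numpy as np


def oriented_points(point_map, normal_map, mask=None):
    """Flatten (H, W, 3) point and normal maps into matching (N, 3) arrays.

    Rows with non-finite coordinates, non-finite normals, or near-zero
    normals are removed; an optional (H, W) boolean mask filters further.
    """
    points = np.asarray(point_map, dtype=np.float64).reshape(-1, 3)
    normals = np.asarray(normal_map, dtype=np.float64).reshape(-1, 3)
    if points.shape != normals.shape:
        raise ValueError("point_map and normal_map must have the same shape")

    lengths = np.linalg.norm(normals, axis=1)
    keep = (
        np.isfinite(points).all(axis=1)
        & np.isfinite(normals).all(axis=1)
        & (lengths > 1e-8)
    )
    if mask is not None:
        keep &= np.asarray(mask, dtype=bool).reshape(-1)

    # Normalize only the surviving rows to avoid dividing by zero.
    return points[keep], normals[keep] / lengths[keep, None]
```

The two returned arrays can be handed to a Poisson implementation of your choice, for example as the points and normals of an Open3D point cloud before calling `TriangleMesh.create_from_point_cloud_poisson`.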

License

MIT, copied verbatim from the upstream Ruicheng/moge-2-vitl-normal-onnx/LICENSE. Of the three upstream ONNX repos, only the ViT-L one ships a LICENSE file; the ViT-S and ViT-B sibling repos don't. The microsoft/MoGe code repo is also MIT, so the family-wide license terms are consistent.
