Orient90 V2: Mixed-Domain VGGT-1B + LoRA + 2-head

Discrete 3D orientation-correction model. Given a single RGB render of a 3D asset, predict the Euler rotation (corr_x, corr_y, corr_z) (all multiples of 90°) that rotates the asset back to its canonical upright orientation, plus a binary flag indicating whether the input is off-grid (i.e. not aligned to the 90° rotation group).

V2 replaces V1's DINOv2-large full fine-tune with VGGT-1B + LoRA r=64 and trains jointly on a 1.53M-sample mix of artstation USDZ, character, and 3d66 indoor-scene renders, yielding +7pp strict accuracy on 3d66 and strong orbit-accuracy gains on artstation vs. V1 (see the Δ table below).

  • Backbone: VGGT-1B aggregator, initialized from the Orient Anything V2 ckpt (Viglong/OriAnyV2_ckpt), wrapped with LoRA (rank 64, alpha 64) adapters on attn.{qkv,proj} + mlp.{fc1,fc2}; base weights are frozen.
  • Pool: LayerNorm(CLS + mean(patch_tokens)) over the last aggregator layer (pool and heads are sketched after this list).
  • Heads:
    • 24-way classification over the cubic rotation group (SO(3) axis-aligned subgroup).
    • 1-dim off-grid probability (BCE).
  • Image size: 518×518 (VGGT native, no ImageNet mean/std normalization).
  • Training data: 1.53M renders, Blender CYCLES @ 518, balanced across artstation USDZ, character, and 3d66, with orbit-aware label smoothing (α=0.1, share=0.8).
  • Best checkpoint: epoch 4 of 5 (best.pt, score=0.8689).
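
For concreteness, here is a minimal PyTorch sketch of the pool + 2-head readout described above. It is an illustration, not the repository's actual module: the hidden width and class names are placeholders, and it assumes the VGGT aggregator exposes a CLS token plus patch tokens from its last layer.

import torch
import torch.nn as nn

class OrientHeads(nn.Module):
    # Sketch only: pooled = LayerNorm(CLS + mean(patch_tokens)), then 24-way cls head + 1-dim off-grid head.
    def __init__(self, hidden=2048, n_classes=24):   # hidden width is a placeholder
        super().__init__()
        self.norm = nn.LayerNorm(hidden)
        self.cls_head = nn.Linear(hidden, n_classes)  # 24 cubic-rotation classes
        self.off_head = nn.Linear(hidden, 1)          # off-grid logit (BCE target)

    def forward(self, cls_token, patch_tokens):
        # cls_token: (B, hidden); patch_tokens: (B, N, hidden) from the last aggregator layer
        pooled = self.norm(cls_token + patch_tokens.mean(dim=1))
        return self.cls_head(pooled), self.off_head(pooled).squeeze(-1)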

Evaluation (val split, full labels_balanced_v2)

Dataset       Strict cls_acc   Orbit-sum acc   Top-k inferred   Off-head acc
character     92.89%           97.24%          93.39%           99.02%
artstation    74.59%           95.04%          79.29%           99.02%
3d66          86.51%           98.12%          91.15%           99.02%
  • Strict cls_acc: argmax over 24 classes equals the GT class (single-GT, authoritative for characters where facial asymmetry matters).
  • Orbit-sum acc: sum of softmax probability over the GT's full octahedral orbit (z180/y180/z90); authoritative for arts/3d66 where symmetric objects have GT-ambiguous-but-equivalent partner classes (see the sketch after this list).
  • Top-k inferred: argmax==GT OR (top1/top2 ratio ≥ 0.5 AND top1/top2 ∈ orbit partners).
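
A minimal sketch of how the orbit-sum quantity can be computed, assuming a hypothetical orbits dict mapping each class id to the ids in its octahedral orbit (derivable from class_map.json). Whether the reported accuracy thresholds this mass at 0.5 or uses another aggregation is not stated, so the decision rule below is an assumption.

import numpy as np

def orbit_metrics(logits, gt_class, orbits):
    # logits: (24,) raw scores for one sample; orbits[gt_class] includes gt_class itself.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    strict_ok = probs.argmax() == gt_class         # strict cls_acc
    orbit_mass = probs[orbits[gt_class]].sum()     # summed softmax mass over the GT orbit
    orbit_ok = orbit_mass >= 0.5                   # assumed decision rule for orbit-sum acc
    return strict_ok, orbit_ok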

Δ vs. V1 (DINOv2-large full fine-tune)

Dataset       V1 strict   V2 strict   Δ
character     92.33%      92.89%      +0.56pp
artstation    74.16%      74.59%      +0.43pp
3d66          ~80%        86.51%      +7pp

V2 wins decisively on artstation orbit (95.04%) and 3d66 (98.12% orbit / +7pp strict), the cross-domain bottleneck of V1.

Quick start

pip install -r requirements.txt
# Includes: vggt @ git+https://github.com/facebookresearch/vggt.git
from orient90_v2 import OrientPredictor

# Load model
p = OrientPredictor("best.pt", class_map_path="class_map.json", device="cuda")

# Direct image input: must be a Blender-CYCLES-style render (518×518)
result = p.predict_image("examples/sample_render.png")
print(result)
# {
#   "class_id": 0,
#   "corr_x": 0, "corr_y": 0, "corr_z": 0,
#   "R": [[1,0,0],[0,1,0],[0,0,1]],
#   "confidence": 0.94,
#   "off_grid": False,
#   "off_grid_prob": 0.02
# }

# 3D model input: auto-renders via Blender if a sibling .png is missing
result = p.predict_model("foo.glb", render_gpu="auto")

Class map → rotation matrix

The 24 classes index the proper rotations of the cube (the octahedral rotation group, order 24). Each class entry in class_map.json provides:

{
  "id": 7,
  "euler_xyz": [0, 0, 270],       // Apply Rz(-90Β°) to corrects the input
  "matrix": [[...], [...], [...]] // 3Γ—3 rotation matrix (canonical = matrix @ input)
}
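
As a sanity check, the euler_xyz field can be converted to a matrix with SciPy and compared against the stored matrix. The extrinsic-XYZ, degrees convention below is an assumption, as is the top-level layout of class_map.json (a list of entries like the one above); verify both before relying on this.

import json
import numpy as np
from scipy.spatial.transform import Rotation as R

entries = json.load(open("class_map.json"))
entry = next(e for e in entries if e["id"] == 7)   # the example entry above
R_from_euler = R.from_euler("xyz", entry["euler_xyz"], degrees=True).as_matrix()  # extrinsic XYZ (assumed)
print(np.allclose(R_from_euler, np.array(entry["matrix"]), atol=1e-6))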

To rotate the input mesh back to canonical:

import numpy as np, trimesh
R = np.array(result["R"])                      # 3×3 in our internal Z-up frame
# glTF/USD store Y-up vertices → conjugate before applying:
M_yz = trimesh.transformations.rotation_matrix(np.pi/2, [1,0,0])
R_yup = M_yz[:3,:3].T @ R @ M_yz[:3,:3]
T = np.eye(4); T[:3,:3] = R_yup
mesh = trimesh.load("foo.glb", force="scene")
mesh.apply_transform(T)
mesh.export("foo_canonical.glb")

Training recipe

VGGT-1B base (frozen) + LoRA r=64, α=64 on attn.qkv/proj + mlp.fc1/fc2
loss = CE(cls, GT) + λ_off × BCE(off, GT_offgrid)    λ_off = 1.0
optimizer = AdamW, lr_backbone=5e-4, lr_head=1e-3, wd=1e-4, no_wd_on_lora
batch_size = 12 × 7 GPUs (effective 84), warmup_ratio = 0.05
label_smoothing = 0.1 with orbit-aware smoothing (orbit_share=0.8)
epochs = 5, bf16, grad_clip = 1.0
val = max_val_samples=5000 random subset (training-time eval; full val above)
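
A minimal sketch of the combined objective under the settings above, for a single sample. How orbit-aware smoothing distributes the 0.1 smoothing mass (orbit_share of it onto the GT's orbit partners, the remainder uniformly) is an assumed interpretation, not lifted from the training code.

import torch
import torch.nn.functional as F

def orient_loss(cls_logits, off_logit, gt_class, gt_offgrid, orbits,
                smoothing=0.1, orbit_share=0.8, lambda_off=1.0):
    # cls_logits: (24,); off_logit and gt_offgrid: scalar tensors; orbits maps class id -> orbit ids.
    n = cls_logits.shape[-1]
    target = torch.full((n,), smoothing * (1.0 - orbit_share) / n)    # uniform share of the smoothing mass
    partners = [c for c in orbits[gt_class] if c != gt_class]
    if partners:
        target[partners] += smoothing * orbit_share / len(partners)   # orbit share (assumed split)
    target[gt_class] += 1.0 - smoothing                               # target now sums to 1
    ce = -(target * F.log_softmax(cls_logits, dim=-1)).sum()
    bce = F.binary_cross_entropy_with_logits(off_logit, gt_offgrid)
    return ce + lambda_off * bce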

Compatibility / known issues

  • VGGT requires pip install git+https://github.com/facebookresearch/vggt.git (not on PyPI as of 2026-05).
  • Image preprocessing has no ImageNet mean/std normalization; VGGT's aggregator was trained on raw [0,1] tensors (see the preprocessing sketch after this list).
  • predict_model() needs a Blender env with bpy 4.4 installed. The bundled blender_scripts/blender_render_preview.py matches the training render pipeline.
  • The 2-head ckpt (this V2) does not produce sub-grid offsets. For sub-degree refinement, pair with the V4 mesh-only post-process (feat_iter_qLcont); see the project repo's docs/reports/phase_b2_mesh_postprocess_v4.md.
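
A minimal preprocessing sketch consistent with the normalization note above: resize to 518 and scale to [0, 1], with no ImageNet mean/std. The exact resize/crop policy of the bundled render pipeline is an assumption here.

from PIL import Image
import numpy as np
import torch

def load_render(path, size=518):
    # Raw [0, 1] values at the VGGT-native resolution; deliberately no mean/std normalization.
    img = Image.open(path).convert("RGB").resize((size, size), Image.BICUBIC)
    x = torch.from_numpy(np.asarray(img).copy()).float().div_(255.0)
    return x.permute(2, 0, 1).unsqueeze(0)   # (1, 3, 518, 518)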

Citations / related

  • Backbone: VGGT (Visual Geometry Grounded Transformer), Wang et al., 2025. https://github.com/facebookresearch/vggt
  • Backbone init: Orient Anything V2, NeurIPS'25 spotlight, Viglong/OriAnyV2_ckpt on Hugging Face.
  • V1 baseline (DINOv2 full fine-tune): noahdudu/orient90-v1.

License

Apache-2.0 (see LICENSE).
