
Orient90 - Artstation (Full-FT)

Discrete 3D orientation-correction model. Given a single RGB render of a 3D asset, it predicts the Euler rotation (corr_x, corr_y, corr_z) (all multiples of 90°) that rotates the asset back to its canonical upright orientation, plus a binary flag indicating whether the input is off-grid (i.e. not aligned to the 90° rotation group).

  • Backbone: DINOv2-large, initialized from the Orient Anything weights, fully fine-tuned.
  • Heads (see the sketch after this list):
    • 24-way classification over the cubic rotation group (the 24-element axis-aligned subgroup of SO(3)).
    • 1-dim off-grid probability, trained with BCE.
  • Training data: synthetic renders from the Artstation USDZ corpus (one on-grid sample plus off-grid samples per asset, Blender Cycles @ 512).
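
For orientation, a minimal sketch of the two-head design; the names here are illustrative (the authoritative definition is Orient90Net in orient90/model.py), and the backbone repo id facebook/dinov2-large is an assumption:

import torch.nn as nn
from transformers import AutoModel

class Orient90Sketch(nn.Module):
    # Hypothetical stand-in for Orient90Net (orient90/model.py).
    def __init__(self, num_classes: int = 24):
        super().__init__()
        self.backbone = AutoModel.from_pretrained("facebook/dinov2-large")  # assumed repo id
        dim = self.backbone.config.hidden_size  # 1024 for DINOv2-large
        self.cls_head = nn.Linear(dim, num_classes)  # 24-way rotation-class logits
        self.off_head = nn.Linear(dim, 1)            # off-grid logit (BCE)

    def forward(self, pixel_values):
        feats = self.backbone(pixel_values=pixel_values).pooler_output
        return self.cls_head(feats), self.off_head(feats).squeeze(-1)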

Evaluation (val split)

dataset      cls_acc_on_grid   cls_acc_all   off_acc
artstation   74.16%            70.36%        98.07%

Compared to fullft_character, this checkpoint is the right pick for Artstation-domain inputs; it also generalizes to the character domain with minor degradation.

Repository layout

orient90-v1/
├── README.md                           model card + usage
├── LICENSE                             Apache-2.0
├── config.json                         model metadata
├── class_map.json                      24 on-grid classes (euler_xyz + matrix)
├── best.pt                             checkpoint (Git LFS)
├── requirements.txt                    torch / transformers / pillow / numpy
├── requirements-render.txt             optional: bpy for 3D-model input
├── orient90/                           Python package (import orient90)
│   ├── __init__.py
│   ├── model.py                        Orient90Net definition
│   ├── predictor.py                    OrientPredictor high-level API
│   ├── render.py                       Blender subprocess wrapper
│   └── gpu_utils.py                    nvidia-smi based GPU selection
├── blender_scripts/
│   └── blender_render_preview.py       Cycles render script (requires bpy)
├── scripts/
│   └── predict.py                      CLI entry-point
└── examples/

Installation

# inference env
pip install -r requirements.txt

# optional, only if you want 3D-model input auto-rendered
python -m venv .render-env
.render-env/bin/pip install -r requirements-render.txt
# then pass --blender-python .render-env/bin/python to scripts/predict.py

Getting the weights

The 1.2 GB checkpoint best.pt is hosted on Hugging Face.

  • If you cloned from Hugging Face: best.pt is already present (LFS).
  • If you cloned from GitHub: the mirror ships code only; pull the weights from HF with
    pip install huggingface_hub
    python scripts/download_weights.py       # default repo is noahdudu/orient90-v1
    
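Equivalently, you can pull just the checkpoint with huggingface_hub directly (repo id and filename as above):

from huggingface_hub import hf_hub_download

# download best.pt from the model repo into the current directory
hf_hub_download(repo_id="noahdudu/orient90-v1", filename="best.pt", local_dir=".")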

The DINOv2-large backbone (≈1.2 GB) is pulled from Hugging Face on first run. To pre-fetch or use an offline cache, set HF_HOME=<path> or pass dino_cache_dir=<path> to OrientPredictor (it expects a directory containing config.json plus the processor/model files).
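
A sketch of pre-fetching with huggingface_hub; the repo id facebook/dinov2-large and the cache path are assumptions:

from huggingface_hub import snapshot_download

from orient90 import OrientPredictor

# fetch the full backbone snapshot (config + processor + model files) into a local dir
cache_dir = snapshot_download("facebook/dinov2-large", local_dir="./dino-cache")  # assumed repo id

predictor = OrientPredictor(
    checkpoint_path="best.pt",
    class_map_path="class_map.json",
    dino_cache_dir=cache_dir,  # offline cache, as described above
)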

Quickstart

Python API

from orient90 import OrientPredictor

predictor = OrientPredictor(
    checkpoint_path="best.pt",
    class_map_path="class_map.json",
    device="auto",
)

# (a) image input: a render of the asset
print(predictor.predict_image("example_render.png"))

# (b) 3D model input: auto-renders the sibling PNG with Blender if missing,
# then runs image inference on that PNG.
print(predictor.predict_model("example.glb", render_gpu="auto"))

Both calls return:

{
  "class_id": 5,
  "corr_x": 0,
  "corr_y": 90,
  "corr_z": 90,
  "confidence": 0.9821,
  "off_grid": false,
  "off_grid_prob": 0.0173
}

predict_model additionally returns model_path, render_path, and rendered_now (true when Blender was just invoked).
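
A short usage sketch for that bookkeeping (predictor as constructed in the quickstart):

result = predictor.predict_model("example.glb", render_gpu="auto")
if result["rendered_now"]:
    # Blender was invoked for this call; the PNG is cached next to the model for next time
    print("rendered", result["render_path"], "from", result["model_path"])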

CLI

# image input
python scripts/predict.py --input render.png

# 3D model input: checks for sibling render.png; if absent, renders via Blender
python scripts/predict.py --input model.glb

# specific render GPU, force re-render, save result JSON
python scripts/predict.py \
  --input model.usdz \
  --render-gpu 0 \
  --force-render \
  --output-json out.json

Run python scripts/predict.py --help for all flags.

Example

A ready-made sample lives in examples/:

python scripts/predict.py --input examples/sample_render.png

Expected output (see examples/sample_render_expected.json):

{
  "class_id": 13,
  "corr_x": 0, "corr_y": 270, "corr_z": 90,
  "confidence": 0.9105,
  "off_grid": false,
  "off_grid_prob": 1e-6
}

This render was synthesized with ground-truth correction (0, 270, 90); the prediction matches it exactly.

Inputs and outputs

Accepted image extensions: .png, .jpg, .jpeg, .webp, .bmp, .tif, .tiff.

Accepted 3D-model extensions: .glb, .gltf, .usdz, .usd, .usdc, .usda, .obj, .stl.

Render behavior for 3D inputs: for <path>/<stem>.<ext>, the predictor looks for <path>/<stem>.png. If it is missing (or --force-render is set), Blender is invoked on the model via blender_scripts/blender_render_preview.py (Cycles, GPU if available, 512×512). The generated PNG is kept alongside the model file so subsequent calls skip rendering.
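
The sibling lookup is a suffix swap; a minimal illustration with a hypothetical path:

from pathlib import Path

model_path = Path("assets/chair.usdz")        # hypothetical input
render_path = model_path.with_suffix(".png")  # sibling render the predictor looks for
needs_render = not render_path.exists()       # triggers Blender (or use --force-render)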

Class map: class_map.json enumerates the 24 axis-aligned Euler triples (in degrees, XYZ intrinsic order), deduplicated from the 4×4×4 = 64 combinations; it is identical to the training map.
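
The 64-to-24 collapse is easy to verify numerically. A sketch assuming the composition R = Rx @ Ry @ Rz for an intrinsic-XYZ triple; class_map.json stores the authoritative matrices:

import itertools

import numpy as np

def rot(axis: str, deg: int) -> np.ndarray:
    # exact rotation matrix for a multiple of 90 degrees about one coordinate axis
    c = int(round(np.cos(np.deg2rad(deg))))
    s = int(round(np.sin(np.deg2rad(deg))))
    if axis == "x":
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "y":
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

unique = {}
for ax, ay, az in itertools.product([0, 90, 180, 270], repeat=3):
    R = rot("x", ax) @ rot("y", ay) @ rot("z", az)  # assumed intrinsic-XYZ composition
    unique.setdefault(R.tobytes(), (ax, ay, az))

print(len(unique))  # 24 distinct rotations from the 64 Euler triples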

Notes

  • The checkpoint stores {state_dict, model_size, num_classes, class_map}. The class_map entry is a path string pointing into the original training tree; the predictor ignores it and uses class_map_path from its constructor (defaulting to the class_map.json next to best.pt). A loading sketch follows this list.
  • The first call downloads DINOv2-large from Hugging Face (≈1.2 GB). Cache it with HF_HOME or pass dino_cache_dir= (see "Getting the weights" above).
  • The Blender renderer normalizes each asset to a unit sphere and composes a single Cycles shot; this is the same pipeline used to generate the training data.
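
A quick sanity check on the checkpoint layout described in the first note (keys as stated above):

import torch

ckpt = torch.load("best.pt", map_location="cpu")
print(sorted(ckpt.keys()))      # ['class_map', 'model_size', 'num_classes', 'state_dict']
print(ckpt["num_classes"])      # 24
print(type(ckpt["class_map"]))  # str: a path into the original training tree (ignored)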

Citation / attribution

  • DINOv2 backbone: Oquab et al., "DINOv2: Learning Robust Visual Features without Supervision", Meta AI, 2023.
  • Orient Anything initialization weights.
  • Training data: Artstation USDZ corpus (synthetic renders produced in-house).