# Orient90 – Artstation (Full-FT)
Discrete 3D orientation-correction model. Given a single RGB render of a 3D
asset, it predicts the Euler rotation (corr_x, corr_y, corr_z) (each a multiple
of 90°) that rotates the asset back to its canonical upright orientation, plus
a binary flag indicating whether the input is off-grid (i.e. not aligned to the
90° rotation group).
- Backbone: DINOv2-large, initialized from the Orient Anything weights, fully fine-tuned.
- Heads (a sketch follows this list):
  - 24-way classification over the cubic rotation group (the axis-aligned subgroup of SO(3)).
  - 1-dim off-grid probability (BCE).
- Training data: synthetic renders from the Artstation USDZ corpus (1 on-grid sample plus off-grid samples per asset, Blender Cycles @ 512).
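
For concreteness, here is a minimal sketch of the two-head design described above. The class name, layer names, and pooling choice are illustrative assumptions, not the repo's `Orient90Net`:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class Orient90Sketch(nn.Module):
    """Illustrative two-head model: DINOv2 features -> 24-way class + off-grid logit."""

    def __init__(self, num_classes: int = 24):
        super().__init__()
        self.backbone = AutoModel.from_pretrained("facebook/dinov2-large")
        hidden = self.backbone.config.hidden_size  # 1024 for dinov2-large
        self.cls_head = nn.Linear(hidden, num_classes)  # cubic-rotation class logits
        self.off_head = nn.Linear(hidden, 1)            # off-grid logit, trained with BCE

    def forward(self, pixel_values: torch.Tensor):
        feats = self.backbone(pixel_values=pixel_values).pooler_output  # CLS token
        return self.cls_head(feats), self.off_head(feats).squeeze(-1)
```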

## Evaluation (val split)

| dataset | cls_acc_on_grid | cls_acc_all | off_acc |
|---|---|---|---|
| artstation | 74.16% | 70.36% | 98.07% |
Compared to `fullft_character`, this checkpoint is the right pick for
Artstation-domain inputs; it also generalizes to the character domain with
minor degradation.

## Repository layout

```
orient90-v1/
├── README.md                       model card + usage
├── LICENSE                         Apache-2.0
├── config.json                     model metadata
├── class_map.json                  24 on-grid classes (euler_xyz + matrix)
├── best.pt                         checkpoint (Git LFS)
├── requirements.txt                torch / transformers / pillow / numpy
├── requirements-render.txt         optional: bpy for 3D-model input
├── orient90/                       Python package (import orient90)
│   ├── __init__.py
│   ├── model.py                    Orient90Net definition
│   ├── predictor.py                OrientPredictor high-level API
│   ├── render.py                   Blender subprocess wrapper
│   └── gpu_utils.py                nvidia-smi based GPU selection
├── blender_scripts/
│   └── blender_render_preview.py   Cycles render script (requires bpy)
├── scripts/
│   └── predict.py                  CLI entry-point
└── examples/
```

## Installation

```bash
# inference env
pip install -r requirements.txt

# optional, only if you want 3D-model input auto-rendered
python -m venv .render-env
.render-env/bin/pip install -r requirements-render.txt
# then pass --blender-python .render-env/bin/python to scripts/predict.py
```

## Getting the weights

The 1.2 GB checkpoint `best.pt` is hosted on Hugging Face.

- If you cloned from Hugging Face: `best.pt` is already present (LFS).
- If you cloned from GitHub: the mirror ships code only; pull the weights from HF with

```bash
pip install huggingface_hub
python scripts/download_weights.py  # default repo is noahdudu/orient90-v1
```
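
Under the hood this amounts to a single `huggingface_hub` call; a minimal sketch of the equivalent (the actual script may add flags or checks not shown in this card):

```python
# Sketch only: fetch best.pt from the HF repo named in this card.
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="noahdudu/orient90-v1", filename="best.pt")
print(path)  # local cache path of the ~1.2 GB checkpoint
```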
The DINOv2-large backbone (≈1.2 GB) is pulled from Hugging Face on first run.
To pre-fetch or use an offline cache, set `HF_HOME=<path>` or pass
`dino_cache_dir=<path>` to `OrientPredictor` (expects a directory containing
`config.json` plus the processor/model files).
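
A possible pre-fetch snippet, assuming the backbone is loaded through `transformers` (the cache location below is an example; set it before any HF import so it takes effect):

```python
import os
os.environ["HF_HOME"] = "/data/hf-cache"  # assumed offline cache location

from transformers import AutoImageProcessor, AutoModel

# Downloads config.json plus the processor/model files into HF_HOME.
AutoImageProcessor.from_pretrained("facebook/dinov2-large")
AutoModel.from_pretrained("facebook/dinov2-large")
```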

## Quickstart

### Python API

```python
from orient90 import OrientPredictor

predictor = OrientPredictor(
    checkpoint_path="best.pt",
    class_map_path="class_map.json",
    device="auto",
)

# (a) image input: a render of the asset
print(predictor.predict_image("example_render.png"))

# (b) 3D-model input: auto-renders the sibling PNG with Blender if missing,
#     then runs image inference on that PNG.
print(predictor.predict_model("example.glb", render_gpu="auto"))
```
Both calls return:
```json
{
  "class_id": 5,
  "corr_x": 0,
  "corr_y": 90,
  "corr_z": 90,
  "confidence": 0.9821,
  "off_grid": false,
  "off_grid_prob": 0.0173
}
```
`predict_model` additionally returns `model_path`, `render_path`, and
`rendered_now` (`true` when Blender was just invoked).
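
To actually apply a predicted correction to the asset, one option is `trimesh` (not a dependency of this repo). The sketch below continues from the Quickstart snippet and assumes the XYZ-intrinsic Euler convention documented for `class_map.json`:

```python
import numpy as np
import trimesh

pred = predictor.predict_model("example.glb", render_gpu="auto")
mesh = trimesh.load("example.glb", force="mesh")

# 'rxyz' = rotating (intrinsic) XYZ axes; euler_matrix expects radians.
T = trimesh.transformations.euler_matrix(
    np.radians(pred["corr_x"]),
    np.radians(pred["corr_y"]),
    np.radians(pred["corr_z"]),
    axes="rxyz",
)
mesh.apply_transform(T)
mesh.export("example_upright.glb")
```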

### CLI

```bash
# image input
python scripts/predict.py --input render.png

# 3D model input: checks for sibling render.png; if absent, renders via Blender
python scripts/predict.py --input model.glb

# specific render GPU, force re-render, save result JSON
python scripts/predict.py \
    --input model.usdz \
    --render-gpu 0 \
    --force-render \
    --output-json out.json
```
Run `python scripts/predict.py --help` for all flags.

## Example

A ready-made sample lives in `examples/`:

```bash
python scripts/predict.py --input examples/sample_render.png
```

Expected output (see `examples/sample_render_expected.json`):
```json
{
  "class_id": 13,
  "corr_x": 0, "corr_y": 270, "corr_z": 90,
  "confidence": 0.9105,
  "off_grid": false,
  "off_grid_prob": 1e-6
}
```
This render was synthesized with the ground-truth rotation (0, 270, 90)
applied; the prediction matches it exactly.

## Inputs and outputs

Accepted image extensions: `.png`, `.jpg`, `.jpeg`, `.webp`, `.bmp`, `.tif`, `.tiff`.
Accepted 3D-model extensions: `.glb`, `.gltf`, `.usdz`, `.usd`, `.usdc`, `.usda`, `.obj`, `.stl`.
Render behavior for 3D inputs: for `<path>/<stem>.<ext>`, the predictor
looks for `<path>/<stem>.png`. If that PNG is missing (or `--force-render` is
set), Blender is invoked on the model via `blender_scripts/blender_render_preview.py`
(Cycles, GPU if available, 512×512). The generated PNG is kept alongside the
model file so subsequent calls skip rendering.
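
The sibling-PNG convention is easy to replicate when pre-rendering yourself; a short sketch (illustrative, not the predictor's code):

```python
from pathlib import Path

def sibling_render(model_path: str) -> Path:
    # <path>/<stem>.<ext>  ->  <path>/<stem>.png
    return Path(model_path).with_suffix(".png")

render = sibling_render("assets/chair.usdz")
if render.exists():
    print(f"reusing {render}")  # the predictor would skip Blender here
```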
Class map: `class_map.json` enumerates the 24 axis-aligned Euler triples
(in degrees, XYZ intrinsic order) deduplicated from the 4×4×4 = 64
combinations; it is identical to the training map.
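
This deduplication can be reproduced in a few lines: round each composed rotation matrix to integers and key on it. A sketch, assuming the conventions stated above (intrinsic XYZ, i.e. R = Rx @ Ry @ Rz):

```python
import itertools
import numpy as np

def axis_rot(axis: str, deg: int) -> np.ndarray:
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return {
        "x": np.array([[1, 0, 0], [0, c, -s], [0, s, c]]),
        "y": np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]]),
        "z": np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]]),
    }[axis]

classes = {}
for x, y, z in itertools.product((0, 90, 180, 270), repeat=3):
    m = axis_rot("x", x) @ axis_rot("y", y) @ axis_rot("z", z)  # intrinsic XYZ
    key = tuple(np.rint(m).astype(int).ravel())  # entries are exactly 0 or ±1
    classes.setdefault(key, (x, y, z))  # keep the first Euler triple per matrix

print(len(classes))  # 24 unique cubic rotations out of 64 triples
```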

## Notes

- The checkpoint stores `{state_dict, model_size, num_classes, class_map}`. The `class_map` inside the checkpoint is a path string pointing to the original training tree; the predictor ignores it and uses `class_map_path` from its constructor (defaults to the `class_map.json` next to `best.pt`).
- First call downloads DINOv2-large from Hugging Face (≈1.2 GB). Cache it with `HF_HOME` or pass `dino_cache_dir=`.
- The Blender renderer normalizes each asset to a unit sphere and composes a single Cycles shot; this is the same pipeline used for training-data generation (see the sketch below).
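
For context, unit-sphere normalization of the kind the renderer performs can look like the following `bpy` sketch (an assumed implementation, not the repo's `blender_render_preview.py`; run inside Blender with an object selected):

```python
import bpy
from mathutils import Vector

obj = bpy.context.active_object
world = [obj.matrix_world @ v.co for v in obj.data.vertices]
center = sum(world, Vector()) / len(world)
radius = max((p - center).length for p in world)

# Rewrite local coordinates so the world-space geometry fits the unit sphere.
inv = obj.matrix_world.inverted()
for v, p in zip(obj.data.vertices, world):
    v.co = inv @ ((p - center) / radius)
```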

## Citation / attribution

- DINOv2 backbone: Oquab et al., "DINOv2: Learning Robust Visual Features without Supervision", Meta AI, 2023.
- Orient Anything initialization weights.
- Training data: Artstation USDZ corpus (synthetic renders produced in-house).