TripoSR: Fast 3D Object Reconstruction from a Single Image
Paper: arXiv:2403.02151
This is the ONNX-converted encoder from TripoSR, a fast feedforward 3D reconstruction model from Stability AI and Tripo AI.
| Property | Value |
|---|---|
| Model Size | ~1.6 GB |
| Parameters | 419M |
| Input | RGB Image (1, 3, 512, 512) |
| Output | Scene Codes / Triplane (1, 3, 40, 64, 64) |
| ONNX Opset | 18 |
| Format | ONNX with external data |
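As a quick sanity check on the figures in the table, the sizes can be worked out by hand. This is a minimal sketch that assumes every value is stored as float32 (4 bytes), which the table does not state; `tensor_bytes` is a helper name introduced here for illustration.

```python
# Back-of-the-envelope sizes, ASSUMING float32 (4 bytes per value) throughout.
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_value=BYTES_PER_FLOAT32):
    """Return the memory footprint in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value


weight_bytes = 419_000_000 * BYTES_PER_FLOAT32   # parameter count from the table
input_bytes = tensor_bytes((1, 3, 512, 512))     # input shape from the table
output_bytes = tensor_bytes((1, 3, 40, 64, 64))  # output shape from the table

print(f"weights: {weight_bytes / 1e9:.2f} GB ({weight_bytes / 2**30:.2f} GiB)")
print(f"input:   {input_bytes / 2**20:.2f} MiB")
print(f"output:  {output_bytes / 2**20:.2f} MiB")
```

Under that assumption, 419M parameters come to roughly the ~1.6 GB listed for the weights, and a single set of scene codes is under 2 MiB, so the output is cheap to cache or send to a separate decoder.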
```python
import onnxruntime as ort
import numpy as np
from PIL import Image

# Load the model (the .onnx.data weights file must sit next to the .onnx file)
session = ort.InferenceSession(
    "triposr_encoder.onnx",
    providers=['CPUExecutionProvider']  # or 'CoreMLExecutionProvider' on Apple hardware
)

# Preprocess image: resize to 512x512, scale to [0, 1], convert HWC -> NCHW
image = Image.open("your_image.png").convert("RGB").resize((512, 512))
input_array = np.array(image).astype(np.float32) / 255.0
input_array = input_array.transpose(2, 0, 1)[np.newaxis, ...]

# Run inference
scene_codes = session.run(None, {"input_image": input_array})[0]
print(f"Scene codes shape: {scene_codes.shape}")
```
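The snippet above calls `.convert("RGB")`, which discards any alpha channel. For cut-out images with a transparent background, one option is to blend the foreground onto a flat background before encoding. The sketch below is an assumption, not a documented requirement of this export: the helper name `composite_to_nchw` and the mid-gray value `0.5` are illustrative choices, and the right background for this model should be verified against your own results.

```python
import numpy as np


def composite_to_nchw(rgba, background=0.5):
    """Blend an (H, W, 4) uint8 RGBA image onto a flat background.

    Returns a float32 array of shape (1, 3, H, W) with values in [0, 1].
    The default background of 0.5 (mid-gray) is an ASSUMPTION, not a
    value confirmed for this ONNX export.
    """
    rgba = np.asarray(rgba)
    if rgba.ndim != 3 or rgba.shape[2] != 4:
        raise ValueError(f"expected an (H, W, 4) RGBA array, got {rgba.shape}")

    scaled = rgba.astype(np.float32) / 255.0
    rgb, alpha = scaled[..., :3], scaled[..., 3:4]

    # Standard "over" compositing: foreground weighted by alpha, background by the rest.
    blended = rgb * alpha + (1.0 - alpha) * background

    # HWC -> CHW, then add the batch dimension expected by the encoder.
    nchw = blended.transpose(2, 0, 1)[np.newaxis, ...]
    return np.ascontiguousarray(nchw, dtype=np.float32)
```

To use it with the code above, load the image with `.convert("RGBA")` instead of `.convert("RGB")`, resize it to 512x512, and pass `np.array(image)` to the helper; the result can be fed to `session.run` as `input_image`.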
Add ONNX Runtime to your project via SPM:
https://github.com/nicklockwood/ORTSwift
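```swift
import OnnxRuntimeBindings

// Create the runtime environment and load the model
let env = try ORTEnv(loggingLevel: .warning)
let session = try ORTSession(env: env, modelPath: modelPath, sessionOptions: nil)

// imageData: NSMutableData holding 1 x 3 x 512 x 512 float32 values (NCHW, scaled to [0, 1])
let inputTensor = try ORTValue(tensorData: imageData, elementType: .float, shape: [1, 3, 512, 512])

// Run inference
let outputs = try session.run(
    withInputs: ["input_image": inputTensor],
    outputNames: ["scene_codes"],
    runOptions: nil
)
```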
This model is the encoder portion of TripoSR. Its output "scene codes" are triplane features that can be passed to a decoder and a marching cubes step to extract 3D meshes.
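To make "triplane features" concrete, the sketch below shows one way a decoder might look up features for 3D points: each point is projected onto three axis-aligned planes, a feature vector is read from each, and the three vectors are concatenated. This is an illustration under stated assumptions, not the TripoSR decoder: the function name `sample_triplane`, the `radius` parameter, the XY/XZ/YZ plane order, and the nearest-neighbour lookup are all choices made here for simplicity. Real decoders typically use bilinear interpolation and feed the result to an MLP.

```python
import numpy as np


def sample_triplane(scene_codes, points, radius=1.0):
    """Nearest-neighbour triplane lookup (illustrative only).

    scene_codes: array of shape (1, 3, C, H, W), e.g. (1, 3, 40, 64, 64).
    points:      array of shape (N, 3) with coordinates in [-radius, radius].
    Returns an (N, 3 * C) array of concatenated per-plane features.
    """
    planes = scene_codes[0]                      # (3, C, H, W)
    _, _, height, width = planes.shape

    # Map coordinates from [-radius, radius] to [0, 1].
    unit = (np.clip(points, -radius, radius) + radius) / (2.0 * radius)

    # ASSUMED plane layout: plane 0 = XY, plane 1 = XZ, plane 2 = YZ.
    axis_pairs = [(0, 1), (0, 2), (1, 2)]

    features = []
    for plane, (a, b) in zip(planes, axis_pairs):
        cols = np.rint(unit[:, a] * (width - 1)).astype(int)
        rows = np.rint(unit[:, b] * (height - 1)).astype(int)
        features.append(plane[:, rows, cols].T)  # (N, C)
    return np.concatenate(features, axis=1)      # (N, 3 * C)


# Usage with random stand-in data shaped like the encoder output.
rng = np.random.default_rng(0)
fake_codes = rng.standard_normal((1, 3, 40, 64, 64)).astype(np.float32)
query = np.array([[-1.0, -1.0, -1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(sample_triplane(fake_codes, query).shape)
```

With the (1, 3, 40, 64, 64) output of this encoder, each query point yields 3 x 40 = 120 feature values, which a decoder would then turn into density and color before marching cubes extracts a surface.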
- triposr_encoder.onnx - ONNX model graph (2.6 MB)
- triposr_encoder.onnx.data - Model weights (1.6 GB)

Original TripoSR paper:
@article{TripoSR2024,
title={TripoSR: Fast 3D Object Reconstruction from a Single Image},
author={Tochilkin, Dmitry and Pankratz, David and Liu, Zexiang and Huang, Zixuan and Letts, Adam and Li, Yangguang and Liang, Ding and Laforte, Christian and Jampani, Varun and Cao, Yan-Pei},
journal={arXiv preprint arXiv:2403.02151},
year={2024}
}
MIT License (same as original TripoSR)