# TripoSR iOS (ONNX)

This is the ONNX-converted encoder from TripoSR, a fast feedforward 3D reconstruction model from Stability AI and Tripo AI.

## Model Details

| Property | Value |
|---|---|
| Model Size | ~1.6 GB |
| Parameters | 419M |
| Input | RGB image, `(1, 3, 512, 512)` |
| Output | Scene codes / triplane, `(1, 3, 40, 64, 64)` |
| ONNX Opset | 18 |
| Format | ONNX with external data |
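As a sanity check on the figures above, 419M float32 parameters account for essentially all of the ~1.6 GB weight file:

```python
# Sanity check: 419M float32 parameters vs. the ~1.6 GB external-data file.
params = 419_000_000
bytes_fp32 = params * 4          # 4 bytes per float32 weight
gib = bytes_fp32 / (1024 ** 3)   # convert to GiB
print(f"{gib:.2f} GiB")          # ~1.56 GiB, consistent with ~1.6 GB
```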

## Usage

### Python (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np
from PIL import Image

# Load the model (the .onnx.data file must be in the same directory)
session = ort.InferenceSession(
    "triposr_encoder.onnx",
    providers=["CPUExecutionProvider"]  # or 'CoreMLExecutionProvider' on Apple hardware
)

# Preprocess: resize to 512x512, scale to [0, 1], convert HWC -> NCHW
image = Image.open("your_image.png").convert("RGB").resize((512, 512))
input_array = np.array(image).astype(np.float32) / 255.0
input_array = input_array.transpose(2, 0, 1)[np.newaxis, ...]

# Run inference
scene_codes = session.run(None, {"input_image": input_array})[0]
print(f"Scene codes shape: {scene_codes.shape}")
```

### iOS (Swift with ONNX Runtime)

Add ONNX Runtime to your project via SPM using the official Swift package:

https://github.com/microsoft/onnxruntime-swift-package-manager

```swift
import OnnxRuntimeBindings

// Create the runtime environment and load the model
let env = try ORTEnv(loggingLevel: .warning)
let session = try ORTSession(env: env, modelPath: modelPath, sessionOptions: nil)

// Wrap preprocessed image bytes (float32, NCHW) in a tensor and run inference
let inputTensor = try ORTValue(
    tensorData: imageData,
    elementType: .float,
    shape: [1, 3, 512, 512]
)
let outputs = try session.run(
    withInputs: ["input_image": inputTensor],
    outputNames: ["scene_codes"],
    runOptions: nil
)
```

## Architecture

This model is the encoder portion of TripoSR:

1. **Image Tokenizer** - a DINO ViT-B/16 pretrained vision transformer
2. **Backbone** - a transformer decoder with cross-attention
3. **Post Processor** - converts tokens to the triplane representation

The output "scene codes" are triplane features that can be used with a decoder and marching cubes algorithm to extract 3D meshes.
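To make the triplane representation concrete, here is a minimal, hypothetical sketch of how a decoder queries it for a single 3D point. The real TripoSR decoder samples bilinearly (via `grid_sample`) and feeds the features through an MLP; nearest-neighbor lookup is used here only to keep the example short.

```python
import numpy as np

def sample_triplane(scene_codes: np.ndarray, point: np.ndarray) -> np.ndarray:
    """scene_codes: (3, C, H, W) triplane; point: (3,) in [-1, 1]^3."""
    _, C, H, W = scene_codes.shape
    # Project the 3D point onto the XY, XZ, and YZ planes
    projections = [(point[0], point[1]), (point[0], point[2]), (point[1], point[2])]
    feats = []
    for plane, (u, v) in zip(scene_codes, projections):
        # Map [-1, 1] coordinates to pixel indices (nearest neighbor)
        col = int(round((u + 1) / 2 * (W - 1)))
        row = int(round((v + 1) / 2 * (H - 1)))
        feats.append(plane[:, row, col])
    # Aggregate the three per-plane feature vectors
    return np.concatenate(feats)  # shape (3 * C,)

codes = np.random.rand(3, 40, 64, 64).astype(np.float32)
feat = sample_triplane(codes, np.array([0.1, -0.3, 0.5]))
print(feat.shape)  # (120,)
```

Querying this function over a dense 3D grid yields the density field that marching cubes turns into a mesh.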

## Files

- `triposr_encoder.onnx` - ONNX model graph (2.6 MB)
- `triposr_encoder.onnx.data` - model weights (1.6 GB)

## Citation

Original TripoSR paper:

```bibtex
@article{TripoSR2024,
  title={TripoSR: Fast 3D Object Reconstruction from a Single Image},
  author={Tochilkin, Dmitry and Pankratz, David and Liu, Zexiang and Huang, Zixuan and Letts, Adam and Li, Yangguang and Liang, Ding and Laforte, Christian and Jampani, Varun and Cao, Yan-Pei},
  journal={arXiv preprint arXiv:2403.02151},
  year={2024}
}
```

## License

MIT License (same as the original TripoSR).
