TripoSG-scribble - Fast 3D Shape Prototyping with Scribble and Prompt

TripoSG-scribble converts a scribble image and a text prompt into a 3D shape. It is a variant of TripoSG, a state-of-the-art image-to-3D generation foundation model that uses large-scale rectified flow transformers to produce high-fidelity 3D shapes from single images.

Model Description

Model Architecture

TripoSG utilizes a novel architecture combining:

Rectified Flow (RF) based Transformer for stable, linear trajectory modeling
Advanced VAE with SDF-based representation and hybrid geometric supervision
Cross-attention mechanism for image feature conditioning
1.5B parameters operating on 2048 latent tokens
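
To make the "stable, linear trajectory" idea concrete, here is a minimal toy sketch of rectified flow sampling in pure Python. It is not TripoSG's implementation: `euler_sample` and `make_straight_line_velocity` are hypothetical names, the velocity function is a closed-form stand-in for the trained transformer, and the time convention (noise at t=0, data at t=1) is an assumption. The sketch only shows that when the learned path is a straight line, plain Euler integration follows it without drift.

```python
import random


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_straight_line_velocity(target):
    """Toy stand-in for the trained model (hypothetical).

    On the straight path x_t = (1 - t) * x0 + t * x1 the velocity is the
    constant x1 - x0, which equals (x1 - x_t) / (1 - t) at any point on it.
    """
    def velocity_fn(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity_fn


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4)]  # starting sample
    target = [1.0, -2.0, 0.5, 3.0]                   # toy "data" point
    velocity = make_straight_line_velocity(target)
    print(euler_sample(velocity, noise, num_steps=10))
```

Because the toy path is exactly straight, the number of Euler steps barely matters here. A real model only approximates a straight path, so the step count remains a quality and speed trade-off.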

For inference efficiency, TripoSG-scribble differs from TripoSG in two ways:

It is a CFG-distilled model (classifier-free guidance is baked into the weights) and should be used with CFG=0
It is trained with 512 latent tokens instead of 2048
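
The two points above can be illustrated with a small sketch of why they reduce inference cost. All names here (`cfg_combine`, `CountingModel`, `predict`, `attention_pair_ratio`) are hypothetical and not part of the TripoSG codebase. The sketch assumes the common convention that CFG=0 disables guidance, so only the conditional branch is evaluated, and it uses the standard classifier-free guidance formula for the non-distilled case.

```python
def cfg_combine(v_uncond, v_cond, scale):
    """Standard classifier-free guidance: move from the unconditional
    prediction toward (and past) the conditional one."""
    return [u + scale * (c - u) for u, c in zip(v_uncond, v_cond)]


class CountingModel:
    """Hypothetical stand-in for the network that counts forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, t, cond):
        self.calls += 1
        bias = 0.0 if cond is None else 1.0  # toy effect of conditioning
        return [0.5 * xi + bias for xi in x]


def predict(model, x, t, cond, guidance_scale):
    """One sampling step's prediction, with or without CFG."""
    if guidance_scale == 0:
        # Assumption: CFG=0 means guidance is off, so a single
        # conditional forward pass is enough (the distilled case).
        return model(x, t, cond)
    # Regular CFG needs two forward passes per step.
    v_uncond = model(x, t, None)
    v_cond = model(x, t, cond)
    return cfg_combine(v_uncond, v_cond, guidance_scale)


def attention_pair_ratio(tokens_a, tokens_b):
    """Ratio of pairwise self-attention interactions (quadratic in tokens)."""
    return (tokens_a * tokens_a) / (tokens_b * tokens_b)


if __name__ == "__main__":
    distilled, guided = CountingModel(), CountingModel()
    predict(distilled, [1.0, 2.0], 0.0, "a chair", guidance_scale=0)
    predict(guided, [1.0, 2.0], 0.0, "a chair", guidance_scale=7.0)
    print(distilled.calls, guided.calls)
    print(attention_pair_ratio(2048, 512))
```

The token ratio covers only the quadratic self-attention term. Other layers scale roughly linearly with token count, so the overall speedup is smaller than this ratio suggests.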

Intended Uses

This model is designed for:

Converting a scribble image and a text prompt into high-quality 3D meshes
Creative and design applications
Gaming and VFX asset creation
Prototyping and visualization

Requirements

CUDA-capable GPU (>8GB VRAM)
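
A quick way to check this requirement before downloading the weights is sketched below. It assumes the model is run through PyTorch, which this card does not state, and it treats "8GB" as 8 GiB. `meets_vram_requirement` and `check_gpu` are hypothetical helper names. The `torch.cuda` calls are standard PyTorch APIs.

```python
def meets_vram_requirement(total_bytes, min_gib=8.0):
    """True if the device has strictly more than min_gib GiB of memory."""
    return total_bytes / (1024 ** 3) > min_gib


def check_gpu(device_index=0):
    """Return (ok, message) describing whether the GPU looks sufficient."""
    try:
        import torch  # assumption: PyTorch is the runtime
    except ImportError:
        return False, "PyTorch is not installed"
    if not torch.cuda.is_available():
        return False, "no CUDA-capable GPU detected"
    props = torch.cuda.get_device_properties(device_index)
    gib = props.total_memory / (1024 ** 3)
    ok = meets_vram_requirement(props.total_memory)
    return ok, f"{props.name}: {gib:.1f} GiB VRAM"


if __name__ == "__main__":
    ok, message = check_gpu()
    print("OK" if ok else "Insufficient", "-", message)
```

This reports total device memory, not free memory, so other processes using the GPU can still cause out-of-memory errors.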

Usage

For detailed usage instructions, please visit our GitHub repository.

About

TripoSG-scribble is developed by Tripo, VAST AI Research, pushing the boundaries of 3D Generative AI. For more information, see the TripoSG paper listed below.

Paper for VAST-AI/TripoSG-scribble

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

arXiv:2502.06608, published Feb 10, 2025