🚀 First INT4 Quantized Cube3D — Run on Half the VRAM

Presenting the first INT4 quantized version of Cube3D v0.5, a text-to-3D mesh generative model. Quantized with RTN W4A16 (group_size=128) using torchao, it cuts peak VRAM from 25.4 GB to 14.3 GB (44%↓) at comparable latency (14.2 s vs 15.0 s with EngineFast) and comparable shape fidelity, enabling 3D shape generation on smaller, more accessible GPUs.

| Metric | BF16 + Engine | BF16 + EngineFast | INT4 + EngineFast |
|---|---|---|---|
| 🎮 Peak VRAM | 21.7 GB | 25.4 GB | 14.3 GB (44%↓) ✨ |
| 📦 Setup time | 19.4 s | 206.9 s | 25.1 s (88%↓) |
| ⏱️ Latency | 90.9 s | 15.0 s | 14.2 s |

💡 At 14.3 GB peak VRAM, the model fits within a 16 GB memory budget. That leaves ample headroom on 24 GB cards such as the NVIDIA L4 and A10 and puts 16 GB GPUs within reach, bringing high-quality text-to-3D generation to individual researchers and end-user hardware.
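The percentages quoted above can be reproduced from the benchmark figures. The short sketch below assumes that both reductions are measured against the BF16 + EngineFast column, which is the reading that matches the reported 44% and 88%. The helper name `percent_reduction` is illustrative only.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Figures from the benchmark table: BF16 + EngineFast vs INT4 + EngineFast.
vram_drop = percent_reduction(25.4, 14.3)    # peak VRAM in GB
setup_drop = percent_reduction(206.9, 25.1)  # setup time in seconds

# Nominal headroom on a 16 GB card; usable memory is somewhat lower in
# practice because the driver and CUDA context take a share.
headroom_gb = 16.0 - 14.3

print(f"Peak VRAM reduction:  {vram_drop:.1f}%")
print(f"Setup time reduction: {setup_drop:.1f}%")
print(f"Nominal headroom on 16 GB: {headroom_gb:.1f} GB")
```

The nominal headroom is under 2 GB, so on a 16 GB card it may be worth measuring peak usage on the target device before relying on it.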

[Figures: Original BF16 vs quantized INT4 comparisons, in three panels: A. Easy categories (7), B. Medium categories (6), C. Complex categories (2)]

Cube3D v0.5 — RTN W4A16 INT4 (torchao)

Post-training quantized version of Roblox/cube3d-v0.5, a text-to-3D mesh generative model.
Quantization method: RTN W4A16, group_size=128, via torchao int4_weight_only.
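To make the method concrete, here is a minimal pure-Python sketch of group-wise round-to-nearest quantization. It assumes an asymmetric scheme with one scale and one minimum value per group of 128 weights. It is not a reproduction of torchao's internal kernels or packed tensor layout, and the function names are hypothetical.

```python
import random


def rtn_quantize_group(weights, n_bits=4):
    """Asymmetric round-to-nearest quantization of one weight group.

    Returns (codes, scale, w_min) with integer codes in [0, 2**n_bits - 1].
    """
    qmax = (1 << n_bits) - 1  # 15 for INT4
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax
    if scale == 0.0:
        scale = 1.0  # constant group: any scale works, avoid dividing by zero
    codes = [min(qmax, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def rtn_dequantize_group(codes, scale, w_min):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + w_min for c in codes]


def rtn_quantize_row(row, group_size=128, n_bits=4):
    """Quantize one weight row group by group (length must divide evenly)."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        rtn_quantize_group(row[i:i + group_size], n_bits)
        for i in range(0, len(row), group_size)
    ]


# Illustrative data only: a synthetic row of 256 weights, i.e. two groups.
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]
groups = rtn_quantize_row(row, group_size=128)
restored = [w for g in groups for w in rtn_dequantize_group(*g)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max abs error: {max_err:.6f}")
```

Because each weight is rounded to the nearest of 16 levels, the reconstruction error per weight is bounded by half the group's scale. Smaller groups track local weight ranges more closely at the cost of storing more scales.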

What's in this repo

| File | Size | Description |
|---|---|---|
| `shape_gpt_rtn_int4_g128.pt` | 1.26 GB | INT4 quantized GPT weights (torchao pickle) |
| `shape_tokenizer.safetensors` | ~1.10 GB | VQ-VAE decoder (BF16, unchanged from base model) |
| `open_model_v0.5.yaml` | tiny | Model architecture config |
| `quant_config.json` | tiny | Quantization metadata |

The BF16 GPT weights (shape_gpt.safetensors) are not included here — they live in the parent repo and are only needed to reconstruct the model skeleton for loading.

Benchmark (NVIDIA A100-SXM4-40GB, 15 categories)

Shape Quality (Chamfer Distance, 15 categories, 170 prompts):

Median CD: 67.9 × 10⁻³

Best categories (mean CD × 10⁻³): vehicle_land (41.4), geometric_primitive (46.5), animal_wild (53.8).
Complex categories: symmetry_topology (205.8) and abstract_mathematical (167.9) also show the highest variance (std 242.7 and 165.1), which suggests that RTN INT4 rounding hurts topologically complex shapes most.

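| Tier (mean CD × 10⁻³) | Category | Mean CD (× 10⁻³) | Std (× 10⁻³) | n |
|---|---|---|---|---|
| Easy (< 75) | vehicle_land | 41.4 | 21.1 | 10 |
| Easy (< 75) | geometric_primitive | 46.5 | 25.8 | 10 |
| Easy (< 75) | animal_wild | 53.8 | 21.2 | 10 |
| Easy (< 75) | animal_domestic | 56.5 | 21.2 | 10 |
| Easy (< 75) | tool_hardware | 66.7 | 44.7 | 10 |
| Easy (< 75) | furniture | 70.4 | 34.2 | 10 |
| Easy (< 75) | musical_instrument | 72.5 | 45.7 | 10 |
| Medium (75–100) | vehicle_air_water | 75.3 | 36.1 | 10 |
| Medium (75–100) | fine_detail | 79.2 | 54.8 | 10 |
| Medium (75–100) | visualization_stylized | 85.0 | 46.8 | 30 |
| Medium (75–100) | electronics | 92.2 | 50.1 | 10 |
| Medium (75–100) | architecture | 92.8 | 50.0 | 10 |
| Medium (75–100) | nature_plant | 98.2 | 44.0 | 10 |
| Complex (> 100) | abstract_mathematical | 167.9 | 165.1 | 10 |
| Complex (> 100) | symmetry_topology | 205.8 | 242.7 | 10 |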
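For readers unfamiliar with the metric, the sketch below shows one common definition of Chamfer Distance between two point clouds. The card does not state which variant was used, so three things here are assumptions rather than the benchmark's actual procedure: squared Euclidean distances, averaging per direction, and summing the two directions. How points are sampled from the meshes is also not specified.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer Distance between two non-empty point sets.

    Each direction averages the squared distance from every point to its
    nearest neighbour in the other set; the two averages are summed.
    This brute-force version is O(len(a) * len(b)) and is meant for
    illustration, not for large point clouds.
    """
    def one_way(src, dst):
        return sum(min(squared_distance(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


# Illustrative data only: two tiny point sets that differ in one coordinate.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]
cd = chamfer_distance(reference, candidate)
print(f"Chamfer distance: {cd:.4f}")
```

A value of zero means every point has an exact match in the other set. Larger values indicate greater geometric deviation, which is why lower CD is better in the table above.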

Requirements

torch==2.10.0+cu128
torchvision==0.25.0+cu128
torchaudio==2.10.0
torchao==0.10.0

The .pt file is a torchao pickle, so torchao must be installed to load it. torchao also provides the kernels used for INT4 inference.
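Because the checkpoint is a pickle tied to specific library versions, it can help to confirm the environment before loading. The sketch below compares installed package versions with the pins listed above using the standard library's `importlib.metadata`. The `check_pins` helper is hypothetical, and it deliberately ignores local build suffixes such as `+cu128`, so it does not verify the CUDA variant.

```python
from importlib import metadata

# Version pins copied from the Requirements list.
PINNED = {
    "torch": "2.10.0+cu128",
    "torchvision": "0.25.0+cu128",
    "torchaudio": "2.10.0",
    "torchao": "0.10.0",
}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu128'."""
    return version.split("+", 1)[0]


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch or missing package."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found is None or base_version(found) != base_version(expected):
            problems[name] = (expected, found)
    return problems


for name, (expected, found) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {found}")
```

An empty result means all four packages match the pinned base versions. Any printed line points at a package to reinstall before loading the checkpoint.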

Usage

Please see the Google Colab tutorial.

Quantization details

Method: Round-to-nearest (RTN)
Precision: W4A16 (INT4 weights, BF16 activations)
Quantized INT4 layers: 279 / 282
Skipped layers: shape_proj (in_features=16, smaller than the group size), lm_head (out_features=4099, output head), bbox_proj
Group size: 128 (torchao)
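The skip rules above can be sketched as a simple eligibility check, together with the storage overhead implied by a group size of 128. Both helpers are hypothetical. The layer shapes other than shape_proj are illustrative placeholders, and the assumption of one 16-bit scale plus one 16-bit zero point per group is mine, not something the card states. In torchao, a rule like this would typically be supplied through the `filter_fn` argument of `quantize_`.

```python
def is_int4_eligible(name, in_features, group_size=128,
                     skip_names=("lm_head", "bbox_proj")):
    """Decide whether a linear layer should receive group-wise INT4 weights.

    A layer is skipped if its final name component is on the skip list or
    if its input dimension cannot be split into whole groups.
    """
    if name.rsplit(".", 1)[-1] in skip_names:
        return False
    return in_features >= group_size and in_features % group_size == 0


def bits_per_weight(n_bits=4, group_size=128, meta_bits=32):
    """Effective storage per weight, including per-group metadata.

    meta_bits=32 assumes a 16-bit scale and a 16-bit zero point per group.
    """
    return n_bits + meta_bits / group_size


# Illustrative layers: only shape_proj's in_features=16 comes from the card.
layers = [
    ("shape_proj", 16),
    ("blocks.0.attn.proj", 1024),  # hypothetical name and size
    ("lm_head", 1024),             # hypothetical in_features
]
for name, in_features in layers:
    print(f"{name}: quantize={is_int4_eligible(name, in_features)}")

effective_bits = bits_per_weight()
print(f"effective bits per weight: {effective_bits}")
print(f"compression vs BF16: {16 / effective_bits:.2f}x")
```

Under these assumptions the per-group metadata adds a quarter of a bit per weight, so quantized layers shrink by a little under 4x relative to BF16.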

Citation

@article{roblox2025cube,
  title={Cube: A Roblox View of 3D Intelligence},
  author={Roblox},
  journal={arXiv preprint arXiv:2503.15475},
  year={2025}
}


Model: TrNi/efficient-cube3d (quantized from base model Roblox/cube3d-v0.5)
Paper: Cube: A Roblox View of 3D Intelligence (arXiv 2503.15475, published Mar 19, 2025)