🚀 First INT4 Quantized Cube3D: Run on Half the VRAM

Presenting the first INT4 quantized version of Cube3D v0.5, a text-to-3D mesh generative model. Quantized via RTN W4A16 (group_size=128) using torchao, it cuts peak VRAM from 25.4 GB → 14.3 GB (44%↓) while keeping inference latency essentially unchanged (15.0 s vs. 14.2 s) and shape fidelity comparable, which enables 3D shape generation on much smaller, more accessible GPUs.

| | BF16 + Engine | BF16 + EngineFast | INT4 + EngineFast |
|---|---|---|---|
| 🎮 Peak VRAM | 21.7 GB | 25.4 GB | 14.3 GB (44%↓) ✨ |
| 📦 Setup time | 19.4 s | 206.9 s | 25.1 s (88%↓) |
| ⏱️ Latency | 90.9 s | 15.0 s | 14.2 s |

💡 The 44% VRAM reduction brings peak usage under 16 GB, so the model fits on a single 16 GB GPU and leaves ample headroom on 24 GB cards such as the NVIDIA L4 or A10, bringing high-quality text-to-3D generation to individual researchers and end-user hardware.
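The percentages quoted above can be reproduced with a few lines of arithmetic. The sketch below only re-uses the figures from the benchmark table; the helper name `percent_reduction` is illustrative. Note that the 44% figure is measured against BF16 + EngineFast, not against the slower BF16 + Engine configuration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Figures copied from the benchmark table (A100-SXM4-40GB).
PEAK_VRAM_GB = {"BF16 + Engine": 21.7, "BF16 + EngineFast": 25.4, "INT4 + EngineFast": 14.3}
SETUP_TIME_S = {"BF16 + Engine": 19.4, "BF16 + EngineFast": 206.9, "INT4 + EngineFast": 25.1}

int4_vram = PEAK_VRAM_GB["INT4 + EngineFast"]
vram_vs_fast = percent_reduction(PEAK_VRAM_GB["BF16 + EngineFast"], int4_vram)
vram_vs_engine = percent_reduction(PEAK_VRAM_GB["BF16 + Engine"], int4_vram)
setup_vs_fast = percent_reduction(
    SETUP_TIME_S["BF16 + EngineFast"], SETUP_TIME_S["INT4 + EngineFast"]
)

print(f"Peak VRAM vs BF16 + EngineFast: {vram_vs_fast:.1f}% lower")   # 43.7% lower
print(f"Peak VRAM vs BF16 + Engine:     {vram_vs_engine:.1f}% lower")  # 34.1% lower
print(f"Setup time vs BF16 + EngineFast: {setup_vs_fast:.1f}% lower")  # 87.9% lower
print(f"Nominal headroom on a 16 GB card: {16.0 - int4_vram:.1f} GB")  # 1.7 GB
```

The headroom line uses the nominal 16 GB capacity; the memory actually available to a process is usually somewhat lower.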

Original BF16 vs Quantized INT4 Comparisons:

A. Easy Categories (7)

[Figure: Easy categories]

B. Medium Categories (6)

[Figure: Medium categories]

C. Complex Categories (2)

[Figure: Complex categories]

Cube3D v0.5: RTN W4A16 INT4 (torchao)

Post-training quantized version of Roblox/cube3d-v0.5, a text-to-3D mesh generative model.
Quantization method: RTN W4A16, group_size=128, via torchao int4_weight_only.
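To illustrate what group-wise round-to-nearest (RTN) quantization does, here is a minimal pure-Python sketch. It assumes a generic asymmetric min/max scheme with one scale and offset per group of 128 weights; it is not torchao's implementation, and torchao's packed tensor layout and kernels differ. All function names are illustrative.

```python
def rtn_quantize_group(weights, n_bits=4):
    """Quantize one group of weights to unsigned n-bit codes by rounding to nearest."""
    qmax = (1 << n_bits) - 1                  # 15 for INT4
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax
    if scale == 0.0:                          # constant group: any scale works
        scale = 1.0
    # Python's round() rounds to the nearest integer (ties go to even).
    codes = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def rtn_dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + lo for c in codes]


def rtn_quantize_row(row, group_size=128):
    """Split one weight row into groups and quantize each group independently."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        rtn_quantize_group(row[i:i + group_size])
        for i in range(0, len(row), group_size)
    ]


# Toy example: a 256-element row yields two groups of 128 weights each.
toy_row = [i / 255 for i in range(256)]
toy_groups = rtn_quantize_row(toy_row, group_size=128)
print(len(toy_groups), "groups, first scale =", toy_groups[0][1])
```

Because each group carries its own scale and offset, the rounding error of any weight is bounded by half of that group's scale, which is why smaller groups trade extra metadata for better accuracy.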

What's in this repo

| File | Size | Description |
|---|---|---|
| shape_gpt_rtn_int4_g128.pt | 1.26 GB | INT4 quantized GPT weights (torchao pickle) |
| shape_tokenizer.safetensors | ~1.10 GB | VQ-VAE decoder (BF16, unchanged from base model) |
| open_model_v0.5.yaml | tiny | Model architecture config |
| quant_config.json | tiny | Quantization metadata |

The BF16 GPT weights (shape_gpt.safetensors) are not included here. They live in the parent repo and are only needed to reconstruct the model skeleton for loading.
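A minimal sketch for fetching these files, assuming the `huggingface_hub` package is installed. The repo id `TrNi/efficient-cube3d` is taken from the model page, the parent repo id from the description above, and the `checkpoints` directory is an arbitrary choice. Loading the weights into a model is not shown here; the Colab tutorial covers that.

```python
# Assumption: repo ids as shown on the model page and in the description.
QUANT_REPO_ID = "TrNi/efficient-cube3d"
PARENT_REPO_ID = "Roblox/cube3d-v0.5"

# Files listed in this repo.
QUANT_FILES = [
    "shape_gpt_rtn_int4_g128.pt",
    "shape_tokenizer.safetensors",
    "open_model_v0.5.yaml",
    "quant_config.json",
]
# BF16 GPT weights, which live in the parent repo.
PARENT_FILES = ["shape_gpt.safetensors"]


def download_files(repo_id, filenames, local_dir="checkpoints"):
    """Download the given files from one Hub repo and return {filename: local path}."""
    # Imported lazily so the file lists above stay usable without the package.
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(repo_id=repo_id, filename=name, local_dir=local_dir)
        for name in filenames
    }


# Usage (requires network access):
#   paths = download_files(QUANT_REPO_ID, QUANT_FILES)
#   paths.update(download_files(PARENT_REPO_ID, PARENT_FILES))
```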

Benchmark (NVIDIA A100-SXM4-40GB, 15 categories)

Shape Quality (Chamfer Distance, 15 categories, 170 prompts):

Median CD: 67.9 × 10⁻³

Best categories (mean CD × 10⁻³): vehicle_land (41.4), geometric_primitive (46.5), animal_wild (53.8).
Complex categories: symmetry_topology (205.8), abstract_mathematical (167.9). Both show high variance (standard deviations of 242.7 and 165.1), which suggests that RTN INT4 rounding hurts topologically complex shapes the most.

| Category | Mean (CD × 10⁻³) | Std | n |
|---|---|---|---|
| Easy (CD × 10⁻³ < 75) | | | |
| vehicle_land | 41.4 | 21.1 | 10 |
| geometric_primitive | 46.5 | 25.8 | 10 |
| animal_wild | 53.8 | 21.2 | 10 |
| animal_domestic | 56.5 | 21.2 | 10 |
| tool_hardware | 66.7 | 44.7 | 10 |
| furniture | 70.4 | 34.2 | 10 |
| musical_instrument | 72.5 | 45.7 | 10 |
| Medium (CD × 10⁻³ 75–100) | | | |
| vehicle_air_water | 75.3 | 36.1 | 10 |
| fine_detail | 79.2 | 54.8 | 10 |
| visualization_stylized | 85.0 | 46.8 | 30 |
| electronics | 92.2 | 50.1 | 10 |
| architecture | 92.8 | 50.0 | 10 |
| nature_plant | 98.2 | 44.0 | 10 |
| Complex (CD × 10⁻³ > 100) | | | |
| abstract_mathematical | 167.9 | 165.1 | 10 |
| symmetry_topology | 205.8 | 242.7 | 10 |
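For readers unfamiliar with the metric, the sketch below shows one common definition of Chamfer Distance between two point clouds: the symmetric sum of mean squared nearest-neighbour distances. This card does not state which variant, sampling density, or normalization was used for the numbers above, so treat this as an illustration of the idea and not as the evaluation code.

```python
def _mean_nearest_sq_dist(src, dst):
    """Mean over `src` of the squared distance to the closest point in `dst`."""
    total = 0.0
    for p in src:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst)
    return total / len(src)


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer Distance (assumed variant: squared Euclidean, mean-reduced)."""
    return _mean_nearest_sq_dist(points_a, points_b) + _mean_nearest_sq_dist(
        points_b, points_a
    )


# Toy example with two tiny point clouds.
cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_b = [(0.0, 0.0, 0.0)]
print(chamfer_distance(cloud_a, cloud_b))  # 0.5
```

This brute-force version is quadratic in the number of points; practical evaluations typically use a KD-tree or a GPU implementation.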

Requirements

torch==2.10.0+cu128
torchvision==0.25.0+cu128
torchaudio==2.10.0
torchao==0.10.0

The .pt file is a torchao pickle, so torchao must be installed to load it; torchao also provides the kernels for INT4 inference.
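Since a pickled checkpoint is sensitive to library versions, it can help to check the environment before loading. The following is a small sketch using only the standard library; it compares base versions and ignores local suffixes such as `+cu128`, which is a simplifying assumption. The helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Requirements list above.
PINNED = {
    "torch": "2.10.0+cu128",
    "torchvision": "0.25.0+cu128",
    "torchaudio": "2.10.0",
    "torchao": "0.10.0",
}


def base_version(v):
    """Drop a local version suffix such as '+cu128'."""
    return v.split("+", 1)[0]


def check_pins(pinned, lookup=version):
    """Return {package: problem description} for packages that deviate from the pins."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if base_version(installed) != base_version(wanted):
            problems[name] = f"installed {installed}, pinned {wanted}"
    return problems


for package, issue in check_pins(PINNED).items():
    print(f"{package}: {issue}")
```

An empty result means every pinned package is present at the expected base version.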

Usage

Please see the Google Colab tutorial.

Quantization details

  • Method: Round-to-nearest (RTN)
  • Precision: W4A16 - weights INT4, activations BF16
  • Quantized INT4 layers: 279 / 282
  • Skipped layers: shape_proj (in_features=16, < group size), lm_head (out=4099, output head), bbox_proj
  • Group size (torchao): 128
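The skip rules listed above can be expressed as a simple predicate. The sketch below is an illustration, not the script used to produce this checkpoint: the module name `blocks.0.attn.qkv` and the `in_features` values other than 16 are hypothetical, and the divisibility check is an extra assumption. torchao's `quantize_` accepts a `filter_fn` callback that receives the module and its fully qualified name, which is where a predicate like this would plug in.

```python
GROUP_SIZE = 128
# Layers skipped by name; the card gives no reason for skipping bbox_proj.
SKIP_NAMES = ("lm_head", "bbox_proj")


def should_quantize(name, in_features, group_size=GROUP_SIZE):
    """Decide whether a linear layer is eligible for group-wise INT4 quantization."""
    leaf = name.rsplit(".", 1)[-1]            # last component of a dotted module name
    if leaf in SKIP_NAMES:
        return False
    # Layers narrower than one group (e.g. shape_proj, in_features=16) cannot be grouped.
    if in_features < group_size:
        return False
    # Assumption: input width must split evenly into groups.
    return in_features % group_size == 0


# Hypothetical layers for illustration only.
examples = [
    ("shape_proj", 16),
    ("lm_head", 1536),
    ("bbox_proj", 3),
    ("blocks.0.attn.qkv", 1536),
]
for layer_name, width in examples:
    print(layer_name, "->", "quantize" if should_quantize(layer_name, width) else "skip")
```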

Citation

@article{roblox2025cube,
  title={Cube: A Roblox View of 3D Intelligence},
  author={Roblox},
  journal={arXiv preprint arXiv:2503.15475},
  year={2025}
}