First INT4 Quantized Cube3D: Run on Nearly Half the VRAM
Presenting the first INT4 quantized version of Cube3D v0.5, a text-to-3D mesh generative model. Quantized via RTN W4A16 (group_size=128) using torchao, it cuts peak VRAM from 25.4 GB to 14.3 GB (a 44% reduction) while keeping inference latency essentially unchanged (14.2 s vs 15.0 s) and shape fidelity comparable, which enables 3D shape generation on smaller, more accessible GPUs.
| | BF16 + Engine | BF16 + EngineFast | INT4 + EngineFast |
|---|---|---|---|
| Peak VRAM | 21.7 GB | 25.4 GB | 14.3 GB (44% ↓) |
| Setup time | 19.4 s | 206.9 s | 25.1 s (88% ↓) |
| Latency | 90.9 s | 15.0 s | 14.2 s |
The 44% VRAM reduction brings peak usage under a 16 GB memory budget, so the model fits comfortably on a single mid-range GPU such as the 24 GB NVIDIA L4 or A10. This brings high-quality text-to-3D generation to individual researchers and end-user hardware.
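As a quick sanity check on the headline numbers, the percentages can be recomputed from the table values. This is a minimal sketch; the helper name `percent_reduction` is illustrative and not part of this repo.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Figures copied from the table above (BF16 + EngineFast vs INT4 + EngineFast).
vram_drop = percent_reduction(25.4, 14.3)    # peak VRAM in GB
setup_drop = percent_reduction(206.9, 25.1)  # setup time in seconds
headroom_gb = 16.0 - 14.3                    # slack left under a 16 GB budget

print(f"Peak VRAM reduction:  {vram_drop:.1f}%")
print(f"Setup time reduction: {setup_drop:.1f}%")
print(f"Headroom under 16 GB: {headroom_gb:.1f} GB")
```

Both reductions are measured against the BF16 + EngineFast column, which is why the setup-time saving looks so large: the plain BF16 + Engine configuration sets up faster (19.4 s) but has much higher latency.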
Original BF16 vs Quantized INT4 Comparisons:
A. Easy Categories (7)
B. Medium Categories (6)
C. Complex Categories (2)
Cube3D v0.5: RTN W4A16 INT4 (torchao)
Post-training quantized version of Roblox/cube3d-v0.5, a text-to-3D mesh generative model.
Quantization method: RTN W4A16, group_size=128, via torchao int4_weight_only.
What's in this repo
| File | Size | Description |
|---|---|---|
| shape_gpt_rtn_int4_g128.pt | 1.26 GB | INT4 quantized GPT weights (torchao pickle) |
| shape_tokenizer.safetensors | ~1.10 GB | VQ-VAE decoder, BF16, unchanged from base model |
| open_model_v0.5.yaml | tiny | Model architecture config |
| quant_config.json | tiny | Quantization metadata |
The BF16 GPT weights (shape_gpt.safetensors) are not included here. They live in the parent repo (Roblox/cube3d-v0.5) and are needed only to reconstruct the model skeleton before the quantized weights are loaded.
Benchmark (NVIDIA A100-SXM4-40GB, 15 categories)
Shape Quality (Chamfer Distance, 15 categories, 170 prompts):
Median CD: 67.9 × 10⁻³
Best categories: vehicle_land (41.4), geometric_primitive (46.5), animal_wild (53.8).
Complex categories: symmetry_topology (205.8), abstract_mathematical (167.9). Both show high variance (standard deviations of 242.7 and 165.1), which suggests that RTN INT4 rounding hurts topologically complex shapes the most.
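For readers unfamiliar with the metric, here is a minimal sketch of one common Chamfer Distance definition: the mean squared nearest-neighbour distance, summed over both directions. This card does not state which variant, point-sampling budget, or normalization was used for the numbers above, so treat the function below as an illustration of the idea rather than the exact evaluation code.

```python
from typing import Sequence

Point = Sequence[float]


def _mean_nearest_sq(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean over `src` of the squared distance to the closest point in `dst`."""
    total = 0.0
    for p in src:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst)
    return total / len(src)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance (assumed variant: squared, mean, both ways)."""
    return _mean_nearest_sq(a, b) + _mean_nearest_sq(b, a)


# Two tiny point clouds that differ by a 0.1 offset on one point.
cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]
print(chamfer_distance(cloud_a, cloud_b))
```

This brute-force version is quadratic in the number of points, which is fine for illustration; practical evaluations typically sample thousands of surface points and use a spatial index or a GPU kernel.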
| Category | Mean | Std | n |
|---|---|---|---|
| Easy (CD × 10⁻³ < 75) | | | |
| vehicle_land | 41.4 | 21.1 | 10 |
| geometric_primitive | 46.5 | 25.8 | 10 |
| animal_wild | 53.8 | 21.2 | 10 |
| animal_domestic | 56.5 | 21.2 | 10 |
| tool_hardware | 66.7 | 44.7 | 10 |
| furniture | 70.4 | 34.2 | 10 |
| musical_instrument | 72.5 | 45.7 | 10 |
| Medium (CD × 10⁻³ 75–100) | | | |
| vehicle_air_water | 75.3 | 36.1 | 10 |
| fine_detail | 79.2 | 54.8 | 10 |
| visualization_stylized | 85.0 | 46.8 | 30 |
| electronics | 92.2 | 50.1 | 10 |
| architecture | 92.8 | 50.0 | 10 |
| nature_plant | 98.2 | 44.0 | 10 |
| Complex (CD × 10⁻³ > 100) | | | |
| abstract_mathematical | 167.9 | 165.1 | 10 |
| symmetry_topology | 205.8 | 242.7 | 10 |
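The grouping in the table follows directly from the per-category means. The sketch below re-derives the Easy / Medium / Complex buckets from the thresholds in the group headers and also computes a prompt-weighted mean, which is a derived figure not reported in the card. The names `RESULTS` and `difficulty` are illustrative.

```python
# (category, mean CD x 1e-3, number of prompts), copied from the table above.
RESULTS = [
    ("vehicle_land", 41.4, 10), ("geometric_primitive", 46.5, 10),
    ("animal_wild", 53.8, 10), ("animal_domestic", 56.5, 10),
    ("tool_hardware", 66.7, 10), ("furniture", 70.4, 10),
    ("musical_instrument", 72.5, 10), ("vehicle_air_water", 75.3, 10),
    ("fine_detail", 79.2, 10), ("visualization_stylized", 85.0, 30),
    ("electronics", 92.2, 10), ("architecture", 92.8, 10),
    ("nature_plant", 98.2, 10), ("abstract_mathematical", 167.9, 10),
    ("symmetry_topology", 205.8, 10),
]


def difficulty(mean_cd: float) -> str:
    """Bucket a category by its mean CD (x 1e-3), using the table thresholds."""
    if mean_cd < 75:
        return "easy"
    if mean_cd <= 100:
        return "medium"
    return "complex"


counts = {"easy": 0, "medium": 0, "complex": 0}
for _, mean_cd, _ in RESULTS:
    counts[difficulty(mean_cd)] += 1

total_prompts = sum(n for _, _, n in RESULTS)
weighted_mean = sum(mean_cd * n for _, mean_cd, n in RESULTS) / total_prompts

print(counts, total_prompts, round(weighted_mean, 1))
```

The prompt-weighted mean comes out noticeably higher than the reported median of 67.9 × 10⁻³, which is consistent with a distribution skewed by the two complex categories.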
Requirements
torch==2.10.0+cu128
torchvision==0.25.0+cu128
torchaudio==2.10.0
torchao==0.10.0
The .pt file is a torchao pickle, so torchao must be installed to load it. torchao also provides the kernels used for INT4 inference.
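Because the checkpoint is a pickle tied to specific library versions, it can help to verify the environment before loading. The following is a minimal sketch using only the standard library; the `PINS` dictionary mirrors the list above, and `check_pins` is a hypothetical helper, not something shipped in this repo.

```python
from importlib import metadata

# Version pins copied from the Requirements list above.
PINS = {
    "torch": "2.10.0+cu128",
    "torchvision": "0.25.0+cu128",
    "torchaudio": "2.10.0",
    "torchao": "0.10.0",
}


def check_pins(pins, get_version=metadata.version):
    """Return a list of human-readable mismatches (empty if all pins match)."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["all pinned packages match"]:
        print(line)
```

The comparison is an exact string match, including the `+cu128` local version tag, so a CPU-only or differently built wheel is reported as a mismatch.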
Usage
Please see the Google Colab tutorial.
Quantization details
- Method: Round-to-nearest (RTN)
- Precision: W4A16 - weights INT4, activations BF16
- Quantized INT4 layers: 279 / 282
- Skipped layers: shape_proj (in_features=16, smaller than the group size), lm_head (out=4099, output head), bbox_proj
- torchao quantization group size: 128
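To make the settings above concrete, here is a minimal pure-Python sketch of group-wise round-to-nearest 4-bit quantization. It illustrates the general technique only: torchao's actual kernels, packing layout, and zero-point convention may differ, and the per-group metadata cost in `bits_per_weight` (one 16-bit scale plus one 16-bit zero point) is an assumption, not a figure from this card.

```python
import random


def quantize_group(weights):
    """RTN affine quantization of one group to 4-bit codes in [0, 15]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15
    if scale == 0.0:  # constant group: any non-zero scale reproduces it
        scale = 1.0
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [lo + c * scale for c in codes]


def quantize_row(row, group_size=128):
    """Split one weight row into groups and quantize each independently."""
    if len(row) % group_size != 0:
        raise ValueError(f"row length {len(row)} not divisible by {group_size}")
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]


def bits_per_weight(group_size, meta_bits=32):
    """Effective storage cost, assuming `meta_bits` of metadata per group."""
    return 4 + meta_bits / group_size


rng = random.Random(0)
row = [rng.gauss(0.0, 1.0) for _ in range(256)]  # stand-in for real weights
groups = quantize_row(row)                       # 256 / 128 = 2 groups
codes, scale, lo = groups[0]
restored = dequantize_group(codes, scale, lo)
max_err = max(abs(w - r) for w, r in zip(row[:128], restored))
print(f"groups={len(groups)} scale={scale:.4f} max_err={max_err:.4f}")
print(f"bits per weight at g=128: {bits_per_weight(128)}")
```

In this sketch the rounding error of each weight is bounded by half the group's scale, so a group with one large outlier gets a coarse grid for all 128 weights. A row of length 16 raises `ValueError`, mirroring why a layer with in_features=16 cannot use a group size of 128.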
Citation
@article{roblox2025cube,
title={Cube: A Roblox View of 3D Intelligence},
author={Roblox},
journal={arXiv preprint arXiv:2503.15475},
year={2025}
}
Model tree for TrNi/efficient-cube3d
Base model
Roblox/cube3d-v0.5

