---
license: mit
library_name: transformers
pipeline_tag: image-to-3d
tags:
- lego
- 3d-generation
- autoregressive
- transformer
- llama
- dinov2
- clip
- siggraph-asia-2025
---
# LegoACE: Autoregressive Construction Engine for Expressive LEGO® Assemblies
Official model weights for **LegoACE**, presented at **SIGGRAPH Asia 2025**.
LegoACE is an autoregressive transformer that generates LEGO® assemblies as
sequences of placed bricks. This repository hosts two pretrained variants:
| Subfolder | Conditioning | Encoder | Training steps |
|-----------|--------------|---------|----------------|
| `mv/` | Multi-view images (4 views) | [DINOv2-base](https://huggingface.co/facebook/dinov2-base) | 520K |
| `text/` | Text descriptions | [CLIP ViT-B/32](https://huggingface.co/openai/clip-vit-base-patch32) | 210K |
- 📄 Paper: [LegoACE @ SIGGRAPH Asia 2025](https://doi.org/10.1145/3757377.3763881)
- 💻 Code: [VAST-AI-Research/LegoACE](https://github.com/VAST-AI-Research/LegoACE)
- 📊 Architecture: 32-layer Llama-style transformer, hidden size 768, vocab ~16K
---
## Quick start
> The full inference pipeline (LDR tokenizer, multi-view rendering, LDR → GLB
> conversion) lives in the [GitHub repository](https://github.com/VAST-AI-Research/LegoACE).
> The Python snippets below only load the weights; the shell commands call that
> repository's inference scripts.
```bash
git clone https://github.com/VAST-AI-Research/LegoACE.git
cd LegoACE
pip install -e .
```
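Both variants share one repository, so it can be convenient to fetch only the checkpoint you need before loading it. The following is a minimal sketch using `huggingface_hub.snapshot_download`. The helper names `allow_patterns_for` and `download_variant` are illustrative and not part of the LegoACE codebase, and the sketch assumes each variant's files sit entirely under its own subfolder (`mv/` or `text/`).

```python
from typing import List, Optional

REPO_ID = "VAST-AI/LegoACE"
# Subfolder names as listed in this model card.
VARIANTS = ("mv", "text")


def allow_patterns_for(variant: str) -> List[str]:
    """Return glob patterns that match only one variant's subfolder."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return [f"{variant}/*"]


def download_variant(variant: str, local_dir: Optional[str] = None) -> str:
    """Download one variant's files and return the local snapshot path.

    Hypothetical helper: assumes all files for a variant live under its subfolder.
    """
    # Imported lazily so allow_patterns_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns_for(variant),
        local_dir=local_dir,
    )
```

Calling `download_variant("mv")` would return the local snapshot directory containing the `mv/` folder; the other subfolder is skipped.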
### Multi-view image conditioned (recommended)
```python
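# Class defined in the cloned LegoACE repository (installed above).
from model.llama_image_condition import ImageConditionModel
# subfolder="mv" selects the multi-view checkpoint; .to("cuda") requires a CUDA GPU.
model = ImageConditionModel.from_pretrained("VAST-AI/LegoACE", subfolder="mv").to("cuda")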
```
End-to-end usage, including the `dataset/MVNpzDataset.py` loader and Blender-based
GLB export, is documented in the GitHub README. The inference script is invoked as follows:
```bash
python inference/inference_multi_view.py \
--ckpt_dir VAST-AI/LegoACE \
--dataset_name <your_dataset> \
--dataset_class dataset.MVNpzDataset.MVNpzDataset \
--save_dir ./outputs/inference \
--save_name mv-demo \
--infer_number 100 --batch_size 4 --repeat 4 --dataset_split val
```
### Text conditioned
```python
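# Class defined in the cloned LegoACE repository (installed above).
from model.llama_text_condition import TextConditionModel
# subfolder="text" selects the text-conditioned checkpoint; .to("cuda") requires a CUDA GPU.
model = TextConditionModel.from_pretrained("VAST-AI/LegoACE", subfolder="text").to("cuda")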
```
```bash
python inference/inference_text_condition.py \
--ckpt_dir VAST-AI/LegoACE \
--dataset_name <your_dataset> \
--save_dir ./outputs/inference --save_name text-demo \
--prompts "A red sports car" "A modern brick bed" "A bridge over a river"
```
---
## Outputs
Each generation step emits one brick placement as a quintuple
`(x, y, z, rotation_id, brick_type_id)`: a position, a discrete rotation class, and a brick type.
The full pipeline converts the generated sequence into:
1. **LDR** — text-format LEGO instructions (LDraw)
2. **GLB** — 3D mesh via Blender + [ImportLDraw](https://github.com/TobyLobster/ImportLDraw)
3. **Normal maps** — pyrender renderings of the assembled model
LegoACE supports an LDR vocabulary covering 28 common brick types and 20
discrete rotation classes; see [`utils/brick_ids.py`](https://github.com/VAST-AI-Research/LegoACE/blob/main/utils/brick_ids.py).
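To make the quintuple concrete, here is a minimal sketch of how one placement could be written as an LDraw line-type-1 record (`1 <colour> x y z a b c d e f g h i <file>`). The lookup tables `BRICK_FILES` and `ROTATIONS` are hypothetical two-entry stand-ins; the real brick and rotation vocabularies are defined by the repository (see `utils/brick_ids.py`). Passing coordinates straight through as LDraw units is also an assumption, not the repository's confirmed conversion.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical lookup tables for illustration only; not LegoACE's actual IDs.
BRICK_FILES = {0: "3001.dat", 1: "3003.dat"}  # LDraw parts: 2x4 brick, 2x2 brick
ROTATIONS = {
    0: (1, 0, 0, 0, 1, 0, 0, 0, 1),   # identity
    1: (0, 0, 1, 0, 1, 0, -1, 0, 0),  # quarter turn about the vertical (Y) axis
}


@dataclass(frozen=True)
class BrickPlacement:
    """One generated step: position, rotation class, brick type."""
    x: int
    y: int
    z: int
    rotation_id: int
    brick_type_id: int


def to_ldraw_line(p: BrickPlacement, colour: int = 16) -> str:
    """Format a placement as an LDraw sub-file reference (line type 1)."""
    matrix = " ".join(str(v) for v in ROTATIONS[p.rotation_id])
    return f"1 {colour} {p.x} {p.y} {p.z} {matrix} {BRICK_FILES[p.brick_type_id]}"


def to_ldr(placements: Iterable[BrickPlacement]) -> str:
    """Join placements into the body of an .ldr file, one brick per line."""
    return "\n".join(to_ldraw_line(p) for p in placements)
```

For instance, `to_ldraw_line(BrickPlacement(0, -24, 0, 0, 0))` yields `1 16 0 -24 0 1 0 0 0 1 0 0 0 1 3001.dat`, where colour code 16 is LDraw's "main colour" placeholder.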
---
## Intended uses & limitations
**Intended uses**
- Research on autoregressive 3D / LEGO® generative models.
- Generating LEGO® assemblies for academic and creative exploration.
**Limitations**
- Outputs are restricted to the 28 brick types in the training vocabulary.
- Quality depends on prompt phrasing (text) or image quality (multi-view).
- The model was trained primarily on small- and medium-scale assemblies and
may produce structurally unstable or non-buildable arrangements.
- Generation requires the LDR tokenizer files (`*_dat_dict.json`,
`*_rot_dict.json`), which ship with the dataset rather than with these weights.
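Since the tokenizer files are not bundled with the weights, a quick pre-flight check can catch a missing file before inference starts. The sketch below uses only the two filename patterns quoted above; the function name `find_tokenizer_files` is hypothetical, and the assumption that the files sit directly in the dataset's top-level directory may not match your layout.

```python
from pathlib import Path
from typing import Dict, List

# Filename patterns quoted in the limitation above.
TOKENIZER_PATTERNS = ("*_dat_dict.json", "*_rot_dict.json")


def find_tokenizer_files(dataset_dir: str) -> Dict[str, List[Path]]:
    """Map each tokenizer pattern to the matching files directly under dataset_dir.

    Assumption: the files live at the top level of the dataset directory.
    Raises FileNotFoundError if any pattern has no match.
    """
    root = Path(dataset_dir)
    found = {pattern: sorted(root.glob(pattern)) for pattern in TOKENIZER_PATTERNS}
    missing = [pattern for pattern, paths in found.items() if not paths]
    if missing:
        raise FileNotFoundError(f"no files matching {missing} in {root}")
    return found
```

Running `find_tokenizer_files("<your_dataset_dir>")` before launching an inference script gives an early, explicit error instead of a failure partway through generation.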
---
## Citation
```bibtex
@inproceedings{xu2025legoace,
author = {Hao Xu and Yuqing Zhang and Yiqian Wu and Xinyang Zheng and
Yutao Liu and Xiangjun Tang and Yunhan Yang and Ding Liang and
Yingtian Liu and Yuanchen Guo and Yanpei Cao and Xiaogang Jin},
title = {LegoACE: Autoregressive Construction Engine for Expressive LEGO{\textregistered}
Assemblies},
booktitle = {Proceedings of the {SIGGRAPH} Asia 2025 Conference Papers},
publisher = {{ACM}},
year = {2025},
pages = {40:1--40:11},
doi = {10.1145/3757377.3763881},
url = {https://doi.org/10.1145/3757377.3763881}
}
```
---
## License
Released under the [MIT License](https://github.com/VAST-AI-Research/LegoACE/blob/main/LICENSE).
LEGO® is a trademark of the LEGO Group, which does not sponsor, authorize, or
endorse this project.