---
license: mit
library_name: transformers
pipeline_tag: image-to-3d
tags:
- lego
- 3d-generation
- autoregressive
- transformer
- llama
- dinov2
- clip
- siggraph-asia-2025
---

# LegoACE: Autoregressive Construction Engine for Expressive LEGO® Assemblies

Official model weights for **LegoACE**, presented at **SIGGRAPH Asia 2025**.

LegoACE is an autoregressive transformer that generates LEGO® assemblies as
sequences of placed bricks. This repository hosts two pretrained variants:

| Subfolder | Conditioning | Encoder | Training steps |
|-----------|--------------|---------|----------------|
| `mv/` | Multi-view images (4 views) | [DINOv2-base](https://huggingface.co/facebook/dinov2-base) | 520K |
| `text/` | Text descriptions | [CLIP ViT-B/32](https://huggingface.co/openai/clip-vit-base-patch32) | 210K |

- 📄 Paper: [LegoACE @ SIGGRAPH Asia 2025](https://doi.org/10.1145/3757377.3763881)
- 💻 Code: [VAST-AI-Research/LegoACE](https://github.com/VAST-AI-Research/LegoACE)
- 📊 Architecture: 32-layer Llama-style transformer, hidden size 768, vocab ~16K

---

## Quick start

> Full inference pipeline (LDR tokenizer, multi-view rendering, LDR → GLB
> conversion) lives in the [GitHub repository](https://github.com/VAST-AI-Research/LegoACE).
> The snippets below show only how to load the weights.

```bash
git clone https://github.com/VAST-AI-Research/LegoACE.git
cd LegoACE
pip install -e .
```

### Multi-view image conditioned (recommended)

```python
from model.llama_image_condition import ImageConditionModel

model = ImageConditionModel.from_pretrained("VAST-AI/LegoACE", subfolder="mv").to("cuda")
```

End-to-end usage with the `dataset/MVNpzDataset.py` loader and Blender-based
GLB export is documented in the GitHub README:

```bash
python inference/inference_multi_view.py \
    --ckpt_dir VAST-AI/LegoACE \
    --dataset_name <your_dataset> \
    --dataset_class dataset.MVNpzDataset.MVNpzDataset \
    --save_dir ./outputs/inference \
    --save_name mv-demo \
    --infer_number 100 --batch_size 4 --repeat 4 --dataset_split val
```

### Text conditioned

```python
from model.llama_text_condition import TextConditionModel

model = TextConditionModel.from_pretrained("VAST-AI/LegoACE", subfolder="text").to("cuda")
```

```bash
python inference/inference_text_condition.py \
    --ckpt_dir VAST-AI/LegoACE \
    --dataset_name <your_dataset> \
    --save_dir ./outputs/inference --save_name text-demo \
    --prompts "A red sports car" "A modern brick bed" "A bridge over a river"
```
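
The two loading snippets in this section differ only in the model class and the weights subfolder, so they can be wrapped in a small dispatch helper. The following is a minimal sketch and is not part of the LegoACE repository: `Variant`, `VARIANTS`, and `load_variant` are hypothetical names, and the helper assumes the cloned repository is importable so that `model.llama_image_condition` and `model.llama_text_condition` resolve.

```python
import importlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    module: str      # module path inside the cloned LegoACE repository
    class_name: str  # model class exposed by that module
    subfolder: str   # subfolder of the VAST-AI/LegoACE weights repository


# Hypothetical registry assembled from the table and snippets in this model card.
VARIANTS = {
    "mv": Variant("model.llama_image_condition", "ImageConditionModel", "mv"),
    "text": Variant("model.llama_text_condition", "TextConditionModel", "text"),
}


def load_variant(kind: str, repo_id: str = "VAST-AI/LegoACE", device: str = "cuda"):
    """Load one pretrained variant; `kind` is "mv" or "text"."""
    if kind not in VARIANTS:
        raise ValueError(f"unknown variant {kind!r}; expected one of {sorted(VARIANTS)}")
    variant = VARIANTS[kind]
    # Imported lazily so the registry can be inspected without the repo installed.
    model_cls = getattr(importlib.import_module(variant.module), variant.class_name)
    return model_cls.from_pretrained(repo_id, subfolder=variant.subfolder).to(device)
```

With the repository installed, `load_variant("mv")` would be equivalent to the multi-view snippet and `load_variant("text")` to the text snippet; an unknown name fails early with a `ValueError` instead of an import error.
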

---

## Outputs

Each generation step emits a quintuple `(x, y, z, rotation_id, brick_type_id)`.
The full pipeline converts those token sequences into:

1. **LDR**: text-format LEGO instructions (LDraw)
2. **GLB**: 3D mesh via Blender + [ImportLDraw](https://github.com/TobyLobster/ImportLDraw)
3. **Normal maps**: pyrender renderings of the assembled model

LegoACE supports an LDR vocabulary covering 28 common brick types and 20
discrete rotation classes; see [`utils/brick_ids.py`](https://github.com/VAST-AI-Research/LegoACE/blob/main/utils/brick_ids.py).
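
To make the quintuple-to-LDR step concrete, here is a minimal sketch of how decoded bricks could be written as LDraw type-1 lines, which have the form `1 <colour> x y z a b c d e f g h i <file>` (position followed by a row-major 3×3 rotation matrix). This is not the repository's converter. `Brick`, `BRICK_FILES`, `ROTATIONS`, and both functions are hypothetical names; the two-entry lookup tables are placeholders for the real 28-brick and 20-rotation tables in `utils/brick_ids.py`; and the sketch assumes the coordinates are already expressed in LDraw units.

```python
from typing import Iterable, NamedTuple


class Brick(NamedTuple):
    x: int
    y: int
    z: int
    rotation_id: int
    brick_type_id: int


# Illustrative placeholders only: the real tables live in utils/brick_ids.py
# and cover 28 brick types and 20 rotation classes.
BRICK_FILES = {
    0: "3001.dat",  # LDraw part: Brick 2 x 4
    1: "3003.dat",  # LDraw part: Brick 2 x 2
}
ROTATIONS = {
    0: (1, 0, 0, 0, 1, 0, 0, 0, 1),   # identity
    1: (0, 0, 1, 0, 1, 0, -1, 0, 0),  # 90 degrees about the vertical (Y) axis
}


def brick_to_ldr_line(brick: Brick, colour: int = 4) -> str:
    """Format one brick as an LDraw type-1 (sub-file reference) line."""
    matrix = " ".join(str(v) for v in ROTATIONS[brick.rotation_id])
    part = BRICK_FILES[brick.brick_type_id]
    return f"1 {colour} {brick.x} {brick.y} {brick.z} {matrix} {part}"


def bricks_to_ldr(bricks: Iterable[Brick], colour: int = 4) -> str:
    """Join one type-1 line per brick into the body of an .ldr file."""
    return "\n".join(brick_to_ldr_line(b, colour) for b in bricks) + "\n"
```

For example, `brick_to_ldr_line(Brick(0, -24, 0, 0, 0))` returns `1 4 0 -24 0 1 0 0 0 1 0 0 0 1 3001.dat`. Colour code 4 is LDraw red; the quintuple carries no colour, so the sketch takes it as a separate argument.
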

---

## Intended uses & limitations

**Intended uses**

- Research on autoregressive 3D / LEGO® generative models.
- Generating LEGO assemblies for academic and creative exploration.

**Limitations**

- Outputs are restricted to the 28-brick vocabulary used in training.
- Quality depends on prompt phrasing (text) or image quality (multi-view).
- The model has been trained primarily on small/medium-scale assemblies and
  may produce structurally unstable or non-buildable arrangements.
- Generation requires the LDR tokenizer files (`*_dat_dict.json`,
  `*_rot_dict.json`) that ship with the dataset, not with these weights.
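
Because the tokenizer files are distributed separately from the weights, it can help to check for them before starting a long inference run. The sketch below is only an illustration: `find_tokenizer_files` is a hypothetical helper, and it assumes the `*_dat_dict.json` and `*_rot_dict.json` files sit directly inside the dataset directory, which may not match the actual dataset layout.

```python
from pathlib import Path


def find_tokenizer_files(dataset_dir):
    """Return the dat/rot dictionary files found directly under `dataset_dir`."""
    root = Path(dataset_dir)
    found = {
        "dat_dict": sorted(root.glob("*_dat_dict.json")),
        "rot_dict": sorted(root.glob("*_rot_dict.json")),
    }
    # Fail early with a clear message if either kind of file is absent.
    missing = [name for name, paths in found.items() if not paths]
    if missing:
        raise FileNotFoundError(
            f"{root} is missing tokenizer file(s): {', '.join(missing)}"
        )
    return found
```

Calling `find_tokenizer_files("<your_dataset_dir>")` returns the matching paths grouped by kind, or raises `FileNotFoundError` naming whichever kind is absent.
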

---

## Citation

```bibtex
@inproceedings{xu2025legoace,
  author    = {Hao Xu and Yuqing Zhang and Yiqian Wu and Xinyang Zheng and
               Yutao Liu and Xiangjun Tang and Yunhan Yang and Ding Liang and
               Yingtian Liu and Yuanchen Guo and Yanpei Cao and Xiaogang Jin},
  title     = {LegoACE: Autoregressive Construction Engine for Expressive LEGO{\textregistered}
               Assemblies},
  booktitle = {Proceedings of the {SIGGRAPH} Asia 2025 Conference Papers},
  publisher = {{ACM}},
  year      = {2025},
  pages     = {40:1--40:11},
  doi       = {10.1145/3757377.3763881},
  url       = {https://doi.org/10.1145/3757377.3763881}
}
```

---

## License

Released under the [MIT License](https://github.com/VAST-AI-Research/LegoACE/blob/main/LICENSE).
LEGO® is a trademark of the LEGO Group, which does not sponsor, authorize, or
endorse this project.