pookiefoof committed on
Commit c8044f5 · verified · 1 Parent(s): 9ef5f16

Add model card

Files changed (1)
  1. README.md +140 -0
README.md ADDED
@@ -0,0 +1,140 @@
---
license: mit
library_name: transformers
pipeline_tag: image-to-3d
tags:
- lego
- 3d-generation
- autoregressive
- transformer
- llama
- dinov2
- clip
- siggraph-asia-2025
---

# LegoACE: Autoregressive Construction Engine for Expressive LEGO® Assemblies

Official model weights for **LegoACE**, presented at **SIGGRAPH Asia 2025**.

LegoACE is an autoregressive transformer that generates LEGO® assemblies as
sequences of placed bricks. This repository hosts two pretrained variants:

| Subfolder | Conditioning | Encoder | Training steps |
|-----------|--------------|---------|----------------|
| `mv/` | Multi-view images (4 views) | [DINOv2-base](https://huggingface.co/facebook/dinov2-base) | 520K |
| `text/` | Text descriptions | [CLIP ViT-B/32](https://huggingface.co/openai/clip-vit-base-patch32) | 210K |

- 📄 Paper: [LegoACE @ SIGGRAPH Asia 2025](https://doi.org/10.1145/3757377.3763881)
- 💻 Code: [VAST-AI-Research/LegoACE](https://github.com/VAST-AI-Research/LegoACE)
- 📊 Architecture: 32-layer Llama-style transformer, hidden size 768, vocab ~16K

---

## Quick start

> The full inference pipeline (LDR tokenizer, multi-view rendering, LDR → GLB
> conversion) lives in the [GitHub repository](https://github.com/VAST-AI-Research/LegoACE).
> The snippets below show only how to load the weights.

```bash
git clone https://github.com/VAST-AI-Research/LegoACE.git
cd LegoACE
pip install -e .
```

### Multi-view image-conditioned (recommended)

```python
from model.llama_image_condition import ImageConditionModel

# Load the multi-view variant from the `mv/` subfolder and move it to the GPU.
model = ImageConditionModel.from_pretrained("VAST-AI/LegoACE", subfolder="mv").to("cuda")
```

End-to-end usage with the `dataset/MVNpzDataset.py` loader and Blender-based
GLB export is documented in the GitHub README:

```bash
python inference/inference_multi_view.py \
    --ckpt_dir VAST-AI/LegoACE \
    --dataset_name <your_dataset> \
    --dataset_class dataset.MVNpzDataset.MVNpzDataset \
    --save_dir ./outputs/inference \
    --save_name mv-demo \
    --infer_number 100 --batch_size 4 --repeat 4 --dataset_split val
```
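
The command above passes the Hub repository ID as `--ckpt_dir`. If a local checkpoint directory is needed instead (an assumption; the source does not say whether the script resolves Hub IDs itself), a single variant could be fetched with `huggingface_hub.snapshot_download`. The sketch below is not part of the LegoACE codebase: `variant_patterns`, `fetch_variant`, and the `./checkpoints/LegoACE` path are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Subfolder names taken from the variants table in this model card.
VARIANTS = ("mv", "text")


def variant_patterns(variant: str) -> list[str]:
    """Return the glob patterns that select one variant's files in the repo."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return [f"{variant}/*"]


def fetch_variant(variant: str, local_dir: str = "./checkpoints/LegoACE") -> Path:
    """Download only one variant's subfolder and return its local path.

    The target directory is a hypothetical choice, not a path the project requires.
    """
    # Imported lazily so the helper above can be used without the package installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id="VAST-AI/LegoACE",
        allow_patterns=variant_patterns(variant),
        local_dir=local_dir,
    )
    return Path(root) / variant
```

Calling `fetch_variant("mv")` would download only the files under `mv/`, which avoids pulling the text-conditioned weights when they are not needed.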

### Text-conditioned

```python
from model.llama_text_condition import TextConditionModel

# Load the text-conditioned variant from the `text/` subfolder and move it to the GPU.
model = TextConditionModel.from_pretrained("VAST-AI/LegoACE", subfolder="text").to("cuda")
```

To generate assemblies from text prompts:

```bash
python inference/inference_text_condition.py \
    --ckpt_dir VAST-AI/LegoACE \
    --dataset_name <your_dataset> \
    --save_dir ./outputs/inference --save_name text-demo \
    --prompts "A red sports car" "A modern brick bed" "A bridge over a river"
```

---

## Outputs

Each generation step emits a quintuple `(x, y, z, rotation_id, brick_type_id)`.
The full pipeline converts those token sequences into:

1. **LDR**: text-format LEGO instructions (LDraw)
2. **GLB**: 3D mesh via Blender + [ImportLDraw](https://github.com/TobyLobster/ImportLDraw)
3. **Normal maps**: pyrender renderings of the assembled model

LegoACE supports an LDR vocabulary covering 28 common brick types and 20
discrete rotation classes; see [`utils/brick_ids.py`](https://github.com/VAST-AI-Research/LegoACE/blob/main/utils/brick_ids.py).
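
To make the output format concrete, the sketch below turns one decoded quintuple into an LDraw type-1 line, whose fields are `1 <colour> x y z a b c d e f g h i <file>`. It is an illustration, not the repository's converter: the `Placement` class, the two-entry `ROTATIONS` and `BRICK_FILES` tables, and the assumption that coordinates are already expressed in LDraw units are all hypothetical. The project's actual ID tables are presumably the ones in `utils/brick_ids.py`.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """One generation step: position plus rotation and brick-type IDs."""
    x: int
    y: int
    z: int
    rotation_id: int
    brick_type_id: int


# Hypothetical lookup tables with two entries each, for illustration only.
# Each rotation is a row-major 3x3 matrix (a b c / d e f / g h i).
ROTATIONS = {
    0: (1, 0, 0, 0, 1, 0, 0, 0, 1),   # identity
    1: (0, 0, 1, 0, 1, 0, -1, 0, 0),  # 90 degrees about the Y axis
}
BRICK_FILES = {
    0: "3001.dat",  # Brick 2 x 4
    1: "3003.dat",  # Brick 2 x 2
}


def to_ldraw_line(p: Placement, colour: int = 4) -> str:
    """Format a placement as an LDraw type-1 (sub-file reference) line.

    Assumes p.x, p.y, p.z are already in LDraw units; colour 4 is LDraw red.
    """
    matrix = ROTATIONS[p.rotation_id]
    part = BRICK_FILES[p.brick_type_id]
    fields = [1, colour, p.x, p.y, p.z, *matrix, part]
    return " ".join(str(f) for f in fields)
```

Under these placeholder tables, `to_ldraw_line(Placement(0, -24, 40, 1, 0))` returns `1 4 0 -24 40 0 0 1 0 1 0 -1 0 0 3001.dat`.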

---

## Intended uses & limitations

**Intended uses**
- Research on autoregressive 3D / LEGO® generative models.
- Generating LEGO assemblies for academic and creative exploration.

**Limitations**
- Outputs are restricted to the 28-brick vocabulary used in training.
- Output quality depends on prompt phrasing (text variant) or input image
  quality (multi-view variant).
- The model was trained primarily on small- and medium-scale assemblies and
  may produce structurally unstable or non-buildable arrangements.
- Generation requires the LDR tokenizer files (`*_dat_dict.json`,
  `*_rot_dict.json`), which ship with the dataset, not with these weights.
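
Because the tokenizer dictionaries are distributed separately from the weights, it can help to confirm they are present before running inference. A minimal sketch, assuming both files sit somewhere under a single dataset directory (the directory layout and the `find_tokenizer_files` helper are hypothetical, not part of the LegoACE codebase):

```python
from pathlib import Path

# File-name patterns quoted in the limitations list above.
TOKENIZER_PATTERNS = {
    "dat_dict": "*_dat_dict.json",
    "rot_dict": "*_rot_dict.json",
}


def find_tokenizer_files(dataset_dir) -> dict:
    """Recursively locate the LDR tokenizer dictionaries under dataset_dir.

    Returns a mapping from dictionary kind to a sorted list of matching paths,
    and raises FileNotFoundError if either kind is missing.
    """
    root = Path(dataset_dir)
    found = {
        kind: sorted(root.rglob(pattern))
        for kind, pattern in TOKENIZER_PATTERNS.items()
    }
    missing = [TOKENIZER_PATTERNS[kind] for kind, paths in found.items() if not paths]
    if missing:
        raise FileNotFoundError(
            f"no files matching {missing} under {root}; "
            "the tokenizer dictionaries ship with the dataset, not the weights"
        )
    return found
```

Running this check before launching an inference script turns a missing-dictionary problem into an immediate, descriptive error rather than a failure partway through generation.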

---

## Citation

```bibtex
@inproceedings{xu2025legoace,
  author    = {Hao Xu and Yuqing Zhang and Yiqian Wu and Xinyang Zheng and
               Yutao Liu and Xiangjun Tang and Yunhan Yang and Ding Liang and
               Yingtian Liu and Yuanchen Guo and Yanpei Cao and Xiaogang Jin},
  title     = {LegoACE: Autoregressive Construction Engine for Expressive LEGO{\textregistered}
               Assemblies},
  booktitle = {Proceedings of the {SIGGRAPH} Asia 2025 Conference Papers},
  publisher = {{ACM}},
  year      = {2025},
  pages     = {40:1--40:11},
  doi       = {10.1145/3757377.3763881},
  url       = {https://doi.org/10.1145/3757377.3763881}
}
```

---

## License

Released under the [MIT License](https://github.com/VAST-AI-Research/LegoACE/blob/main/LICENSE).

LEGO® is a trademark of the LEGO Group, which does not sponsor, authorize, or
endorse this project.