Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization
Paper: [arXiv:2408.01437](https://arxiv.org/abs/2408.01437)
This repository contains the model checkpoints for Img2CAD, a novel framework for reverse engineering 3D CAD models from single-view images.
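Img2CAD uses a two-stage approach:

1. A vision-language model (Llama-3.2-11B-Vision-Instruct fine-tuned with a per-category adapter) predicts the discrete structure of the object, including semantic part information.
2. TrAssembler, conditioned on that discrete structure, predicts the continuous attribute values of the CAD parts.

Checkpoints are provided for three categories: chairs, tables, and storage furniture.

Stage 1 adapters (fine-tuned Llama-3.2-11B-Vision-Instruct):
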
| Category | Path |
|---|---|
| Chair | llamaft/chair/ |
| Table | llamaft/table/ |
| Storage Furniture | llamaft/storagefurniture/ |

Stage 2 TrAssembler checkpoints:

| Category | Path |
|---|---|
| Chair | trassembler/chair/ |
| Table | trassembler/table/ |
| Storage Furniture | trassembler/storagefurniture/ |
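The folder layout in both tables follows one pattern, so repository paths can be derived rather than hard-coded. A minimal sketch: the helper `checkpoint_files` and its stage labels `"vlm"` and `"trassembler"` are hypothetical names chosen for this illustration, and the file names are the ones used in the loading examples of this card.

```python
from typing import Dict

# Categories and top-level folders as listed in the checkpoint tables.
# The stage labels "vlm" and "trassembler" are hypothetical names for this sketch.
CATEGORIES = {"chair", "table", "storagefurniture"}
STAGE_DIRS = {"vlm": "llamaft", "trassembler": "trassembler"}


def checkpoint_files(stage: str, category: str) -> Dict[str, str]:
    """Return repo-relative file paths for one stage/category pair."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if stage not in STAGE_DIRS:
        raise ValueError(f"unknown stage: {stage!r}")
    folder = f"{STAGE_DIRS[stage]}/{category}"
    if stage == "vlm":
        # Stage 1: PEFT adapter weights on top of the base VLM
        return {"adapter": f"{folder}/adapter_model.safetensors"}
    # Stage 2: Lightning checkpoint plus its OmegaConf config
    return {
        "checkpoint": f"{folder}/last.ckpt",
        "config": f"{folder}/config.yaml",
    }


print(checkpoint_files("trassembler", "table"))
```

The returned paths can be passed as the `filename` argument of `hf_hub_download`.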
Example: loading the Stage 1 chair adapter on top of the 4-bit quantized base model. This requires the `peft` and `bitsandbytes` packages, and access to the gated `meta-llama/Llama-3.2-11B-Vision-Instruct` repository.

```python
import os

import torch
from huggingface_hub import snapshot_download
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

# Download the whole adapter folder (weights and adapter_config.json);
# load_adapter needs both files in the same directory
repo_dir = snapshot_download(
    repo_id="qq456cvb/img2cad",
    allow_patterns=["llamaft/chair/*"],
)
adapter_dir = os.path.join(repo_dir, "llamaft", "chair")

# Load the base model in 4-bit NF4 with bfloat16 compute
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)
processor = AutoProcessor.from_pretrained(model_id)

# Attach the Img2CAD adapter (uses the peft integration in transformers)
model.load_adapter(adapter_dir)
model.eval()

# Inference
image = Image.open("your_image.png").convert("RGB")
# ... see full inference code in the repository
```
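The 4-bit NF4 configuration substantially reduces the memory needed to load the base model. A rough, weights-only estimate, assuming a nominal 11 billion parameters all stored at the given bit width; real usage is higher because of quantization constants, layers left unquantized, activations, and the KV cache.

```python
# Rough weight-only memory estimate for a nominal 11B-parameter model.
# Assumption: every parameter is stored at the given bit width; overheads
# (quantization constants, unquantized layers, activations, KV cache) are ignored.
NUM_PARAMS = 11e9


def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Convert a parameter count and bit width into GiB of weight storage."""
    return num_params * bits_per_param / 8 / 2**30


for name, bits in [("bfloat16", 16), ("nf4", 4)]:
    print(f"{name}: {weight_gib(NUM_PARAMS, bits):.1f} GiB")
```

This gives roughly 20.5 GiB for bfloat16 weights versus about 5.1 GiB for NF4, a 4x reduction before overheads.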
Example: loading the Stage 2 TrAssembler checkpoint for chairs. This requires the Img2CAD code repository (which provides the `TrAssembler` package) and a CUDA device.

```python
import numpy as np
import torch
from huggingface_hub import hf_hub_download
from omegaconf import OmegaConf

# Provided by the Img2CAD code repository (must be importable)
from TrAssembler.model import GMFlowModel

# Download checkpoint and config
ckpt_path = hf_hub_download(
    repo_id="qq456cvb/img2cad",
    filename="trassembler/chair/last.ckpt",
)
config_path = hf_hub_download(
    repo_id="qq456cvb/img2cad",
    filename="trassembler/chair/config.yaml",
)

# Load model with the network hyperparameters stored in config.yaml
config = OmegaConf.load(config_path)
model = GMFlowModel.load_from_checkpoint(
    ckpt_path,
    args=config,
    embed_dim=config.network.embed_dim,
    num_heads=config.network.num_heads,
    dropout=config.network.dropout,
    bias=True,
    scaling_factor=1.,
    args_range=np.array([-1., 1.]),
).cuda().eval()

# Generate CAD parameters
# pred_args = model.sample(batch)
```
If you use Img2CAD, please cite:

```bibtex
@inproceedings{you2025img2cad,
  title={{Img2CAD}: Reverse Engineering {3D} {CAD} Models from Images through {VLM}-Assisted Conditional Factorization},
  author={You, Yang and Uy, Mikaela Angelina and Han, Jiaqi and Thomas, Rahul and Zhang, Haotong and Du, Yi and Chen, Hansheng and Engelmann, Francis and You, Suya and Guibas, Leonidas},
  booktitle={Proceedings of the SIGGRAPH Asia 2025 Conference Papers},
  pages={1--12},
  year={2025}
}
```
This project is released under the MIT License.
Base model: meta-llama/Llama-3.2-11B-Vision-Instruct