# Img2CAD: Reverse Engineering 3D CAD Models from Images

This repository contains the model checkpoints for Img2CAD, a novel framework for reverse engineering 3D CAD models from single-view images.

## Model Overview

Img2CAD uses a two-stage approach:

1. **LlamaFT (Stage 1):** a fine-tuned Llama-3.2-11B-Vision model that predicts the discrete CAD command structure from an image.
2. **TrAssembler (Stage 2):** a transformer-based network that predicts the continuous CAD parameters using Gaussian Mixture Flow.

## Available Checkpoints

### LlamaFT Adapters (LoRA)

| Category | Path |
|---|---|
| Chair | `llamaft/chair/` |
| Table | `llamaft/table/` |
| Storage Furniture | `llamaft/storagefurniture/` |

### TrAssembler Checkpoints

| Category | Path |
|---|---|
| Chair | `trassembler/chair/` |
| Table | `trassembler/table/` |
| Storage Furniture | `trassembler/storagefurniture/` |

## Usage

### Stage 1: LlamaFT (Discrete Structure Prediction)

```python
from pathlib import Path

import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

# Download the LoRA adapter weights and its config
adapter_path = hf_hub_download(
    repo_id="qq456cvb/img2cad",
    filename="llamaft/chair/adapter_model.safetensors"
)
hf_hub_download(
    repo_id="qq456cvb/img2cad",
    filename="llamaft/chair/adapter_config.json"
)
adapter_dir = str(Path(adapter_path).parent)

# Load the base model with 4-bit quantization
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)
processor = AutoProcessor.from_pretrained(model_id)

# Attach the fine-tuned adapter
model.load_adapter(adapter_dir)
model.eval()

# Inference
image = Image.open("your_image.png").convert("RGB")
# ... see full inference code in the repository
```
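The repository contains the full inference loop; the sketch below only shows the shape of a chat-style prompt for Llama-3.2-Vision. The prompt text is an assumption for illustration, not the exact prompt used during fine-tuning.

```python
# Hypothetical user message; the actual prompt used by Img2CAD lives in the
# repository's inference code.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Predict the CAD command sequence for this object."},
        ],
    }
]

# With `model`, `processor`, and `image` loaded as above:
# prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
# inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
# output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# print(processor.decode(output[0], skip_special_tokens=True))
```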

### Stage 2: TrAssembler (Continuous Parameter Prediction)

```python
import numpy as np
import torch
from huggingface_hub import hf_hub_download
from omegaconf import OmegaConf

# Download checkpoint and config
ckpt_path = hf_hub_download(
    repo_id="qq456cvb/img2cad",
    filename="trassembler/chair/last.ckpt"
)
config_path = hf_hub_download(
    repo_id="qq456cvb/img2cad",
    filename="trassembler/chair/config.yaml"
)

# Load the model (requires the Img2CAD repository on your PYTHONPATH)
from TrAssembler.model import GMFlowModel

config = OmegaConf.load(config_path)
model = GMFlowModel.load_from_checkpoint(
    ckpt_path,
    args=config,
    embed_dim=config.network.embed_dim,
    num_heads=config.network.num_heads,
    dropout=config.network.dropout,
    bias=True,
    scaling_factor=1.0,
    args_range=np.array([-1.0, 1.0])
).cuda().eval()

# Generate CAD parameters
# pred_args = model.sample(batch)
```

## Paper

**Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization**

## Architecture

### LlamaFT

- Base model: Llama-3.2-11B-Vision-Instruct
- Fine-tuning: QLoRA (rank=16, alpha=16)
- Target modules: `q_proj`, `v_proj`
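The QLoRA setup above adds a rank-16 update to each target projection. A minimal numerical sketch of the adapter math, using a toy hidden size rather than the model's real dimensions:

```python
import torch

# LoRA: the frozen weight W gets a low-rank update scaled by alpha / rank.
d = 64                    # toy hidden size; the real projections are much larger
rank, alpha = 16, 16      # matches the fine-tuning config above
W = torch.randn(d, d)     # frozen base projection (e.g. q_proj)
A = torch.randn(rank, d) * 0.01
B = torch.zeros(d, rank)  # B is zero-initialized, so training starts from the base model

W_eff = W + (alpha / rank) * (B @ A)
print(torch.allclose(W_eff, W))  # True: an untrained adapter is a no-op
```

Only `A` and `B` are trained (here for `q_proj` and `v_proj`), which is what keeps the adapter checkpoints small compared to the 11B base model.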

### TrAssembler

- Image encoder: DINOv2-B (frozen)
- Sketch transformer: 6-layer decoder
- Part transformer: 6-layer decoder
- Output: Gaussian Mixture Flow for continuous parameters
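The stack above can be sketched with stock PyTorch modules. Dimensions, query counts, and the exact wiring below are placeholders for illustration; the real values come from the released `config.yaml` and the repository code.

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 256, 8        # placeholders, not the values in config.yaml

def decoder(num_layers: int = 6) -> nn.TransformerDecoder:
    # Each of the two transformers is a 6-layer decoder stack.
    layer = nn.TransformerDecoderLayer(embed_dim, num_heads, batch_first=True)
    return nn.TransformerDecoder(layer, num_layers)

sketch_tf = decoder()                # 6-layer sketch transformer
part_tf = decoder()                  # 6-layer part transformer

img_tokens = torch.randn(1, 257, embed_dim)  # stand-in for frozen DINOv2-B features
queries = torch.randn(1, 16, embed_dim)      # one query per parameter slot (illustrative)
out = part_tf(sketch_tf(queries, img_tokens), img_tokens)
print(out.shape)  # torch.Size([1, 16, 256])
```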

## Citation

```bibtex
@inproceedings{you2025img2cad,
  title={Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization},
  author={You, Yang and Uy, Mikaela Angelina and Han, Jiaqi and Thomas, Rahul and Zhang, Haotong and Du, Yi and Chen, Hansheng and Engelmann, Francis and You, Suya and Guibas, Leonidas},
  booktitle={Proceedings of the SIGGRAPH Asia 2025 Conference Papers},
  pages={1--12},
  year={2025}
}
```

## License

This project is released under the MIT License.
