BIM-CLIP Model Weights
Figure 1: Overview of the BIM-CLIP framework.
Figure 2: BIM-CLIP workflow and downstream applications.
Given heterogeneous inputs, the framework encodes each modality through dedicated encoders and aligns them within a shared semantic embedding space via contrastive learning. After alignment-preserving fine-tuning, the learned representations support two practical BIM downstream tasks.
Model weights for BIM-CLIP: Language-Guided Multimodal Representation Learning for BIM Component Recognition.
Haining Meng, Haoyang Dong, Mingsong Yang, Xing Fan, Xinhong Hei
Xi'an University of Technology
Paper (Preprint) | GitHub | BIMCompNet Dataset
Repository Structure
bim-clip-weights/
│
├── README.md
│
├── BIMCompNet/                  # Models trained on BIMCompNet
│   ├── multimodal/
│   │   ├── best_100.mdl         # BIMCompNet-100 (42 classes), multimodal
│   │   ├── best_500.mdl         # BIMCompNet-500 (31 classes), multimodal
│   │   └── best_1000.mdl        # BIMCompNet-1000 (24 classes), multimodal
│   └── single_modal/
│       ├── best_100.mdl         # BIMCompNet-100 (42 classes), single modality
│       ├── best_500.mdl         # BIMCompNet-500 (31 classes), single modality
│       └── best_1000.mdl        # BIMCompNet-1000 (24 classes), single modality
│
├── IFCNet/
│   └── best_ifcnet.mdl          # Multimodal, trained on IFCNet (20 classes)
│
├── ModelNet/
│   ├── best_10.mdl              # Multimodal, ModelNet-10
│   ├── best_40.mdl              # Multimodal, ModelNet-40
│   ├── ModelNet10.zip           # ModelNet-10 extended with PC + multi-view modalities
│   └── ModelNet40.zip           # ModelNet-40 extended with PC + multi-view modalities
│
└── ULIP2/
    ├── best_ulip2_1000.mdl      # ULIP-2 fine-tuned on BIMCompNet-1000
    └── ULIP-2-PointBERT-8k-xyz-pc-slip_vit_b-objaverse-pretrained.pt   # Official pretrained weights (866 MB)
Model Summary
| File | Architecture | Training Set | Classes | Acc (%) | F1 (%) |
|---|---|---|---|---|---|
| `BIMCompNet/multimodal/best_100.mdl` | BIM-CLIP (CMA) | BIMCompNet-100 | 42 | 87.38 | 87.44 |
| `BIMCompNet/multimodal/best_500.mdl` | BIM-CLIP (CMA) | BIMCompNet-500 | 31 | 91.35 | 91.28 |
| `BIMCompNet/multimodal/best_1000.mdl` | BIM-CLIP (CMA) | BIMCompNet-1000 | 24 | 91.79 | 91.83 |
| `BIMCompNet/single_modal/best_100.mdl` | BIM-CLIP (single modality) | BIMCompNet-100 | 42 | – | – |
| `BIMCompNet/single_modal/best_500.mdl` | BIM-CLIP (single modality) | BIMCompNet-500 | 31 | – | – |
| `BIMCompNet/single_modal/best_1000.mdl` | BIM-CLIP (single modality) | BIMCompNet-1000 | 24 | 88.69 | 87.90 |
| `IFCNet/best_ifcnet.mdl` | BIM-CLIP (CMA) | IFCNet | 20 | 91.00 | 90.39 |
| `ModelNet/best_10.mdl` | BIM-CLIP (CMA) | ModelNet-10 | 10 | 95.36 | 95.25* |
| `ModelNet/best_40.mdl` | BIM-CLIP (CMA) | ModelNet-40 | 40 | 92.22 | 90.34* |
| `ULIP2/best_ulip2_1000.mdl` | ULIP-2 | BIMCompNet-1000 | 24 | 90.98 | 91.02 |
| `ULIP2/ULIP-2-PointBERT-…-pretrained.pt` | ULIP-2 (official) | Objaverse | – | – | – |
\* mAP reported for ModelNet in place of F1. – indicates metrics not separately reported in the paper.
Usage
Load and Run Evaluation
Clone the GitHub repo, then:
# Evaluate BIMCompNet-1000 multimodal
python bimclip.py --mode eval --data_type MULTI_MODAL \
--data_root /path/to/BIMCompNet --index_root /path/to/index \
--set_size 1000 \
--model_path /path/to/BIMCompNet/multimodal/best_1000.mdl \
--embeddings_path embeddings.pt \
--yaml_path ./描述信息.yaml \
--output_dir ./results
# Evaluate IFCNet multimodal
python bimclip.py --mode eval --data_type MULTI_MODAL \
--ifcnet_root /path/to/IFCNetCorePly/IFCNetCore \
--model_path /path/to/IFCNet/best_ifcnet.mdl \
--embeddings_path ifcnet_embeddings.pt \
--yaml_path ./描述信息.yaml \
--output_dir ./results
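Before launching an evaluation run, it can help to confirm that a downloaded checkpoint loads at all. The sketch below assumes the `.mdl` files are ordinary PyTorch checkpoints (a `state_dict`, or a dict wrapping one); the exact keys that `bimclip.py` expects are defined in the GitHub repository.

```python
# Minimal sanity check on a downloaded checkpoint (assumes a PyTorch-serialized file).
import torch

ckpt_path = "BIMCompNet/multimodal/best_1000.mdl"  # hypothetical local path
ckpt = torch.load(ckpt_path, map_location="cpu")

# Inspect the top-level structure to confirm the file loaded cleanly.
if isinstance(ckpt, dict):
    print("Top-level keys:", list(ckpt.keys())[:10])
else:
    print("Loaded object of type:", type(ckpt))
```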
Embeddings
Text embedding files are included in the GitHub repository:
| File | Classes | Dataset |
|---|---|---|
| `embeddings.pt` | 57 | BIMCompNet (all categories) |
| `ifcnet_embeddings.pt` | 20 | IFCNet |
| `model_net_10_embeddings.pt` | 10 | ModelNet-10 |
| `model_net_40_embeddings.pt` | 40 | ModelNet-40 |
Use the matching embeddings file for each dataset. Do not mix across datasets.
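For orientation, the snippet below shows one way such an embeddings file could be used for nearest-anchor classification. It assumes the `.pt` files store a mapping from class names to 1536-dimensional text embeddings; check the GitHub repository for the actual serialization format.

```python
# Illustrative nearest-anchor lookup against a text-embedding file.
# Assumes embeddings.pt is a dict {class_name: 1536-d tensor}; verify before use.
import torch
import torch.nn.functional as F

anchors = torch.load("embeddings.pt", map_location="cpu")
names = list(anchors.keys())
text_matrix = torch.stack([anchors[n].float() for n in names])  # (C, 1536)

component_feature = torch.randn(1536)  # placeholder for an encoder output

sims = F.cosine_similarity(component_feature.unsqueeze(0), text_matrix)
print("Closest class:", names[int(sims.argmax())])
```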
Dataset access
BIMCompNet is hosted by the 606 Lab at Xi'an University of Technology. Visit the link below to apply for access or download:
https://bimcompnet-606lab.xaut.edu.cn/
ModelNet multimodal extension
ModelNet10.zip and ModelNet40.zip contain the original ModelNet meshes extended with point clouds and multi-view images, constructed using our multimodal data pipeline. The directory layout inside each zip is:
ModelNet{10|40}/
└── {class}/
    ├── train/
    │   ├── obj/                 # Original mesh (.obj)
    │   ├── ply/                 # Point cloud sampled from mesh (1024 pts, .ply)
    │   └── png/
    │       └── {sample}/
    │           └── Edges/       # 12 edge-rendered views (0.png – 11.png)
    └── test/
        └── (same structure)
Point clouds are uniformly sampled from the mesh surface (1024 points per object). Multi-view images are edge-rendered from 12 fixed viewpoints following the camera placement strategy used in BIMCompNet.
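Should you need to regenerate a point cloud from one of the original meshes, uniform surface sampling with a library such as trimesh produces a comparable result. This is only a rough sketch of the idea, not the exact pipeline used to build the archives, and the sample path below is hypothetical.

```python
# Approximate reconstruction of a 1024-point cloud from a ModelNet mesh using trimesh.
import trimesh

mesh = trimesh.load("ModelNet10/chair/train/obj/chair_0001.obj", force="mesh")  # hypothetical path
points, _ = trimesh.sample.sample_surface(mesh, count=1024)  # uniform sampling over the surface

trimesh.PointCloud(points).export("chair_0001.ply")
```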
Third-Party Weights (ULIP-2 baseline)
The official ULIP-2 pretrained PointBERT weights (866 MB) are included in this repository at:
ULIP2/ULIP-2-PointBERT-8k-xyz-pc-slip_vit_b-objaverse-pretrained.pt
Alternatively, download directly from the original source:
https://huggingface.co/datasets/SFXX/ulip/resolve/main/ULIP-2/pretrained_models/ULIP-2-PointBERT-8k-xyz-pc-slip_vit_b-objaverse-pretrained.pt
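If you prefer to fetch the file programmatically, something along these lines should work with the huggingface_hub client (the repo id and file path are read off the URL above):

```python
# Download the official ULIP-2 PointBERT weights from the Hugging Face dataset repo.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="SFXX/ulip",
    repo_type="dataset",
    filename="ULIP-2/pretrained_models/ULIP-2-PointBERT-8k-xyz-pc-slip_vit_b-objaverse-pretrained.pt",
)
print("Saved to:", local_path)
```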
Architecture Overview
BIM-CLIP encodes each modality with a dedicated encoder (ViT for images, PointNet for point clouds, MeshNet for meshes), projects the features into a shared 1536-dimensional language embedding space via contrastive alignment against text-embedding-ada-002 anchors, and fuses them through a Language-Guided Cross-Modal Attention (CMA) module. During fine-tuning, only the CMA module (10.62M parameters) is updated.
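For orientation only, the following sketch shows the general shape of such a language-guided fusion block in PyTorch: the text anchor acts as the query attending over the stacked modality features. Dimensions follow the description above, but the layer count, parameterization, and exact fusion logic of the released CMA module are those in the GitHub repository, not this sketch.

```python
# Minimal sketch of language-guided cross-modal attention (CMA) fusion.
# Assumes 1536-d projected features per modality and a 1536-d text anchor.
import torch
import torch.nn as nn

class LanguageGuidedCMA(nn.Module):
    def __init__(self, dim: int = 1536, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_anchor: torch.Tensor, modality_tokens: torch.Tensor) -> torch.Tensor:
        # text_anchor: (B, 1, dim) language query
        # modality_tokens: (B, M, dim) stacked image / point-cloud / mesh features
        fused, _ = self.attn(text_anchor, modality_tokens, modality_tokens)
        return self.norm(fused + text_anchor).squeeze(1)  # (B, dim) fused representation

cma = LanguageGuidedCMA()
out = cma(torch.randn(2, 1, 1536), torch.randn(2, 3, 1536))
print(out.shape)  # torch.Size([2, 1536])
```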
Citation
@article{meng2026bimclip,
title={BIM-CLIP: Language-Guided Multimodal Representation Learning for BIM Component Recognition},
author={Meng, Haining and Dong, Haoyang and Yang, Mingsong and Fan, Xing and Hei, Xinhong},
journal={[to-be-updated upon acceptance]},
year={2026}
}

