BIM-CLIP Model Weights

Figure 1: Overview of the BIM-CLIP framework.

Figure 2: BIM-CLIP workflow and downstream applications.

Given heterogeneous inputs, the framework encodes each modality through dedicated encoders and aligns them within a shared semantic embedding space via contrastive learning. After alignment-preserving fine-tuning, the learned representations support two practical BIM downstream tasks.

Model weights for BIM-CLIP: Language-Guided Multimodal Representation Learning for BIM Component Recognition.

Haining Meng, Haoyang Dong, Mingsong Yang, Xing Fan, Xinhong Hei
Xi'an University of Technology

πŸ“„ Paper (Preprint) | πŸ’» GitHub | πŸ—‚οΈ BIMCompNet Dataset


Repository Structure

bim-clip-weights/
β”‚
β”œβ”€β”€ README.md
β”‚
β”œβ”€β”€ BIMCompNet/                         # Models trained on BIMCompNet
β”‚   β”œβ”€β”€ multimodal/
β”‚   β”‚   β”œβ”€β”€ best_100.mdl                # BIMCompNet-100 (42 classes), multimodal
β”‚   β”‚   β”œβ”€β”€ best_500.mdl                # BIMCompNet-500 (31 classes), multimodal
β”‚   β”‚   └── best_1000.mdl              # BIMCompNet-1000 (24 classes), multimodal
β”‚   └── single_modal/
β”‚       β”œβ”€β”€ best_100.mdl                # BIMCompNet-100 (42 classes), single modality
β”‚       β”œβ”€β”€ best_500.mdl                # BIMCompNet-500 (31 classes), single modality
β”‚       └── best_1000.mdl              # BIMCompNet-1000 (24 classes), single modality
β”‚
β”œβ”€β”€ IFCNet/
β”‚   └── best_ifcnet.mdl                # Multimodal, trained on IFCNet (20 classes)
β”‚
β”œβ”€β”€ ModelNet/
β”‚   β”œβ”€β”€ best_10.mdl                    # Multimodal, ModelNet-10
β”‚   β”œβ”€β”€ best_40.mdl                    # Multimodal, ModelNet-40
β”‚   β”œβ”€β”€ ModelNet10.zip                 # ModelNet-10 extended with PC + multi-view modalities
β”‚   └── ModelNet40.zip                 # ModelNet-40 extended with PC + multi-view modalities
β”‚
└── ULIP2/
    β”œβ”€β”€ best_ulip2_1000.mdl            # ULIP-2 fine-tuned on BIMCompNet-1000
    └── ULIP-2-PointBERT-8k-xyz-pc-slip_vit_b-objaverse-pretrained.pt  # Official pretrained weights (866 MB)

Model Summary

| File | Architecture | Training Set | Classes | Acc (%) | F1 (%) |
|---|---|---|---|---|---|
| BIMCompNet/multimodal/best_100.mdl | BIM-CLIP (CMA) | BIMCompNet-100 | 42 | 87.38 | 87.44 |
| BIMCompNet/multimodal/best_500.mdl | BIM-CLIP (CMA) | BIMCompNet-500 | 31 | 91.35 | 91.28 |
| BIMCompNet/multimodal/best_1000.mdl | BIM-CLIP (CMA) | BIMCompNet-1000 | 24 | 91.79 | 91.83 |
| BIMCompNet/single_modal/best_100.mdl | BIM-CLIP (single modality) | BIMCompNet-100 | 42 | β€” | β€” |
| BIMCompNet/single_modal/best_500.mdl | BIM-CLIP (single modality) | BIMCompNet-500 | 31 | β€” | β€” |
| BIMCompNet/single_modal/best_1000.mdl | BIM-CLIP (single modality) | BIMCompNet-1000 | 24 | 88.69 | 87.90 |
| IFCNet/best_ifcnet.mdl | BIM-CLIP (CMA) | IFCNet | 20 | 91.00 | 90.39 |
| ModelNet/best_10.mdl | BIM-CLIP (CMA) | ModelNet-10 | 10 | 95.36 | 95.25* |
| ModelNet/best_40.mdl | BIM-CLIP (CMA) | ModelNet-40 | 40 | 92.22 | 90.34* |
| ULIP2/best_ulip2_1000.mdl | ULIP-2 | BIMCompNet-1000 | 24 | 90.98 | 91.02 |
| ULIP2/ULIP-2-PointBERT-…-pretrained.pt | ULIP-2 (official) | Objaverse | β€” | β€” | β€” |

* mAP reported for ModelNet. "β€”" indicates metrics not separately reported in the paper.


Usage

Load and Run Evaluation

Clone the GitHub repo, then:

# Evaluate BIMCompNet-1000 multimodal
python bimclip.py --mode eval --data_type MULTI_MODAL \
  --data_root /path/to/BIMCompNet --index_root /path/to/index \
  --set_size 1000 \
  --model_path /path/to/BIMCompNet/multimodal/best_1000.mdl \
  --embeddings_path embeddings.pt \
  --yaml_path ./描述俑息.yaml \
  --output_dir ./results

# Evaluate IFCNet multimodal
python bimclip.py --mode eval --data_type MULTI_MODAL \
  --ifcnet_root /path/to/IFCNetCorePly/IFCNetCore \
  --model_path /path/to/IFCNet/best_ifcnet.mdl \
  --embeddings_path ifcnet_embeddings.pt \
  --yaml_path ./描述俑息.yaml \
  --output_dir ./results
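
If you want to inspect a checkpoint outside of bimclip.py, the sketch below assumes the .mdl files are ordinary PyTorch checkpoints (a state dict, or a dict containing one); the authoritative loading logic lives in the GitHub repository.

```python
# Minimal sketch: peek inside a downloaded checkpoint.
# Assumption: best_*.mdl is a torch-serialized object (state dict or wrapper dict).
import torch

ckpt = torch.load("BIMCompNet/multimodal/best_1000.mdl", map_location="cpu")
state = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
if isinstance(state, dict):
    for name, tensor in list(state.items())[:5]:
        print(name, tuple(tensor.shape))
```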

Embeddings

Text embedding files are included in the GitHub repository:

| File | Classes | Dataset |
|---|---|---|
| embeddings.pt | 57 | BIMCompNet (all categories) |
| ifcnet_embeddings.pt | 20 | IFCNet |
| model_net_10_embeddings.pt | 10 | ModelNet-10 |
| model_net_40_embeddings.pt | 40 | ModelNet-40 |

Use the matching embeddings file for each dataset. Do not mix across datasets.
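
As a quick sanity check that you picked the right file, the sketch below loads an embeddings file with PyTorch; the exact layout (a tensor vs. a dict of per-class vectors) is defined by the GitHub repository, so treat that as an assumption.

```python
# Sketch: verify the number of classes in a text-embedding file before evaluation.
import torch

emb = torch.load("embeddings.pt", map_location="cpu")  # e.g. the BIMCompNet file
if isinstance(emb, dict):
    print(f"{len(emb)} class entries, e.g. {list(emb)[:3]}")
else:
    print("tensor shape:", tuple(emb.shape))  # expect (num_classes, embedding_dim)
```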

Dataset Access

BIMCompNet is hosted by the 606 Lab at Xi'an University of Technology. Visit the link below to apply for access or download:

πŸ‘‰ https://bimcompnet-606lab.xaut.edu.cn/

ModelNet Multimodal Extension

ModelNet10.zip and ModelNet40.zip contain the original ModelNet meshes extended with point clouds and multi-view images, constructed using our multimodal data pipeline. The directory layout inside each zip is:

ModelNet{10|40}/
└── {class}/
    β”œβ”€β”€ train/
    β”‚   β”œβ”€β”€ obj/          # Original mesh (.obj)
    β”‚   β”œβ”€β”€ ply/          # Point cloud sampled from mesh (1024 pts, .ply)
    β”‚   └── png/
    β”‚       └── {sample}/
    β”‚           └── Edges/   # 12 edge-rendered views (0.png – 11.png)
    └── test/
        └── (same structure)

Point clouds are uniformly sampled from the mesh surface (1024 points per object). Multi-view images are edge-rendered from 12 fixed viewpoints following the camera placement strategy used in BIMCompNet.
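
For reference, the snippet below shows one way to reproduce the point-cloud step with trimesh; it is an illustrative sketch under those assumptions, not the exact pipeline used to build the zips, and the file names are hypothetical. The 12-view edge rendering is not covered here.

```python
# Sketch: uniformly sample 1024 points from a mesh surface and save them as .ply.
import trimesh

mesh = trimesh.load("chair_0001.obj", force="mesh")      # hypothetical input mesh
points, _ = trimesh.sample.sample_surface(mesh, 1024)    # uniform w.r.t. surface area
trimesh.PointCloud(points).export("chair_0001.ply")      # write the sampled point cloud
```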

Third-Party Weights (ULIP-2 baseline)

The official ULIP-2 pretrained PointBERT weights (866 MB) are included in this repository at:

ULIP2/ULIP-2-PointBERT-8k-xyz-pc-slip_vit_b-objaverse-pretrained.pt

Alternatively, download directly from the original source:

https://huggingface.co/datasets/SFXX/ulip/resolve/main/ULIP-2/pretrained_models/ULIP-2-PointBERT-8k-xyz-pc-slip_vit_b-objaverse-pretrained.pt
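
If you prefer a scripted download, the same file can be fetched with huggingface_hub; this is a sketch, with repo_id and filename read off the URL above:

```python
# Sketch: download the official ULIP-2 PointBERT checkpoint from the Hugging Face Hub.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="SFXX/ulip",
    repo_type="dataset",  # the weights are hosted inside a dataset repo
    filename="ULIP-2/pretrained_models/ULIP-2-PointBERT-8k-xyz-pc-slip_vit_b-objaverse-pretrained.pt",
)
print(ckpt_path)  # local cache path of the ~866 MB file
```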

Architecture Overview

BIM-CLIP uses three modality encoders: ViT for images, PointNet for point clouds, and MeshNet for meshes. Each encoder output is projected into a shared 1536-dimensional language embedding space and contrastively aligned against text-embedding-ada-002 anchors, and the aligned features are then fused by a Language-Guided Cross-Modal Attention (CMA) module. During fine-tuning, only the CMA module (10.62 M parameters) is updated.
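
The paper and repository define the exact CMA design; as a rough intuition only, the sketch below shows one generic way a text anchor can attend over modality features that already live in the shared 1536-dim space. It is not the released module.

```python
# Illustrative sketch (not the official CMA implementation): a text-query
# cross-attention block fusing image / point-cloud / mesh features.
import torch
import torch.nn as nn

class LanguageGuidedFusion(nn.Module):
    def __init__(self, dim: int = 1536, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_anchor, modal_feats):
        # text_anchor: (B, 1, dim) class-description embedding used as the query
        # modal_feats: (B, 3, dim) projected image / point-cloud / mesh features
        fused, _ = self.attn(text_anchor, modal_feats, modal_feats)
        return self.norm(text_anchor + fused).squeeze(1)

# Toy forward pass with random features
fusion = LanguageGuidedFusion()
out = fusion(torch.randn(2, 1, 1536), torch.randn(2, 3, 1536))
print(out.shape)  # torch.Size([2, 1536])
```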


Citation

@article{meng2026bimclip,
  title={BIM-CLIP: Language-Guided Multimodal Representation Learning for BIM Component Recognition},
  author={Meng, Haining and Dong, Haoyang and Yang, Mingsong and Fan, Xing and Hei, Xinhong},
  journal={[to-be-updated upon acceptance]},
  year={2026}
}