---
license: apache-2.0
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6631c87e08a7c3e0633a4eb8/MeEPQeZMmJ1-Jf1tTYCnz.png)

# GeoCAD-LLM: CAD Sequence Generation via Multimodal LLMs with Equivariant Geometric Features🛠️

- [GitHub](https://github.com/kshsh0405/GeoCADLLM_Inference)
- Paper: coming soon

## GeoCAD-LLM_4B🛠️

- Base model: [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- Max sequence length: 8,192
- Epochs: 2
- Learning rate: 1e-4
- Batch size: 128
- This model is specialized for text-to-CAD, but it also supports multimodal input.

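## GeoCAD-LLM Contributions🔥

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6631c87e08a7c3e0633a4eb8/CxBgLL0xiRAaLEaiK7aaX.png)
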
- **State-of-the-art Performance🏆** on the [Text2CAD](https://huggingface.co/datasets/SadilKhan/Text2CAD) dataset (see the tables below).
- **Multimodal CAD Generation🌐**: Supports both text-to-CAD and pc-text-to-CAD.
- GeoCAD-LLM directly generates CAD vector sequences as **natural language**.
- **Novel Two-Stage Training Pipeline🧭**: Stage 1 trains semantic geometry alignment, and stage 2 trains fine-grained geometry. In particular, we **directly leverage E(3)-equivariant features** for geometry-consistent supervision, which inherently keeps geometric features consistent regardless of input orientation.
- **Point Cloud Dropout (PCD)🧶**: PCD mitigates over-reliance on geometric inputs and improves multimodal generalization, making it a critical training technique for multimodal CAD generation.
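
To make the orientation-consistency claim in the two-stage training bullet concrete, here is a toy check of what equivariance and invariance mean for point-cloud features. It is not GeoCAD-LLM's feature extractor: the helper names (`apply_rigid`, `centered`, `pairwise_distances`) are hypothetical, and for brevity the sketch only tests rotations and translations, leaving out the reflections that E(3) also contains.

```python
import math
import random


def apply_rigid(points, yaw, pitch, shift):
    """Rotate about z by yaw, then about x by pitch, then translate by shift."""
    out = []
    for x, y, z in points:
        x, y = (x * math.cos(yaw) - y * math.sin(yaw),
                x * math.sin(yaw) + y * math.cos(yaw))
        y, z = (y * math.cos(pitch) - z * math.sin(pitch),
                y * math.sin(pitch) + z * math.cos(pitch))
        out.append((x + shift[0], y + shift[1], z + shift[2]))
    return out


def centered(points):
    """Toy equivariant feature: coordinates relative to the centroid."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]


def pairwise_distances(points):
    """Toy invariant feature: matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def close(a, b, tol=1e-9):
    """Element-wise comparison of two nested sequences of floats."""
    return all(math.isclose(u, v, abs_tol=tol)
               for p, q in zip(a, b) for u, v in zip(p, q))


rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(32)]
moved = apply_rigid(points, 0.7, -1.2, (3.0, -2.0, 0.5))

# Equivariant: the feature of the moved cloud equals the rotated feature.
equivariant_ok = close(centered(moved),
                       apply_rigid(centered(points), 0.7, -1.2, (0.0, 0.0, 0.0)))
# Invariant: the feature does not change at all.
invariant_ok = close(pairwise_distances(moved), pairwise_distances(points))
# Raw coordinates, by contrast, do change with the pose.
raw_changed = not close(moved, points)
```

Features with either property give a supervision signal that does not depend on how the input shape happens to be posed.
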
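
The model card does not spell out how Point Cloud Dropout is implemented. The sketch below is one plausible reading, under the assumption that PCD removes the whole point-cloud input from a training sample with some probability, so the model also has to learn from text alone. The field names (`text`, `point_cloud`), the `drop_prob` value, and the function name are all hypothetical.

```python
import random


def point_cloud_dropout(sample, drop_prob, rng):
    """Return a copy of `sample` whose point cloud is dropped with probability drop_prob.

    Assumed behavior, not the authors' confirmed implementation.
    """
    out = dict(sample)  # shallow copy so the original sample stays intact
    if rng.random() < drop_prob:
        out["point_cloud"] = None  # text-only sample for this training step
    return out


rng = random.Random(0)
sample = {
    "text": "A rectangular plate with a centered hole.",
    "point_cloud": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
}

# With drop_prob=0.5, roughly half of the samples lose their point cloud.
batch = [point_cloud_dropout(sample, drop_prob=0.5, rng=rng) for _ in range(1000)]
dropped = sum(s["point_cloud"] is None for s in batch)
```

Because the text prompt is never dropped in this sketch, every sample stays a valid text-to-CAD example, while only some remain pc-text-to-CAD examples.
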
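## Performance (text-to-CAD & pc-text-to-CAD)🔥

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6631c87e08a7c3e0633a4eb8/c3WO8Ge72kVRoVRFUR81o.png)
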
## Qualitative Results

Please check our paper and supplementary materials.🤗

## BibTeX🤗

```
(TODO)
```