---
license: other
language:
- en
tags:
- 3d
- point-cloud
- multimodal
- multi-object
- pointllm
- modelnet40
pipeline_tag: text-generation
---
# Multi-3DLLM Checkpoints
This repository hosts the released BeyondSingleObject checkpoints:
- `multi-3dllm/`: MO3D, Shape Mating, and Change Captioning
- `multi-3dllm-classification/`: ModelNet40 zero-shot classification
Use the code and scripts from:
```text
https://github.com/KohsukeIde/BeyondSingleObject
```
## Download
```bash
huggingface-cli download idekoh/Multi-3DLLM \
  --local-dir checkpoints \
  --include "multi-3dllm/**" "multi-3dllm-classification/**"
```
Expected local layout:
```text
checkpoints/
├── multi-3dllm/
└── multi-3dllm-classification/
data/
```
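Before running the evaluation scripts, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the released scripts: `check_layout` is a hypothetical helper name, and it only checks that the three directories exist, not that their contents are complete.

```bash
# Hypothetical helper (not from the BeyondSingleObject repository):
# reports which of the expected directories exist under a given root.
check_layout() {
  root="${1:-.}"
  missing=0
  for d in checkpoints/multi-3dllm checkpoints/multi-3dllm-classification data; do
    if [ -d "$root/$d" ]; then
      echo "ok       $d"
    else
      echo "missing  $d"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing directories (0 means all present).
  return "$missing"
}

# Check the current directory; assumed to be the repository root.
check_layout . || echo "Some expected directories are missing."
```

A non-zero exit status from `check_layout` means at least one directory is absent, which usually points to an incomplete download or a different working directory.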
## Usage
Example inference and LLM-based evaluation:
```bash
MODEL_PATH=checkpoints/multi-3dllm \
OUTPUT_DIR=outputs/infer \
scripts/eval/infer.sh
```
ModelNet40 classification:
```bash
MODEL_PATH=checkpoints/multi-3dllm-classification \
OUTPUT_DIR=outputs/modelnet40_eval \
LIMIT=0 \
PROMPT_MODE=paper \
NUM_OBJECTS=1 \
TARGET_POSITION=1 \
scripts/eval/eval_modelnet.sh
```
To produce the full table, rerun this command once for each
`(NUM_OBJECTS, TARGET_POSITION)` pair: `(1,1)`, `(2,1)`, `(2,2)`, `(3,1)`,
`(3,2)`, and `(3,3)`.
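The six runs can be scripted as a loop. This is a sketch under assumptions rather than a released script: the per-setting `OUTPUT_DIR` naming (`n<objects>_p<position>`), the `OUTPUT_ROOT` variable, and the `DRY_RUN` switch are hypothetical conveniences, and it is not confirmed here whether `eval_modelnet.sh` would otherwise overwrite results in a shared output directory.

```bash
# Sketch: sweep the six (NUM_OBJECTS, TARGET_POSITION) settings.
# OUTPUT_ROOT, DRY_RUN, and the n*_p* directory names are assumptions.
MODEL_PATH="${MODEL_PATH:-checkpoints/multi-3dllm-classification}"
OUTPUT_ROOT="${OUTPUT_ROOT:-outputs/modelnet40_eval}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

COMPLETED=""
for pair in 1,1 2,1 2,2 3,1 3,2 3,3; do
  num_objects="${pair%,*}"       # text before the comma
  target_position="${pair#*,}"   # text after the comma
  out_dir="${OUTPUT_ROOT}/n${num_objects}_p${target_position}"

  if [ "$DRY_RUN" = "1" ]; then
    echo "NUM_OBJECTS=${num_objects} TARGET_POSITION=${target_position} OUTPUT_DIR=${out_dir} scripts/eval/eval_modelnet.sh"
  else
    # Same variables as the single-run command, one setting at a time.
    MODEL_PATH="$MODEL_PATH" OUTPUT_DIR="$out_dir" LIMIT=0 PROMPT_MODE=paper \
      NUM_OBJECTS="$num_objects" TARGET_POSITION="$target_position" \
      scripts/eval/eval_modelnet.sh || break
  fi
  COMPLETED="${COMPLETED}${COMPLETED:+ }${pair}"
done
echo "Processed settings: ${COMPLETED}"
```

With the default `DRY_RUN=1` the loop only prints the six commands, so the settings can be reviewed first; setting `DRY_RUN=0` runs them in sequence and stops at the first failure.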
## Notes
The LLM-judged metrics for reasoning and delta-caption quality depend on the
judge model and prompt configuration. Use the released evaluation scripts for
reproducible comparisons, and report the exact judge configuration together
with the checkpoint.
## License
These checkpoints are built with the BeyondSingleObject codebase and use
PointLLM-style initialization and data. They may inherit terms from upstream
model, code, and dataset components, including PointLLM, Vicuna/Llama,
Objaverse/Cap3D, ShapeTalk, Thingi10K, Neural Shape Mating, and ModelNet40.
Please check the corresponding upstream licenses before redistribution or
commercial use.
## Citation
```bibtex
@inproceedings{ide2026beyondsingleobject,
  title={BeyondSingleObject: Learning 3D Relations with Large Language Models},
  author={Ide, Kohsuke and Yamada, Ryousuke and Qiu, Yue and Ma, Xianzheng and Fukuhara, Yoshihiro and Kataoka, Hirokatsu and Satoh, Yutaka},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  year={2026}
}
```