---
license: other
language:
- en
tags:
- 3d
- point-cloud
- multimodal
- multi-object
- pointllm
- modelnet40
pipeline_tag: text-generation
---

# Multi-3DLLM Checkpoints

This repository hosts the released BeyondSingleObject checkpoints:

- `multi-3dllm/`: checkpoint for MO3D, Shape Mating, and Change Captioning
- `multi-3dllm-classification/`: ModelNet40 zero-shot classification

Use the code and scripts from:

```text
https://github.com/KohsukeIde/BeyondSingleObject
```

## Download

```bash
huggingface-cli download idekoh/Multi-3DLLM \
  --local-dir checkpoints \
  --include "multi-3dllm/**" "multi-3dllm-classification/**"
```

Expected local layout, relative to the root of the code checkout (`data/` is not included in this download):

```text
checkpoints/
├── multi-3dllm/
└── multi-3dllm-classification/
data/
```
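A quick sanity check before evaluation can catch an incomplete download or a missing data directory. The sketch below assumes it is run from the directory that contains `checkpoints/` and `data/`; `check_layout` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch: confirm the expected layout before running any evaluation.
# Assumption: run from the directory that holds checkpoints/ and data/.
# check_layout is a hypothetical helper, not a script from the repository.
check_layout() {
  local dir missing=0
  for dir in checkpoints/multi-3dllm checkpoints/multi-3dllm-classification data; do
    if [[ ! -d "$dir" ]]; then
      echo "missing: $dir" >&2   # report every missing directory, not just the first
      missing=1
    fi
  done
  return "$missing"              # 0 when the layout is complete, 1 otherwise
}
```

Call `check_layout` after downloading; a non-zero exit status means at least one directory is missing.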

## Usage

Example inference and LLM-based evaluation:

```bash
MODEL_PATH=checkpoints/multi-3dllm \
OUTPUT_DIR=outputs/infer \
scripts/eval/infer.sh
```

ModelNet40 classification:

```bash
MODEL_PATH=checkpoints/multi-3dllm-classification \
OUTPUT_DIR=outputs/modelnet40_eval \
LIMIT=0 \
PROMPT_MODE=paper \
NUM_OBJECTS=1 \
TARGET_POSITION=1 \
scripts/eval/eval_modelnet.sh
```

To reproduce the full ModelNet40 table, repeat the run with
`(NUM_OBJECTS, TARGET_POSITION)` set to `(1,1)`, `(2,1)`, `(2,2)`, `(3,1)`,
`(3,2)`, and `(3,3)`.
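The six settings can be swept with a small loop. This is a sketch that assumes you run it from the root of the BeyondSingleObject checkout; the `run_modelnet_grid` function, the `EVAL_SCRIPT` override, and the per-setting `OUTPUT_DIR` naming are our own illustrative choices (hypothetical, not part of the repository).

```bash
#!/usr/bin/env bash
# Sketch: evaluate all six (NUM_OBJECTS, TARGET_POSITION) settings in one go.
# Assumptions: run from the root of the BeyondSingleObject checkout; the
# per-setting OUTPUT_DIR naming below is our own choice, not a repo convention.

# Hypothetical override so the script path can be swapped (e.g. for a dry run).
EVAL_SCRIPT="${EVAL_SCRIPT:-scripts/eval/eval_modelnet.sh}"

run_modelnet_grid() {
  local setting n t
  for setting in 1:1 2:1 2:2 3:1 3:2 3:3; do
    n="${setting%%:*}"  # number of objects in the scene
    t="${setting##*:}"  # position of the target object (1-based)
    MODEL_PATH=checkpoints/multi-3dllm-classification \
    OUTPUT_DIR="outputs/modelnet40_eval/n${n}_t${t}" \
    LIMIT=0 \
    PROMPT_MODE=paper \
    NUM_OBJECTS="$n" \
    TARGET_POSITION="$t" \
    "$EVAL_SCRIPT" || return 1  # stop at the first failing setting
  done
}
```

Source the snippet, then call `run_modelnet_grid`. Writing each setting to its own subdirectory keeps results separate without relying on how `eval_modelnet.sh` treats an existing `OUTPUT_DIR`.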

## Notes

The LLM-judged metrics for reasoning and delta-caption quality depend on the
judge model and prompt configuration. Use the released evaluation scripts for
reproducible comparisons, and report the exact judge configuration together
with the checkpoint.

## License

These checkpoints are built with the BeyondSingleObject codebase and use
PointLLM-style initialization and data. They may inherit terms from upstream
model, code, and dataset components, including PointLLM, Vicuna/Llama,
Objaverse/Cap3D, ShapeTalk, Thingi10K, Neural Shape Mating, and ModelNet40.
Please check the corresponding upstream licenses before redistribution or
commercial use.

## Citation

```bibtex
@inproceedings{ide2026beyondsingleobject,
  title={BeyondSingleObject: Learning 3D Relations with Large Language Models},
  author={Ide, Kohsuke and Yamada, Ryousuke and Qiu, Yue and Ma, Xianzheng and Fukuhara, Yoshihiro and Kataoka, Hirokatsu and Satoh, Yutaka},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  year={2026}
}
```