XXXCARREY committed on
Commit 3bc63c0 · verified · 1 Parent(s): e2e3fbb

Update README.md

Files changed (1)
  1. README.md +200 -3
README.md CHANGED
@@ -1,3 +1,200 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+ Repository for EmbodiedSAM: Online Segment Any 3D Thing in Real Time, an efficient framework that leverages vision foundation models for <b>online</b>, <b>real-time</b>, <b>fine-grained</b>, <b>generalized</b> and <b>open-vocabulary</b> 3D instance segmentation.
+
+ The official code is publicly released in this [repo](https://github.com/xuxw98/ESAM).
+
+
+ ## Citation
+ ```
+ @article{xu2024esam,
+ title={EmbodiedSAM: Online Segment Any 3D Thing in Real Time},
+ author={Xiuwei Xu and Huangxing Chen and Linqing Zhao and Ziwei Wang and Jie Zhou and Jiwen Lu},
+ journal={arXiv preprint arXiv:2408.11811},
+ year={2024}
+ }
+ ```
+
+ ## Main Results
+
+ We provide the checkpoints for quick reproduction of the results reported in the paper.
+
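The checkpoint links in the tables below point at Hugging Face `blob` pages, which render a file viewer rather than the file itself. The following is a minimal sketch, not part of the README, of how such a link could be turned into a direct-download `resolve` URL. The helper names `blob_to_resolve` and `download_checkpoint` are hypothetical, and the sketch assumes the usual `/<user>/<repo>/blob/<revision>/<filename>` path layout.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlretrieve


def blob_to_resolve(url: str) -> str:
    """Turn a Hugging Face 'blob' page URL into a direct 'resolve' file URL.

    Hypothetical helper: assumes the path looks like
    /<user>/<repo>/blob/<revision>/<filename>.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # segments[0] is the empty string before the leading slash.
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a Hugging Face blob URL: {url}")
    segments[3] = "resolve"
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


def download_checkpoint(blob_url: str, dest: str) -> str:
    """Download the file behind a blob URL to `dest` and return the local path."""
    local_path, _headers = urlretrieve(blob_to_resolve(blob_url), dest)
    return local_path


# Example (performs a network request, so it is left commented out):
# download_checkpoint(
#     "https://huggingface.co/XXXCARREY/EmbodiedSAM/blob/main/ESAM_CA_online_epoch_128.pth",
#     "ESAM_CA_online_epoch_128.pth",
# )
```

The standard library is used here only to keep the sketch dependency-free; the `huggingface_hub` package offers `hf_hub_download(repo_id=..., filename=...)` for the same purpose, with caching.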
+ **Class-agnostic 3D instance segmentation results on ScanNet200 dataset:**
+ | Method | Type | VFM | AP | AP@50 | AP@25 | Speed(ms) | Downloads |
+ | :-----------------------------------------------------: | :-----: | :---------------------------------------------------------: | :------: | :------: | :------: | :-----------: | :----------------------------------------------------------: |
+ | [SAMPro3D](https://github.com/GAP-LAB-CUHK-SZ/SAMPro3D) | Offline | [SAM](https://github.com/facebookresearch/segment-anything) | 18.0 | 32.8 | 56.1 | -- | -- |
+ | [SAI3D](https://github.com/yd-yin/SAI3D) | Offline | [SemanticSAM](https://github.com/UX-Decoder/Semantic-SAM) | 30.8 | 50.5 | 70.6 | -- | -- |
+ | [SAM3D](https://github.com/Pointcept/SegmentAnything3D) | Online | SAM | 20.6 | 35.7 | 55.5 | 1369+1518 | -- |
+ | ESAM | Online | SAM | 42.2 | 63.7 | 79.6 | 1369+**80** | [model](https://huggingface.co/XXXCARREY/EmbodiedSAM/blob/main/ESAM_CA_online_epoch_128.pth) |
+ | ESAM-E | Online | [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) | **43.4** | **65.4** | **80.9** | **20**+**80** | [model](https://huggingface.co/XXXCARREY/EmbodiedSAM/blob/main/ESAM-E_CA_online_epoch_128.pth) |
+
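The Speed(ms) cells in the class-agnostic table are written as sums such as `1369+80`. The README does not define the two terms; the sketch below assumes the first is the per-frame time of the VFM and the second is the time of the remaining pipeline, and converts the total into frames per second. The function names are hypothetical and only illustrate the arithmetic.

```python
from typing import Tuple


def parse_speed(cell: str) -> Tuple[float, float]:
    """Split a 'first+second' Speed(ms) cell into two millisecond values.

    Markdown bold markers (**) are stripped before parsing.
    Assumption: first term = VFM time, second term = rest of the pipeline.
    """
    cleaned = cell.replace("*", "").strip()
    first_ms, second_ms = (float(part) for part in cleaned.split("+"))
    return first_ms, second_ms


def frames_per_second(cell: str) -> float:
    """Convert a Speed(ms) cell into frames per second for the summed latency."""
    first_ms, second_ms = parse_speed(cell)
    return 1000.0 / (first_ms + second_ms)


# Values copied from the table above.
for method, cell in [("SAM3D", "1369+1518"), ("ESAM", "1369+**80**"), ("ESAM-E", "**20**+**80**")]:
    print(f"{method}: {sum(parse_speed(cell)):.0f} ms per frame, {frames_per_second(cell):.2f} FPS")
```

Under this reading, ESAM-E takes 20 + 80 = 100 ms per frame, that is 10 FPS, which is consistent with the FPS of 10 listed for ESAM-E in the ScanNet table.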
+ **Dataset transfer results from ScanNet200 to SceneNN and 3RScan:**
+ <table class="tg"><thead>
+ <tr>
+ <th class="tg-b2st" rowspan="2">Method</th>
+ <th class="tg-b2st" rowspan="2">Type</th>
+ <th class="tg-b2st" colspan="3">ScanNet200--&gt;SceneNN</th>
+ <th class="tg-b2st" colspan="3">ScanNet200--&gt;3RScan</th>
+ </tr>
+ <tr>
+ <th class="tg-wa1i">AP</th>
+ <th class="tg-wa1i">AP@50</th>
+ <th class="tg-wa1i">AP@25</th>
+ <th class="tg-wa1i">AP</th>
+ <th class="tg-wa1i">AP@50</th>
+ <th class="tg-wa1i">AP@25</th>
+ </tr></thead>
+ <tbody>
+ <tr>
+ <td class="tg-nrix">SAMPro3D</td>
+ <td class="tg-nrix">Offline</td>
+ <td class="tg-nrix">12.6</td>
+ <td class="tg-nrix">25.8</td>
+ <td class="tg-nrix">53.2</td>
+ <td class="tg-nrix">3.9</td>
+ <td class="tg-nrix">8.0</td>
+ <td class="tg-nrix">21.0</td>
+ </tr>
+ <tr>
+ <td class="tg-nrix">SAI3D</td>
+ <td class="tg-nrix">Offline</td>
+ <td class="tg-nrix">18.6</td>
+ <td class="tg-nrix">34.7</td>
+ <td class="tg-nrix">65.7</td>
+ <td class="tg-nrix">5.4</td>
+ <td class="tg-nrix">11.8</td>
+ <td class="tg-nrix">27.4</td>
+ </tr>
+ <tr>
+ <td class="tg-nrix">SAM3D</td>
+ <td class="tg-nrix">Online</td>
+ <td class="tg-nrix">15.1</td>
+ <td class="tg-nrix">30.0</td>
+ <td class="tg-nrix">51.8</td>
+ <td class="tg-nrix">6.2</td>
+ <td class="tg-nrix">13.0</td>
+ <td class="tg-nrix">33.9</td>
+ </tr>
+ <tr>
+ <td class="tg-nrix">ESAM</td>
+ <td class="tg-nrix">Online</td>
+ <td class="tg-nrix"><b>28.8</b></td>
+ <td class="tg-nrix"><b>52.2</b></td>
+ <td class="tg-nrix">69.3</td>
+ <td class="tg-nrix"><b>14.1</b></td>
+ <td class="tg-nrix"><b>31.2</b></td>
+ <td class="tg-nrix"><b>59.6</b></td>
+ </tr>
+ <tr>
+ <td class="tg-nrix">ESAM-E</td>
+ <td class="tg-nrix">Online</td>
+ <td class="tg-nrix">28.6</td>
+ <td class="tg-nrix">50.4</td>
+ <td class="tg-nrix"><b>71.0</b></td>
+ <td class="tg-nrix">13.9</td>
+ <td class="tg-nrix">29.4</td>
+ <td class="tg-nrix">58.8</td>
+ </tr>
+ </tbody></table>
+
+
+ **3D instance segmentation results on ScanNet and SceneNN datasets:**
+ <table class="tg"><thead>
+ <tr>
+ <th class="tg-gabo" rowspan="2">Method</th>
+ <th class="tg-gabo" rowspan="2">Type</th>
+ <th class="tg-gabo" colspan="3">ScanNet</th>
+ <th class="tg-gabo" colspan="3">SceneNN</th>
+ <th class="tg-gabo" rowspan="2">FPS</th>
+ <th class="tg-gabo" rowspan="2">Download</th>
+ </tr>
+ <tr>
+ <th class="tg-uzvj">AP</th>
+ <th class="tg-uzvj">AP@50</th>
+ <th class="tg-uzvj">AP@25</th>
+ <th class="tg-uzvj">AP</th>
+ <th class="tg-uzvj">AP@50</th>
+ <th class="tg-uzvj">AP@25</th>
+ </tr></thead>
+ <tbody>
+ <tr>
+ <td class="tg-9wq8"><a href="https://github.com/SamsungLabs/td3d">TD3D</a></td>
+ <td class="tg-9wq8">offline</td>
+ <td class="tg-9wq8">46.2</td>
+ <td class="tg-9wq8">71.1</td>
+ <td class="tg-9wq8">81.3</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ </tr>
+ <tr>
+ <td class="tg-9wq8"><a href="https://github.com/oneformer3d/oneformer3d">Oneformer3D</a></td>
+ <td class="tg-9wq8">offline</td>
+ <td class="tg-9wq8">59.3</td>
+ <td class="tg-9wq8">78.8</td>
+ <td class="tg-9wq8">86.7</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ </tr>
+ <tr>
+ <td class="tg-9wq8"><a href="https://github.com/THU-luvision/INS-Conv">INS-Conv</a></td>
+ <td class="tg-9wq8">online</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">57.4</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ <td class="tg-9wq8">--</td>
+ </tr>
+ <tr>
+ <td class="tg-9wq8"><a href="https://github.com/xuxw98/Online3D">TD3D-MA</a></td>
+ <td class="tg-9wq8">online</td>
+ <td class="tg-9wq8">39.0</td>
+ <td class="tg-9wq8">60.5</td>
+ <td class="tg-9wq8">71.3</td>
+ <td class="tg-9wq8">26.0</td>
+ <td class="tg-9wq8">42.8</td>
+ <td class="tg-9wq8">59.2</td>
+ <td class="tg-9wq8">3.5</td>
+ <td class="tg-9wq8">--</td>
+ </tr>
+ <tr>
+ <td class="tg-9wq8">ESAM-E</td>
+ <td class="tg-9wq8">online</td>
+ <td class="tg-9wq8">41.6</td>
+ <td class="tg-9wq8">60.1</td>
+ <td class="tg-9wq8">75.6</td>
+ <td class="tg-9wq8">27.5</td>
+ <td class="tg-9wq8">48.7</td>
+ <td class="tg-uzvj"><b>64.6</b></td>
+ <td class="tg-uzvj"><b>10</b></td>
+ <td class="tg-9wq8"><a href="https://huggingface.co/XXXCARREY/EmbodiedSAM/blob/main/ESAM-E_online_epoch_128.pth=1">model</a></td>
+ </tr>
+ <tr>
+ <td class="tg-nrix">ESAM-E+FF</td>
+ <td class="tg-nrix">online</td>
+ <td class="tg-wa1i"><b>42.6</b></td>
+ <td class="tg-wa1i"><b>61.9</b></td>
+ <td class="tg-wa1i"><b>77.1</b></td>
+ <td class="tg-wa1i"><b>33.3</b></td>
+ <td class="tg-wa1i"><b>53.6</b></td>
+ <td class="tg-nrix">62.5</td>
+ <td class="tg-nrix">9.8</td>
+ <td class="tg-nrix"><a href="https://huggingface.co/XXXCARREY/EmbodiedSAM/blob/main/ESAM-E_FF_online_epoch_128.pth=1">model</a></td>
+ </tr>
+ </tbody></table>
+
+
+ **Open-Vocabulary 3D instance segmentation results on ScanNet200 dataset:**
+ | Method | AP | AP@50 | AP@25 |
+ | :----: | :------: | :------: | :------: |
+ | SAI3D | 9.6 | 14.7 | 19.0 |
+ | ESAM | **13.7** | **19.2** | **23.9** |