---
license: apache-2.0
library_name: pytorch
pipeline_tag: mask-generation
tags:
  - 3d
  - mesh
  - 3d-part-segmentation
  - sam2
  - segmentation
  - point-cloud
  - geosam2
base_model: facebook/sam2.1-hiera-base-plus
language:
  - en
---

# GeoSAM2

> Unleashing the Power of SAM2 for 3D Part Segmentation  ·  CVPR 2026

<div align="center">

[![Project Page](https://img.shields.io/badge/Project-Page-blue.svg)](https://detailgen3d.github.io/GeoSAM2/)
[![Paper](https://img.shields.io/badge/arXiv-2508.14036-b31b1b.svg)](https://arxiv.org/abs/2508.14036)
[![Code](https://img.shields.io/badge/GitHub-Code-181717.svg)](https://github.com/VAST-AI-Research/GeoSAM2)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)

</div>

GeoSAM2 lifts [SAM2](https://github.com/facebookresearch/sam2) from images to
3D meshes. Given a multi-view rendering of a mesh and an interactive prompt
(a single 2D click or a 2D mask) on one view, it propagates a consistent
segmentation across all views and back-projects the result to per-face 3D
part labels.

This repository hosts the **pretrained inference checkpoint** (`geosam2.pt`).
Code, configs, and a small multi-view demo dataset live in the companion
GitHub repo: <https://github.com/VAST-AI-Research/GeoSAM2>.

## Model summary

| | |
|---|---|
| Task | Interactive 3D part segmentation on meshes via multi-view 2D propagation |
| Base model | [`facebook/sam2.1-hiera-base-plus`](https://huggingface.co/facebook/sam2.1-hiera-base-plus) |
| Architecture | SAM2 (Hiera-B+ image encoder + memory attention + mask decoder), plus a dedicated **position-map encoder** for 3D geometry, **feature fusion**, and **LoRA adapters** on the image and position-map encoders |
| Parameters | ~154 M (fp32: ~588 MB · bf16: ~294 MB) |
| Input modalities | 12 rendered views per mesh: color (`.webp`), depth (`.exr`), normal (`.webp`), camera metadata (`meta.json`) |
| Prompts | 2D point clicks or a 2D mask on any view |
| Output | Per-view 2D label maps and per-face 3D labels for the input mesh |
| Render config | 12 azimuthally spaced views at 1024×1024 from a fixed elevation |
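
As a quick orientation to the expected input layout, here is a minimal sketch that walks one rendered sample directory. The per-view file names used below are illustrative assumptions; the demo data in the GitHub repo and `geosam2_render.py` define the authoritative naming.

```python
# Sketch: inspect one rendered sample (file names below are assumptions;
# see the demo data and geosam2_render.py in the code repo for the real layout).
import json
from pathlib import Path

from PIL import Image  # pip install pillow

sample = Path("example/sample_00")

# Camera metadata for the 12 views, written at render time.
meta = json.loads((sample / "meta.json").read_text())
print("meta entries:", list(meta)[:5])

for view in range(12):
    color = Image.open(sample / f"color_{view:04d}.webp")    # RGB rendering
    normal = Image.open(sample / f"normal_{view:04d}.webp")  # normal map
    depth_path = sample / f"depth_{view:04d}.exr"             # depth map (needs an EXR reader)
    assert color.size == (1024, 1024)                         # render config: 1024x1024 views
    print(view, color.size, depth_path.exists())
```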

## Quickstart

```bash
# 1. Clone the code
git clone https://github.com/VAST-AI-Research/GeoSAM2.git
cd GeoSAM2
pip install -r requirements.txt
pip install -e .   # builds the optional CUDA op; set GEOSAM2_BUILD_CUDA=0 to skip

# 2. Download the checkpoint into ./ckpt
hf download VAST-AI/GeoSAM2 geosam2.pt --local-dir ckpt

# 3. Run the bundled demo (single-view point prompt -> 3D segmentation)
bash scripts/run_example.sh
```

Direct inference from a 2D mask:

```bash
python inference.py \
  --data-root example/sample_00 \
  --mask-path outputs/sample_00/2d_seg/mask_view0000.npy \
  --mask-view 0 \
  --postprocess-pa 0.02 \
  --output-dir outputs/sample_00/3d_seg
```
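
The `--mask-path` file is a per-pixel mask for the chosen view stored as a NumPy array. The repo's 2D segmentation stage normally writes these files; if you want to supply your own, a rough sketch follows, where the boolean dtype and the 1024×1024 shape (matching the render resolution) are assumptions:

```python
# Hypothetical construction of a 2D mask prompt for view 0.
# The dtype/shape conventions are assumptions; the repo's 2D segmentation
# step is the normal way to produce these files.
from pathlib import Path

import numpy as np

mask = np.zeros((1024, 1024), dtype=bool)  # same resolution as the rendered views
mask[300:700, 400:650] = True              # rough region covering the part in view 0

out_dir = Path("outputs/sample_00/2d_seg")
out_dir.mkdir(parents=True, exist_ok=True)
np.save(out_dir / "mask_view0000.npy", mask)
```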

See the [README](https://github.com/VAST-AI-Research/GeoSAM2#readme) for the
full usage guide, including rendering your own meshes with Blender.

## Files

| File | Size | Description |
|---|---|---|
| `geosam2.pt` | ~588 MB | Pretrained weights in `float32` (`{"model": state_dict}`). Default choice. |
| `geosam2-bf16.pt` | ~294 MB | Same weights cast to `bfloat16` for faster downloads / lower memory. Loaded by the standard code path — `load_state_dict` upcasts to the model dtype, so no extra steps are required. Expect a small reconstruction error of ≤ 0.015 per weight versus the fp32 file. |

Both checkpoints are loaded by
`sam2.build_sam.build_sam2_video_predictor_geosam2` with the Hydra config
`sam2/configs/geosam2.yaml`. Pass the chosen file via `--sam2-checkpoint`
(or use the default `ckpt/geosam2.pt` path expected by the scripts).
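
For programmatic use, the call looks roughly like the sketch below. The builder name and config path come from the paragraph above, but the keyword arguments are assumptions modeled on SAM2's `build_sam2_video_predictor`; `inference.py` in the code repo is the canonical reference.

```python
# Sketch of building the predictor in Python (argument names assumed to mirror
# SAM2's build_sam2_video_predictor; see inference.py for the canonical call).
import torch
from sam2.build_sam import build_sam2_video_predictor_geosam2

predictor = build_sam2_video_predictor_geosam2(
    config_file="sam2/configs/geosam2.yaml",  # Hydra config, path as documented above
    ckpt_path="ckpt/geosam2.pt",              # or ckpt/geosam2-bf16.pt
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# The bf16 file stores the same {"model": state_dict} layout in bfloat16;
# load_state_dict casts it to the model dtype on load.
ckpt = torch.load("ckpt/geosam2-bf16.pt", map_location="cpu")
print(next(iter(ckpt["model"].values())).dtype)  # torch.bfloat16
```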

## Intended use

- **Intended**: interactive 3D part segmentation of single-object meshes for
  research and content-creation tooling.
- **Out of scope**: scene-level segmentation, dynamic scenes, semantic
  category prediction (the model produces instance-level part masks, not
  semantic class labels), and safety-critical applications.

## Limitations

- Expects the 12-view rendering convention produced by `geosam2_render.py`;
  arbitrary view counts or camera trajectories may degrade quality.
- The mesh must fit within the normalised cube used at render time
  (`geosam2_render.py` handles this for the bundled samples).
- Performance on thin/wire-like geometry and on highly transparent surfaces
  is still an open problem.
- The post-processing `--postprocess-pa` value sometimes needs hand-tuning
  per mesh (`0.01`, `0.02`, `0.035` are useful starting points; see the sweep
  sketch below).
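
Since the best value is mesh-dependent, a quick sweep over the suggested starting points can help. The snippet below is a hypothetical driver around the documented `inference.py` flags; the output-directory naming is illustrative.

```python
# Hypothetical sweep over the --postprocess-pa starting points listed above.
# Paths mirror the example command; output-directory naming is illustrative.
import subprocess

for pa in (0.01, 0.02, 0.035):
    subprocess.run(
        [
            "python", "inference.py",
            "--data-root", "example/sample_00",
            "--mask-path", "outputs/sample_00/2d_seg/mask_view0000.npy",
            "--mask-view", "0",
            "--postprocess-pa", str(pa),
            "--output-dir", f"outputs/sample_00/3d_seg_pa{pa}",
        ],
        check=True,
    )
```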

## License

Released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
The checkpoint is a derivative of Meta's
[SAM2](https://github.com/facebookresearch/sam2) (Apache 2.0); see the
[`NOTICE`](https://github.com/VAST-AI-Research/GeoSAM2/blob/main/NOTICE)
file in the code repo for attribution.

## Citation

```bibtex
@article{deng2025geosam2,
  title   = {GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation},
  author  = {Deng, Ken and Yang, Yunhan and Sun, Jingxiang and
             Liu, Xihui and Liu, Yebin and Liang, Ding and Cao, Yan-Pei},
  journal = {arXiv preprint arXiv:2508.14036},
  year    = {2025}
}
```