Improve model card: Add metadata, links, abstract, and setup for GS-Reasoner

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +87 -3
README.md CHANGED
@@ -1,3 +1,87 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ tags:
+ - 3d-vision
+ - visual-grounding
+ - spatial-reasoning
+ ---
+
+ # Reasoning in Space via Grounding in the World (GS-Reasoner)
+
+ We present **Grounded-Spatial Reasoner (GS-Reasoner)**, the first 3D LLM that bridges 3D visual grounding and spatial reasoning.
+
+ This model was introduced in the paper: [Reasoning in Space via Grounding in the World](https://huggingface.co/papers/2510.13800)
+ Project Page: [https://yiming-cc.github.io/gs-reasoner/](https://yiming-cc.github.io/gs-reasoner/)
+ Code: [https://github.com/WU-CVGL/GS-Reasoner](https://github.com/WU-CVGL/GS-Reasoner)
+
+ <div style="text-align: center;">
+ <img src="https://github.com/WU-CVGL/GS-Reasoner/raw/main/assets/teaser.png" width=100% >
+ </div>
+
+ ## Abstract
+
+ In this paper, we claim that 3D visual grounding is the cornerstone of spatial reasoning and introduce the Grounded-Spatial Reasoner (GS-Reasoner) to explore the effective spatial representations that bridge the gap between them. Existing 3D LLMs suffer from the absence of a unified 3D representation capable of jointly capturing semantic and geometric information. This deficiency is manifested either in poor performance on grounding or in an excessive reliance on external modules, ultimately hindering the seamless integration of grounding and spatial reasoning. To address this, we propose a simple yet effective dual-path pooling mechanism that tightly aligns geometric features with both semantic and positional cues, constructing a unified image patch-based 3D representation that encapsulates all essential information without increasing the number of input tokens. Leveraging this holistic representation, GS-Reasoner is the first 3D LLM that achieves autoregressive grounding entirely without external modules while delivering performance comparable to state-of-the-art models, establishing a unified and self-contained framework for 3D spatial reasoning. To further bridge grounding and spatial reasoning, we introduce the Grounded Chain-of-Thought (GCoT) dataset. This dataset is meticulously curated to include both 3D bounding box annotations for objects referenced in reasoning questions and step-by-step reasoning paths that integrate grounding as a core component of the problem-solving process. Extensive experiments demonstrate that GS-Reasoner achieves impressive results on 3D visual grounding, which in turn significantly enhances its spatial reasoning capabilities, leading to state-of-the-art performance.
+
+ ## Installation and Setup
+
+ For detailed installation instructions and data preprocessing, please refer to the [official GitHub repository](https://github.com/WU-CVGL/GS-Reasoner).
+
+ To set up the environment, follow these steps:
+
+ ```bash
+ conda create -n gs-reasoner python=3.11 -y
+ conda activate gs-reasoner
+
+ git clone git@github.com:WU-CVGL/GS-Reasoner.git
+ cd GS-Reasoner
+
+ # install package for GS-Reasoner
+ pip install -e .
+
+ # (optional) opencv-python extra dependencies
+ sudo apt update
+ sudo apt install -y libgl1 libglib2.0-0 libsm6 libxext6 libxrender1
+
+ # (optional) install gcc
+ conda install -c conda-forge gcc=13.2 gxx=13.2 -y
+
+ # (optional) install cuda toolkit 12.4
+ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
+ sudo dpkg -i cuda-keyring_1.1-1_all.deb
+ sudo apt-get update
+ sudo apt-get -y install cuda-toolkit-12-4
+
+ # install package for Sonata
+ cd llava/submodules/sonata
+ pip install -r requirements.txt
+ cd ../../..
+
+ # install package for VSI-Bench Evaluation
+ cd llava/submodules/lmms_eval
+ pip install -r requirements.txt
+ cd ../../..
+ ```
+
+ ## Model Weights
+
+ We provide two pretrained model checkpoints:
+
+ * **[GS-Reasoner](https://huggingface.co/ymccccc/GS-Reasoner)** – the main model used in our paper, producing more deterministic chain-of-thought reasoning.
+ * **[GS-Reasoner-Diverse](https://huggingface.co/ymccccc/GS-Reasoner-Diverse)** – a variant that generates more diverse chain-of-thought outputs with only a minor performance drop (less than 1.0 on VSI-Bench).
+
+ To use them, download the checkpoints and place them under the `ckpt/` directory. For more advanced usage and inference examples, please refer to the [official GitHub repository](https://github.com/WU-CVGL/GS-Reasoner).
+
+ ## Citation
+
+ If you find our work helpful or inspiring, please feel free to cite it.
+
+ ```bibtex
+ @article{chen2024reasoning,
+ title={Reasoning in Space via Grounding in the World},
+ author={Chen, Yiming and Qi, Zekun and Zhang, Wenyao and Jin, Xin and Zhang, Li and Liu, Peidong},
+ journal={arXiv preprint arXiv:2510.13800},
+ year={2025}
+ }
+ ```
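
A note on the Model Weights section of this README: it asks readers to download the checkpoints and place them under `ckpt/`, but gives no command for doing so. The following is a minimal sketch of that step, assuming the `huggingface_hub` package is installed. The one-sub-folder-per-checkpoint layout and the helper names `checkpoint_dir` and `download_checkpoint` are illustrative assumptions, not taken from the GS-Reasoner repository, so the official instructions take precedence if they differ.

```python
from pathlib import Path


def checkpoint_dir(repo_id: str, root: str = "ckpt") -> Path:
    """Map a Hub repo id such as 'ymccccc/GS-Reasoner' to a folder under `root`."""
    # Assumption: each checkpoint lives in its own sub-folder named after the repo.
    return Path(root) / repo_id.split("/")[-1]


def download_checkpoint(repo_id: str, root: str = "ckpt") -> Path:
    """Download all files of `repo_id` from the Hugging Face Hub into its folder."""
    # Imported lazily so the path helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = checkpoint_dir(repo_id, root)
    # snapshot_download fetches the whole repository into `local_dir`.
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Run from the root of the cloned repository, `download_checkpoint("ymccccc/GS-Reasoner")` would place the main model under `ckpt/GS-Reasoner`, and the same call with `"ymccccc/GS-Reasoner-Diverse"` would fetch the variant.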