yl-1993 and nielsr (HF Staff) committed
Commit 3182921 · verified · 1 parent: 74abf50

Enhance model card: Add pipeline tag, library name, and usage examples (#2)


- Enhance model card: Add pipeline tag, library name, and usage examples (31309f792ebfa03fc0700470f6ad351ef5e589d5)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)

1. README.md (+73 -1)
README.md CHANGED

@@ -1,7 +1,9 @@
 ---
-license: apache-2.0
 base_model:
 - OpenGVLab/InternVL3-8B
+license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---

 **EN** | [中文](README_CN.md)
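
The new `pipeline_tag: image-text-to-text` declares the task this model serves on the Hub. Assuming the checkpoint is compatible with the high-level `pipeline` API (the card's own example below loads it via `AutoModel`/`AutoProcessor` instead, so treat this as an untested sketch):

```python
# Hedged sketch: load the model through the task pipeline declared in the
# front matter. Compatibility of this trust_remote_code checkpoint with the
# pipeline path is an assumption, not something the model card states.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="sensenova/SenseNova-SI-1.1-InternVL3-8B",
    trust_remote_code=True,
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
        {"type": "text", "text": "Describe the spatial layout of this image."},
    ],
}]
print(pipe(text=messages, max_new_tokens=50))
```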
@@ -127,3 +129,73 @@
 ## What's Next?
 We will release the accompanying technical report shortly. Please stay tuned!
 
## 🛠️ QuickStart

### Installation

We recommend using [uv](https://docs.astral.sh/uv/) to manage the environment.

> uv installation guide: <https://docs.astral.sh/uv/getting-started/installation/#installing-uv>

```bash
git clone git@github.com:OpenSenseNova/SenseNova-SI.git
cd SenseNova-SI/
uv sync --extra cu124  # or one of [cu118|cu121|cu124|cu126|cu128|cu129], depending on your CUDA version
source .venv/bin/activate
```
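
Before loading the model, it may be worth confirming that the environment resolved against the CUDA build you asked for; a minimal sanity check, assuming the `.venv` activated above:

```python
# Minimal environment check (assumes the uv-synced .venv is active).
import torch

print(torch.__version__)          # CUDA wheels typically carry a suffix such as +cu124
print(torch.cuda.is_available())  # True on a working GPU setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU
```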

### How to Use

Here is an example demonstrating how to use the SenseNova-SI model for multi-image visual question answering with the `transformers` library.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_path = "sensenova/SenseNova-SI-1.1-InternVL3-8B"

# Load the processor and the model
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
).eval()

# Example: Pos-Obj-Obj subset of MMSI-Bench
# (ensure 'examples/Q1_1.png' and 'examples/Q1_2.png' from the GitHub repo are available)
try:
    image1 = Image.open("./examples/Q1_1.png").convert("RGB")
    image2 = Image.open("./examples/Q1_2.png").convert("RGB")
except FileNotFoundError:
    print("Example images not found. Please ensure 'examples/Q1_1.png' and 'examples/Q1_2.png' are available, or provide your own images.")
    # Fallback placeholders so the snippet still runs end to end
    image1 = Image.new("RGB", (500, 500), color="red")
    image2 = Image.new("RGB", (500, 500), color="blue")

question = (
    "<image><image>\n"
    "You are standing in front of the dice pattern and observing it. "
    "Where is the desk lamp approximately located relative to you?\n"
    "Options: A: 90 degrees counterclockwise, B: 90 degrees clockwise, "
    "C: 135 degrees counterclockwise, D: 135 degrees clockwise"
)

# Prepare inputs: one <image> placeholder per image, in order
inputs = processor(text=question, images=[image1, image2], return_tensors="pt").to(model.device)

# Generate a response and decode only the newly generated tokens
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100)
response = processor.batch_decode(output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]

print(f"Question: {question}")
print(f"Answer: {response}")
```
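
For benchmark-style scoring you usually want just the chosen option letter rather than the full decoded string. A small post-processing sketch; the regex heuristic and the `extract_option` helper are illustrative assumptions, not part of the model card or any official evaluation code:

```python
# Hypothetical helper: pull the first standalone option letter (A-D)
# out of a decoded response string.
import re

def extract_option(response: str) -> str | None:
    match = re.search(r"\b([A-D])\b", response)
    return match.group(1) if match else None

print(extract_option("The desk lamp is at B: 90 degrees clockwise."))  # -> B
```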

## 🖊️ Citation

```bib
@article{sensenova-si,
  title   = {Scaling Spatial Intelligence with Multimodal Foundation Models},
  author  = {Cai, Zhongang and Wang, Ruisi and Gu, Chenyang and Pu, Fanyi and Xu, Junxiang and Wang, Yubo and Yin, Wanqi and Yang, Zhitao and Wei, Chen and Sun, Qingping and Zhou, Tongxi and Li, Jiaqi and Pang, Hui En and Qian, Oscar and Wei, Yukun and Lin, Zhiqian and Shi, Xuanke and Deng, Kewang and Han, Xiaoyang and Chen, Zukai and Fan, Xiangyu and Deng, Hanming and Lu, Lewei and Pan, Liang and Li, Bo and Liu, Ziwei and Wang, Quan and Lin, Dahua and Yang, Lei},
  journal = {arXiv preprint arXiv:2511.13719},
  year    = {2025}
}
```