OX-PIXL/SpatialThinker-3B
Perceptual Intelligence and Extended Reality Lab
Image-Text-to-Text
Safetensors
English
qwen2_5_vl
spatial-reasoning
multimodal
vision-language
scene-graph
reinforcement-learning
conversational
arxiv: 2511.07403
License: apache-2.0
hunarbatra committed on Nov 12, 2025
Commit aec53bb · verified · 1 Parent(s): 3358e8b
Create README.md
README.md (added): +7 -0
@@ -0,0 +1,7 @@
+---
+datasets:
+- OX-PIXL/STVQA-7K
+base_model:
+- Qwen/Qwen2.5-VL-3B-Instruct
+---
+Paper: https://arxiv.org/abs/2511.07403
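
The added README consists of a front-matter block delimited by `---` followed by a one-line body. As a rough illustration of how that structure separates into metadata and text, here is a minimal sketch using only the Python standard library. The function name `parse_front_matter` is hypothetical, and the parser assumes the simple "key, then dash-prefixed list items" shape seen in this commit; it is not a general YAML parser, and a real YAML library would be the safer choice for anything more complex.

```python
def parse_front_matter(text):
    """Split a model card into (metadata, body).

    Hypothetical helper: handles only 'key:' lines followed by '- item'
    lines, which is the shape of the README added in this commit.
    """
    lines = text.splitlines()
    # Front matter must start on the first line with '---'.
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing '---' delimiter.
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == "---"), None
    )
    if end is None:
        return {}, text

    meta = {}
    key = None
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and key is not None:
            # List item belonging to the most recent key.
            meta[key].append(stripped[2:].strip())
        elif stripped.endswith(":"):
            # New key that introduces a list.
            key = stripped[:-1]
            meta[key] = []

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# The README content exactly as added in this commit.
README = """---
datasets:
- OX-PIXL/STVQA-7K
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
---
Paper: https://arxiv.org/abs/2511.07403
"""

meta, body = parse_front_matter(README)
print(meta["datasets"])
print(meta["base_model"])
print(body)
```

Both metadata values are lists because the README writes them as YAML sequences, so a card could name more than one dataset or base model.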