hunarbatra committed on
Commit 2b01c92 · verified · 1 Parent(s): 1b2a3d8

Update README with model/dataset documentation

Files changed (1)
  1. README.md +0 -47
README.md CHANGED
@@ -53,40 +53,6 @@ You FIRST observe the image in <observe> </observe> tags, then visualise the rel
  Image size: {Width} x {Height}
  ```
 
- ## Output Format
-
- The model generates structured output with four components:
-
- 1. **`<observe>`**: Scene description covering relevant objects
- 2. **`<scene>`**: JSON scene graph with objects (id, bbox) and relationships (subject, predicate, object)
- 3. **`<think>`**: Step-by-step reasoning as internal monologue
- 4. **`<answer>`**: Final answer with option letter and text
-
- ### Example Output
-
- ```
- <observe>
- The image shows a living room with a couch, a coffee table, and a cat sitting on the floor.
- </observe>
- <scene>
- {
- "objects": [
- {"id": "couch.1", "bbox": [50, 100, 400, 350]},
- {"id": "cat.1", "bbox": [200, 300, 280, 400]},
- {"id": "table.1", "bbox": [150, 250, 350, 320]}
- ],
- "relationships": [
- {"subject": "cat.1", "predicate": "in front of", "object": "couch.1"},
- {"subject": "cat.1", "predicate": "beside", "object": "table.1"}
- ]
- }
- </scene>
- <think>
- Looking at the scene graph, the cat is positioned in front of the couch and beside the coffee table. The bounding box coordinates show the cat is at y=300-400 while the couch extends to y=350, confirming the cat is on the floor in front of the couch.
- </think>
- <answer> (B) in front of the couch </answer>
- ```
-
  ## Usage
 
  ```python
@@ -129,19 +95,6 @@ output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
  print(output)
  ```
 
- ## Evaluation Results
-
- SpatialThinker-7B achieves state-of-the-art performance on spatial reasoning benchmarks:
-
- | Benchmark | SpatialThinker-7B |
- |-----------|------------------------|
- | CV-Bench (3D) | Strong performance |
- | BLINK-Spatial | Outperforms GPT-4o |
- | SpatialBench | SOTA results |
- | RealWorldQA | Competitive |
-
- See the [paper](https://arxiv.org/abs/2511.07403) for detailed results.
-
  ## Citation
 
  ```bibtex
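The "Output Format" section removed in this commit describes a generation split into four tagged parts (`<observe>`, `<scene>`, `<think>`, `<answer>`), with the scene graph as JSON and the answer carrying an option letter. The following is a minimal Python sketch of splitting such text into its parts. It assumes each tag appears at most once and that the `<answer>` body starts with a parenthesized capital letter, as in the removed example; `parse_output` is a hypothetical helper, not part of the SpatialThinker code or the `transformers` API.

```python
import json
import re

# Tag names taken from the removed "Output Format" section.
# The one-regex-per-tag parsing strategy is an assumption for illustration.
TAGS = ("observe", "scene", "think", "answer")


def parse_output(text: str) -> dict:
    """Hypothetical helper: split a tagged generation into its components.

    Assumes each tag appears at most once; a missing tag maps to None.
    """
    parts = {}
    for tag in TAGS:
        # Non-greedy match across newlines between the opening and closing tag.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        parts[tag] = match.group(1).strip() if match else None

    # The <scene> body is expected to be JSON with "objects" and "relationships".
    if parts["scene"] is not None:
        try:
            parts["scene"] = json.loads(parts["scene"])
        except json.JSONDecodeError:
            pass  # keep the raw string if the JSON is malformed

    # Extract the option letter, e.g. "(B) in front of the couch" -> "B".
    option = None
    if parts["answer"]:
        m = re.match(r"\(([A-Z])\)", parts["answer"])
        if m:
            option = m.group(1)
    parts["option"] = option
    return parts


if __name__ == "__main__":
    sample = """<observe>
A cat sits on the floor in front of a couch.
</observe>
<scene>
{"objects": [{"id": "couch.1", "bbox": [50, 100, 400, 350]},
             {"id": "cat.1", "bbox": [200, 300, 280, 400]}],
 "relationships": [{"subject": "cat.1", "predicate": "in front of", "object": "couch.1"}]}
</scene>
<think>
The scene graph relates cat.1 to couch.1 with "in front of".
</think>
<answer> (B) in front of the couch </answer>"""
    result = parse_output(sample)
    print(result["option"])
    print(result["scene"]["relationships"][0]["predicate"])
```

Malformed scene JSON is kept as a raw string rather than raising, so a caller can still inspect `<think>` and `<answer>` when the graph fails to parse.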
 