| title: Spatial-SSRL Spatial Reasoning | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: gradio | |
| sdk_version: 5.49.1 | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| short_description: Spatial reasoning with vision-language models | |
| # π Spatial-SSRL: Spatial Reasoning with Vision-Language Models | |
| This demo showcases the spatial reasoning capabilities of vision-language models trained to understand 3D spatial relationships from 2D images. | |
| ## Features | |
| - **3D Location Understanding**: Determine which objects are closer to or farther from the camera | |
| - **Orientation Analysis**: Understand which direction objects are facing | |
| - **Relative Positioning**: Answer questions about object positions relative to each other | |
| - **Step-by-step Reasoning**: The model provides detailed reasoning before answering | |
| ## How to Use | |
| 1. Upload an image | |
| 2. Ask a question about spatial relationships in the image | |
| 3. The model returns its step-by-step reasoning followed by a final answer | |
| ## Example Questions | |
| - "Which object is further away from the camera? A. boat B. fire hydrant" | |
| - "Are the kid and the teddy bear facing the same or similar directions?" | |
| - "If I stand at the recreational vehicle's position facing where it is facing, is the dog in front of me or behind me?" | |
| The model is trained to provide answers in a structured format with reasoning enclosed in `<think>` tags and final answers in `\boxed{}`. | |
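If you post-process responses yourself, the structured format above can be split into its two parts with a small parser. The following is a minimal sketch, not part of this Space's code: `parse_response` is a hypothetical helper, it assumes the reasoning is closed with a matching `</think>` tag, and it assumes the boxed answer contains no nested braces.

```python
import re


def parse_response(text: str) -> dict:
    """Split a response into reasoning and final answer (hypothetical helper).

    Assumes reasoning is wrapped as <think>...</think> and the final answer
    appears as \\boxed{...} without nested braces.
    """
    # DOTALL lets the reasoning span multiple lines.
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    # If several boxed spans appear, treat the last one as the final answer.
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return {
        "reasoning": think.group(1).strip() if think else None,
        "answer": boxed[-1].strip() if boxed else None,
    }


if __name__ == "__main__":
    sample = "<think>The boat appears smaller in the frame.</think> \\boxed{A}"
    print(parse_response(sample))
```

Both fields fall back to `None` when the corresponding marker is missing, so callers can detect a response that did not follow the expected format.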