Spatial-SSRL / README.md

yuhangzang

Add Gradio Space for Spatial-SSRL spatial reasoning demo

1e5cd04 about 1 month ago

	---
	title: Spatial-SSRL Spatial Reasoning
	emoji: 🌍
	colorFrom: blue
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.49.1
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: Spatial reasoning with vision-language models
	---

	# 🌍 Spatial-SSRL: Spatial Reasoning with Vision-Language Models

	This demo showcases the spatial reasoning capabilities of vision-language models trained to understand 3D spatial relationships from 2D images.

	## Features

	- 3D Location Understanding: Determine which objects are closer to or farther from the camera
	- Orientation Analysis: Identify which direction each object is facing
	- Relative Positioning: Answer questions about object positions relative to each other
	- Step-by-step Reasoning: The model provides detailed reasoning before answering

	## How to Use

	1. Upload an image
	2. Ask a question about spatial relationships in the image
	3. The model returns its step-by-step reasoning followed by a final answer

	## Example Questions

	- "Which object is further away from the camera? A. boat B. fire hydrant"
	- "Are the kid and the teddy bear facing same or similar directions?"
	- "If I stand at the recreational vehicle's position facing where it is facing, is the dog in front of me or behind me?"
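
The first example above appends lettered options to the question. For anyone composing such prompts programmatically, here is a minimal sketch. The helper name `format_choice_question` is hypothetical and not part of this Space, and the `A. ... B. ...` layout is only inferred from that one example, so the model may accept other layouts as well.

```python
from string import ascii_uppercase


def format_choice_question(question: str, options: list[str]) -> str:
    """Append lettered options (A., B., ...) to a question, on one line.

    Hypothetical helper: the layout mirrors the example question in this
    README and is an assumption, not a documented input requirement.
    """
    if not options:
        raise ValueError("at least one option is required")
    if len(options) > len(ascii_uppercase):
        raise ValueError("at most 26 options are supported")
    # Pair each option with its letter: "A. boat", "B. fire hydrant", ...
    labeled = " ".join(
        f"{letter}. {option.strip()}"
        for letter, option in zip(ascii_uppercase, options)
    )
    return f"{question.strip()} {labeled}"


prompt = format_choice_question(
    "Which object is further away from the camera?",
    ["boat", "fire hydrant"],
)
print(prompt)
```

The resulting string can be pasted into the question box alongside the uploaded image.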

	The model is trained to provide answers in a structured format with reasoning enclosed in `<think>` tags and final answers in `\boxed{}`.
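
If you copy the model's response into your own scripts, the structured format described above can be split into its two parts. The following is a minimal sketch, assuming the response contains at most one `<think>...</think>` block and that the final answer is the last `\boxed{...}` in the text. The names `ParsedResponse` and `parse_response` are illustrative and do not come from this Space's code.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    reasoning: Optional[str]  # text inside <think>...</think>, if present
    answer: Optional[str]     # contents of the last \boxed{...}, if present


_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_BOXED_PREFIX = "\\boxed{"


def _extract_boxed(text: str) -> Optional[str]:
    # Find the last \boxed{ and walk forward, counting braces so that
    # nested groups such as \boxed{\text{behind}} are handled correctly.
    start = text.rfind(_BOXED_PREFIX)
    if start == -1:
        return None
    begin = start + len(_BOXED_PREFIX)
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i].strip()
    return None  # braces never balanced; treat as no answer


def parse_response(text: str) -> ParsedResponse:
    """Split a response into reasoning and final answer (either may be None)."""
    match = _THINK_RE.search(text)
    reasoning = match.group(1).strip() if match else None
    return ParsedResponse(reasoning, _extract_boxed(text))


# Made-up response text, used only to illustrate the format.
sample = "<think>The boat looks much smaller.</think>\nThe answer is \\boxed{A}."
result = parse_response(sample)
print(result.reasoning)
print(result.answer)
```

Returning `None` for a missing part, rather than raising, lets a caller decide how to handle responses that do not follow the format.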