	---
	license: apache-2.0
	language:
	- en
	tags:
	- computer-vision
	- robotics
	- spatial-reasoning
	- vision-language-model
	- multi-modal
	- glm4v
	- fine-tuned
	base_model: glm4v
	model_type: vision-language-model
	datasets:
	- custom
	library_name: transformers
	---

	# TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics

	## Usage

	### Environment Requirements

	```bash
	pip install -r requirements.txt
	```

	### Configuration

	Before running the model, update two paths:

	1. In the configuration file `glm4v_tisr_full_inference.yaml`, set `media_dir` to your image directory:
	```yaml
	media_dir: /path/to/your/images
	```

	2. In `example_usage.py`, set the image path:
	```python
	image_paths = ["/path/to/your/image.jpg"]  # Replace with actual image path
	```
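
A wrong path in either place may only surface after the model has loaded, so it can help to check both beforehand. The sketch below is illustrative and not part of the TIGeR repository: `find_missing_paths` is a hypothetical helper, and it assumes `media_dir` is an ordinary directory and that each image is given as a full file path, as in the placeholders above.

```python
from pathlib import Path


def find_missing_paths(media_dir, image_paths):
    """Return the configured paths that do not exist on disk.

    Hypothetical helper for illustration; not part of the TIGeR repository.
    """
    missing = []
    # The media directory from the YAML config must be an existing directory.
    if not Path(media_dir).is_dir():
        missing.append(str(media_dir))
    # Each image passed to the model must be an existing file.
    for image_path in image_paths:
        if not Path(image_path).is_file():
            missing.append(str(image_path))
    return missing


if __name__ == "__main__":
    # Placeholder values copied from the configuration steps above.
    for path in find_missing_paths("/path/to/your/images", ["/path/to/your/image.jpg"]):
        print(f"Missing: {path}")
```

An empty return value means both the directory and every image were found.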

	### Basic Usage

	```python
	import sys
	from llamafactory.chat.chat_model import ChatModel

	# Load model using LLaMA-Factory ChatModel
	config_file = "glm4v_tisr_full_inference.yaml"

	# Simulate command line arguments
	original_argv = sys.argv.copy()
	sys.argv = [sys.argv[0], config_file]

	try:
	    chat_model = ChatModel()
	finally:
	    # Restore original command line arguments
	    sys.argv = original_argv

	# Prepare input
	image_paths = ["/path/to/your/image.jpg"]  # Replace with actual image path
	question = "Two points are circled on the image, labeled by A and B beside each circle. Which point is closer to the camera? Select from the following choices.\n(A) A is closer\n(B) B is closer"

	# Prepare messages
	messages = [
	    {
	        "role": "user",
	        "content": question
	    }
	]

	# Get model response
	response = chat_model.chat(messages, images=image_paths)
	assistant_texts = []

	for resp in response:
	    try:
	        assistant_texts.append(resp.response_text)
	    except Exception:
	        assistant_texts.append(str(resp))

	response_text = "\n".join(assistant_texts)
	print(response_text)
	```
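
The save, replace and restore steps around `sys.argv` in the example can also be written as a small context manager, which keeps the restore step tied to the block that needs the override. This is a sketch using only the standard library; `temporary_argv` is a hypothetical name and is not provided by LLaMA-Factory or this repository.

```python
import sys
from contextlib import contextmanager


@contextmanager
def temporary_argv(*args):
    """Temporarily replace the arguments after sys.argv[0], then restore them.

    Hypothetical helper for illustration; not part of LLaMA-Factory or TIGeR.
    """
    original_argv = sys.argv.copy()
    sys.argv = [sys.argv[0], *args]
    try:
        yield
    finally:
        # Runs even if the body raises, so the caller's arguments always come back.
        sys.argv = original_argv


if __name__ == "__main__":
    with temporary_argv("glm4v_tisr_full_inference.yaml"):
        # Inside the block, the config file is the only command line argument.
        print(sys.argv[1:])
    # Outside the block, the original arguments are back in place.
    print(sys.argv[1:])
```

With such a helper, the load step in the example would reduce to `with temporary_argv(config_file): chat_model = ChatModel()`.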

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{2510.07181,
	Author = {Yi Han and Cheng Chi and Enshen Zhou and Shanyu Rong and Jingkun An and Pengwei Wang and Zhongyuan Wang and Lu Sheng and Shanghang Zhang},
	Title = {TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics},
	Year = {2025},
	Eprint = {arXiv:2510.07181},
	}
	```