---
license: apache-2.0
language:
- en
tags:
- computer-vision
- robotics
- spatial-reasoning
- vision-language-model
- multi-modal
- glm4v
- fine-tuned
base_model: glm4v
model_type: vision-language-model
datasets:
- custom
library_name: transformers
---
# TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
## Usage
### Environment Requirements
```bash
pip install -r requirements.txt
```
### Configuration
Before using the model, you need to update the configuration file `glm4v_tisr_full_inference.yaml`:
1. Update `media_dir` to your image directory:
```yaml
media_dir: /path/to/your/images
```
2. Update the image path in `example_usage.py`:
```python
image_paths = ["/path/to/your/image.jpg"] # Replace with actual image path
```
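For orientation, a LLaMA-Factory inference config typically contains a few more keys than `media_dir` alone. The sketch below is an assumption about what `glm4v_tisr_full_inference.yaml` might look like, not its authoritative contents; the `model_name_or_path` and `template` values in particular are placeholders you should check against the shipped file:

```yaml
# Hypothetical sketch of glm4v_tisr_full_inference.yaml --
# consult the file shipped with this repo for the real keys and values.
model_name_or_path: /path/to/TIGeR-checkpoint   # assumed: local checkpoint dir
template: glm4v                                 # assumed: GLM-4V chat template
infer_backend: huggingface
media_dir: /path/to/your/images                 # step 1 above
```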
### Basic Usage
```python
import sys

from llamafactory.chat.chat_model import ChatModel

# Load the model using the LLaMA-Factory ChatModel
config_file = "glm4v_tisr_full_inference.yaml"

# Simulate command-line arguments so ChatModel picks up the config file
original_argv = sys.argv.copy()
sys.argv = [sys.argv[0], config_file]
try:
    chat_model = ChatModel()
finally:
    # Restore the original command-line arguments
    sys.argv = original_argv

# Prepare input
image_paths = ["/path/to/your/image.jpg"]  # Replace with an actual image path
question = (
    "Two points are circled on the image, labeled by A and B beside each "
    "circle. Which point is closer to the camera? Select from the following "
    "choices.\n(A) A is closer\n(B) B is closer"
)

# Prepare messages
messages = [
    {
        "role": "user",
        "content": question,
    }
]

# Get the model response
responses = chat_model.chat(messages, images=image_paths)
assistant_texts = []
for resp in responses:
    try:
        assistant_texts.append(resp.response_text)
    except AttributeError:
        assistant_texts.append(str(resp))
response_text = "\n".join(assistant_texts)
print(response_text)
```
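The `sys.argv` save-and-restore pattern above can be wrapped in a small context manager so repeated model loads stay tidy. This is a stdlib-only sketch; the `patched_argv` helper name is ours, not part of LLaMA-Factory:

```python
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(*args):
    """Temporarily replace sys.argv[1:] with the given arguments."""
    original = sys.argv.copy()
    sys.argv = [sys.argv[0], *args]
    try:
        yield
    finally:
        # Always restore, even if model construction raises
        sys.argv = original


# Usage (hypothetical):
# with patched_argv("glm4v_tisr_full_inference.yaml"):
#     chat_model = ChatModel()
```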
## Citation
If you use this model, please cite:
```bibtex
@misc{han2025tiger,
  title={TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics},
  author={Yi Han and Cheng Chi and Enshen Zhou and Shanyu Rong and Jingkun An and Pengwei Wang and Zhongyuan Wang and Lu Sheng and Shanghang Zhang},
  year={2025},
  eprint={2510.07181},
  archivePrefix={arXiv},
}
```