---
license: apache-2.0
datasets:
- COCO
- ReasonSeg
- CountBench
language:
- en
metrics:
- accuracy
base_model:
- Qwen2.5-VL
pipeline_tag: image-text-to-text
library_name: transformers
---
# VisionReasoner-7B
[Paper](https://huggingface.co/papers/2505.12081)
Code: [https://github.com/dvlab-research/VisionReasoner](https://github.com/dvlab-research/VisionReasoner)
Project page: [https://github.com/dvlab-research/VisionReasoner](https://github.com/dvlab-research/VisionReasoner)
## Description
VisionReasoner-7B uses a decoupled architecture consisting of a reasoning model and a segmentation model. The reasoning model interprets the user's intent, generates an explicit reasoning chain, and produces positional prompts. The segmentation model then uses these prompts to generate pixel-level masks.
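The hand-off between the two models can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the `<think>`/`<answer>` tags, the `bbox_2d` and `point_2d` keys, and the helper names `PositionalPrompt`, `parse_prompts`, and `segment` are hypothetical and do not come from this model card. The `segment` function is only a stand-in that fills each box; it is not the real segmentation model.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class PositionalPrompt:
    # Hypothetical container for one positional prompt, in pixel coordinates.
    box: tuple    # (x1, y1, x2, y2)
    point: tuple  # (x, y)


def parse_prompts(reasoner_output: str) -> list:
    """Extract positional prompts from an assumed <answer>[...]</answer> block."""
    match = re.search(r"<answer>(.*?)</answer>", reasoner_output, re.DOTALL)
    if match is None:
        return []
    items = json.loads(match.group(1))
    return [
        PositionalPrompt(tuple(item["bbox_2d"]), tuple(item["point_2d"]))
        for item in items
    ]


def segment(prompts: list, height: int, width: int) -> list:
    """Stand-in for the segmentation model: one binary mask per prompt."""
    masks = []
    for prompt in prompts:
        x1, y1, x2, y2 = prompt.box
        mask = [
            [1 if (x1 <= x < x2 and y1 <= y < y2) else 0 for x in range(width)]
            for y in range(height)
        ]
        masks.append(mask)
    return masks


# Assumed example of what the reasoning model might emit.
output = (
    "<think>The user asks for the cup on the left.</think>"
    '<answer>[{"bbox_2d": [1, 1, 3, 3], "point_2d": [2, 2]}]</answer>'
)
prompts = parse_prompts(output)
masks = segment(prompts, height=4, width=4)
```

The point of the sketch is the separation of concerns: the reasoning step only has to emit coordinates as text, and the mask generation step only has to consume coordinates, so either side can be swapped without retraining the other.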
## Usage
```python
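import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Load the reasoning model. The checkpoint is based on Qwen2.5-VL (see base_model),
# so it is loaded with the Qwen2.5-VL class rather than AutoModelForCausalLM.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Ricky06662/VisionReasoner-7B", torch_dtype=torch.bfloat16
)
# The processor bundles the tokenizer and the image preprocessor.
processor = AutoProcessor.from_pretrained("Ricky06662/VisionReasoner-7B")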
```