How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="HaochenWang/GAR-8B", trust_remote_code=True)
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("HaochenWang/GAR-8B", trust_remote_code=True, dtype="auto")

GAR-8B

This repository contains the GAR-8B model, as presented in the paper Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs.

TL;DR: Grasp Any Region (GAR) supports both (1) describing in detail a single region of an image or video, where the region is specified by points, boxes, scribbles, or masks, and (2) understanding multiple regions at once, such as modeling their interactions and performing complex reasoning over them. We also release a new benchmark, GARBench, to evaluate models on advanced region-level understanding tasks.
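To make the four region prompt types concrete, the sketch below shows one way a box or a set of points (for example, points sampled along a scribble) could be rasterized into a binary mask. This is only an illustration in plain Python: the helper names `box_to_mask`, `points_to_mask`, and `mask_area` are hypothetical, the `(x0, y0, x1, y1)` box convention is an assumption, and none of this is GAR's actual preprocessing or API. The GitHub repo defines the real input format.

```python
from typing import List, Sequence, Tuple

# Row-major binary mask: mask[y][x] is 1 inside the region, 0 outside.
Mask = List[List[int]]


def box_to_mask(box: Tuple[int, int, int, int], height: int, width: int) -> Mask:
    """Rasterize a box (x0, y0, x1, y1) into a mask.

    Assumed convention (hypothetical): x0/y0 inclusive, x1/y1 exclusive.
    """
    x0, y0, x1, y1 = box
    # Clamp the box to the image bounds so out-of-range boxes stay valid.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]


def points_to_mask(
    points: Sequence[Tuple[int, int]], height: int, width: int, radius: int = 1
) -> Mask:
    """Mark a square of side 2 * radius + 1 around each (x, y) point.

    A scribble can be handled the same way by passing the points along it.
    """
    mask = [[0] * width for _ in range(height)]
    for px, py in points:
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                mask[y][x] = 1
    return mask


def mask_area(mask: Mask) -> int:
    """Count the pixels that belong to the region."""
    return sum(sum(row) for row in mask)
```

Reducing every prompt type to a mask is a common simplification because it gives a single representation for downstream code, but whether GAR does this internally is not stated here.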

Usage

For detailed usage of this model, please refer to our GitHub repo.

Model size: 10B params (Safetensors)
Tensor type: BF16