metadata
license: fair-noncommercial-research-license
language:
  - en
base_model:
  - facebook/Perception-LM-8B
library_name: transformers
datasets:
  - HaochenWang/Grasp-Any-Region-Dataset

GAR-8B

This repository contains the GAR-8B model, as presented in the paper Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs.

TL;DR: Grasp Any Region (GAR) supports both (1) describing in detail a single region of an image or a video, where the region is specified by points, boxes, scribbles, or masks, and (2) understanding multiple regions jointly, such as modeling the interactions between them and performing complex reasoning over them. We also release a new benchmark, GARBench, to evaluate models on advanced region-level understanding tasks.
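To make the region prompt types above concrete, the sketch below rasterizes a box or a set of points into a binary mask, which is one common way to put different prompt types into a single representation. It is an illustration only: the helper names `box_to_mask` and `points_to_mask` are hypothetical, and this README does not say how GAR itself preprocesses region prompts.

```python
import numpy as np


def box_to_mask(box, height, width):
    """Rasterize a pixel box (x0, y0, x1, y1) into a boolean H x W mask.

    x1 and y1 are treated as exclusive, matching NumPy slicing.
    Hypothetical helper for illustration; not part of the GAR codebase.
    """
    x0, y0, x1, y1 = box
    # Clamp every coordinate into the image so that negative or oversized
    # values cannot wrap around or index past the array bounds.
    x0, x1 = (min(max(v, 0), width) for v in (x0, x1))
    y0, y1 = (min(max(v, 0), height) for v in (y0, y1))
    mask = np.zeros((height, width), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask


def points_to_mask(points, height, width, radius=2):
    """Mark a filled disc of the given radius around each (x, y) point.

    Hypothetical helper for illustration; not part of the GAR codebase.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=bool)
    for x, y in points:
        mask |= (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
    return mask
```

A scribble could be handled the same way by passing its sampled points to `points_to_mask`, and several regions can be kept as a list of such masks, one per region.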

Usage

For detailed usage of this model, please refer to our GitHub repo.
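As a starting point, the following is a minimal loading sketch based only on the metadata of this model card (`library_name: transformers`, custom code). Several things here are assumptions rather than documented facts: the repository id `HaochenWang/GAR-8B` is inferred from the dataset owner, the checkpoint is assumed to load through `AutoModel` and `AutoProcessor` with `trust_remote_code=True`, and `load_gar` is a hypothetical helper name. The GitHub repo remains the authoritative reference for inference code and prompt formats.

```python
# Assumed repository id, inferred from the dataset owner; not stated in this README.
MODEL_ID = "HaochenWang/GAR-8B"


def load_gar(model_id=MODEL_ID):
    """Load the model and processor (hypothetical helper, not the official API)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModel, AutoProcessor

    # trust_remote_code=True runs Python code shipped in the model repository,
    # so review that code before enabling it.
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit the target hardware
    )
    # Assumption: the repository also provides a processor loadable this way.
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model.eval(), processor
```

Calling `load_gar()` downloads the full 8B checkpoint, so it is best run on a machine with enough disk space and accelerator memory.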