HaochenWang/GAR-8B


	---
	license: fair-noncommercial-research-license
	language:
	- en
	base_model:
	- facebook/Perception-LM-8B
	library_name: transformers
	datasets:
	- HaochenWang/Grasp-Any-Region-Dataset
	---

	# GAR-8B

	This repository contains the GAR-8B model, as presented in the paper [Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs](https://huggingface.co/papers/2510.18876).

	TL;DR: Grasp Any Region (GAR) supports both (1) describing in detail a single region of an image or video, where the region is specified by points, boxes, scribbles, or masks, and (2) understanding multiple regions jointly, such as modeling their interactions and performing complex reasoning over them. We also release a new benchmark, GARBench, to evaluate models on advanced region-level understanding tasks.
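To make the four prompt types concrete, here is a minimal sketch of how points, boxes, and scribbles could each be reduced to a binary mask over the image grid. This is an illustration only, under the assumption that every prompt type ends up as a mask; the helper names (`box_to_mask`, `points_to_mask`, `mask_area`) are hypothetical and are not part of GAR's code, whose actual preprocessing lives in the GitHub repo.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # mask[y][x] is 1 inside the region, 0 outside


def box_to_mask(box: Tuple[int, int, int, int], height: int, width: int) -> Mask:
    """Rasterize a box (x0, y0, x1, y1), end-exclusive, into a binary mask."""
    x0, y0, x1, y1 = box
    # Clamp the box to the image bounds so out-of-range prompts stay valid.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(width)]
        for y in range(height)
    ]


def points_to_mask(
    points: Sequence[Tuple[int, int]], height: int, width: int, radius: int = 1
) -> Mask:
    """Mark a small square around each (x, y) point.

    A scribble can be handled the same way by passing the sampled
    points along the stroke.
    """
    mask = [[0] * width for _ in range(height)]
    for px, py in points:
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                mask[y][x] = 1
    return mask


def mask_area(mask: Mask) -> int:
    """Count the pixels that belong to the region."""
    return sum(sum(row) for row in mask)
```

For example, `box_to_mask((1, 1, 3, 4), height=5, width=5)` marks a region 2 pixels wide and 3 pixels tall. Using plain nested lists keeps the sketch dependency-free; a real pipeline would more likely use array types.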


	## Usage

	For detailed usage of this model, please refer to our [GitHub repo](https://github.com/Haochen-Wang409/Grasp-Any-Region).
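As a starting point before following the GitHub instructions, the checkpoint files can be fetched locally. The sketch below assumes the `huggingface_hub` package is installed and that the repo ID is `HaochenWang/GAR-8B`; the function name `download_gar_checkpoint` and the target directory are hypothetical. Access may also require accepting the license or logging in. It only downloads files and does not show inference, which is documented in the GitHub repo.

```python
REPO_ID = "HaochenWang/GAR-8B"  # assumed repo ID on the Hugging Face Hub


def download_gar_checkpoint(local_dir: str = "./GAR-8B") -> str:
    """Download all files of the model repo and return the local path."""
    # Imported lazily so this module can be loaded without the dependency;
    # install it with `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    path = download_gar_checkpoint()
    print(f"Checkpoint files downloaded to: {path}")
```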