Update README.md

README.md (changed):

---
base_model:
- facebook/Perception-LM-1B
library_name: transformers
---

# GAR-1B
This repository contains the **GAR-1B** model, as presented in the paper [Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs](https://github.com/Haochen-Wang409/Grasp-Any-Region).
**TL;DR:** Our Grasp Any Region (GAR) model supports both (1) describing a single region of an image or a video, given in the form of points/boxes/scribbles/masks, in detail, and (2) understanding multiple regions, e.g., modeling their interactions and performing complex reasoning. We also release a new benchmark, GARBench, to evaluate models on advanced region-level understanding tasks.
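All of the region prompt formats mentioned above (points, boxes, scribbles, masks) can be reduced to a binary mask over the image. As a rough illustration only (this is not GAR's actual preprocessing, which is defined in the GitHub repo), a box or a set of points might be rasterized like this:

```python
import numpy as np

def box_to_mask(h, w, box):
    """Rasterize an (x0, y0, x1, y1) box into a binary HxW mask."""
    x0, y0, x1, y1 = box
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 1
    return mask

def points_to_mask(h, w, points, radius=2):
    """Rasterize (x, y) points into a binary HxW mask as small disks."""
    mask = np.zeros((h, w), dtype=np.uint8)
    yy, xx = np.mgrid[0:h, 0:w]
    for x, y in points:
        mask[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 1
    return mask

mask = box_to_mask(8, 8, (2, 2, 6, 6))
print(mask.sum())  # 16 pixels inside the 4x4 box
```

Whatever the prompt format, the model ultimately receives a per-region mask; see the GitHub repo for the exact conventions GAR expects.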
## Usage
For detailed usage of this model, please refer to our [GitHub repo](https://github.com/Haochen-Wang409/Grasp-Any-Region).