Feature Extraction
Transformers
Safetensors
English
GAR
custom_code
HaochenWang committed on
Commit 3aea4ad · verified · 1 Parent(s): c6ab483

Update README.md

Files changed (1)
  1. README.md +13 -1
README.md CHANGED
@@ -5,4 +5,16 @@ language:
  base_model:
  - facebook/Perception-LM-1B
  library_name: transformers
- ---
+ ---
+
+ # GAR-1B
+
+ This repository contains the **GAR-1B** model, as presented in the paper [Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs](https://github.com/Haochen-Wang409/Grasp-Any-Region).
+
+ **TL;DR:** Grasp Any Region (GAR) supports both (1) describing in detail a single region of an image or video, specified by points, boxes, scribbles, or masks, and (2) understanding multiple regions, such as modeling their interactions and performing complex reasoning. We also release a new benchmark, GARBench, to evaluate models on advanced region-level understanding tasks.
+
+
+ ## Usage
+
+ For detailed usage of this model, please refer to our [GitHub repo](https://github.com/Haochen-Wang409/Grasp-Any-Region).
+