Update README.md

README.md (changed):

---
base_model:
- facebook/Perception-LM-1B
library_name: transformers
---

# GAR-1B
This repository contains the **GAR-1B** model, as presented in the paper [Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs](https://github.com/Haochen-Wang409/Grasp-Any-Region).
**TL;DR:** Our Grasp Any Region (GAR) model supports both (1) describing a single region of an image or a video, given in the form of points/boxes/scribbles/masks, in detail, and (2) understanding multiple regions, e.g., modeling their interactions and performing complex reasoning. We also release a new benchmark, GARBench, to evaluate models on advanced region-level understanding tasks.
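All of the region prompt formats mentioned above (points, boxes, scribbles, masks) can be reduced to a binary mask over the image. As a rough illustration only (this is not GAR's actual preprocessing, which is defined in the GitHub repo), a box or a set of points might be rasterized like this:

```python
import numpy as np

def box_to_mask(h, w, box):
    """Rasterize an (x0, y0, x1, y1) box into a binary HxW mask."""
    x0, y0, x1, y1 = box
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 1
    return mask

def points_to_mask(h, w, points, radius=2):
    """Rasterize (x, y) points into a binary HxW mask as small disks."""
    mask = np.zeros((h, w), dtype=np.uint8)
    yy, xx = np.mgrid[0:h, 0:w]
    for x, y in points:
        mask[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 1
    return mask

mask = box_to_mask(8, 8, (2, 2, 6, 6))
print(mask.sum())  # 16 pixels inside the 4x4 box
```

Whatever the prompt format, the model ultimately receives a per-region mask; see the GitHub repo for the exact conventions GAR expects.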
## Usage
For detailed usage of this model, please refer to our [GitHub repo](https://github.com/Haochen-Wang409/Grasp-Any-Region).