Update README.md
README.md
CHANGED
@@ -1,3 +1,23 @@
 ---
 license: apache-2.0
+datasets:
+- liuhaotian/LLaVA-Instruct-150K
+- jxu124/refcoco
+- jxu124/refcocog
+- jxu124/refcocoplus
+metrics:
+- accuracy
+language:
+- en
 ---
+# Model Summary
+We propose Lenna, a language-enhanced reasoning detection assistant that uses the robust multimodal feature representation of multimodal large language models (MLLMs) while preserving location information for detection.
+This is achieved by adding a <DET> token to the MLLM vocabulary. The token carries no explicit semantic context; instead, it serves as a prompt for the detector to identify the corresponding position.
+To evaluate Lenna's reasoning capability, we construct the ReasonDet dataset, which measures performance on reasoning-based detection.
+
+# Model Sources
+- Repository: https://github.com/Meituan-AutoML/Lenna
+- Paper: https://arxiv.org/abs/2312.02433
+
+# How to Get Started with the Model
+Model weights can be loaded with Hugging Face Transformers. Examples can be found on [GitHub](https://github.com/Meituan-AutoML/Lenna).
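The `<DET>` token idea from the model summary can be illustrated with a small, library-free sketch. It is not Lenna's implementation: the whitespace tokenizer, the toy vocabulary, the helper names, and the stand-in hidden states are all hypothetical. It also assumes that the hidden state at the `<DET>` position is what gets handed to the detector, which is an assumption of this sketch and not something the README states.

```python
from typing import Dict, List

DET_TOKEN = "<DET>"  # token name taken from the README; everything else is illustrative


def add_special_token(vocab: Dict[str, int], token: str) -> int:
    """Add `token` to the vocabulary if it is missing and return its id."""
    if token not in vocab:
        vocab[token] = len(vocab)
    return vocab[token]


def encode(text: str, vocab: Dict[str, int], unk_id: int) -> List[int]:
    """Toy whitespace tokenizer (hypothetical, stands in for a real MLLM tokenizer)."""
    return [vocab.get(piece, unk_id) for piece in text.split()]


def det_positions(token_ids: List[int], det_id: int) -> List[int]:
    """Return every index at which the <DET> token occurs."""
    return [i for i, tok in enumerate(token_ids) if tok == det_id]


def select_det_states(hidden_states: List[List[float]],
                      positions: List[int]) -> List[List[float]]:
    """Pick the per-token vectors at the <DET> positions (assumed detector prompt)."""
    return [hidden_states[i] for i in positions]


# Toy vocabulary; a real model would have tens of thousands of entries.
vocab = {"<unk>": 0, "the": 1, "dog": 2, "is": 3, "at": 4}
det_id = add_special_token(vocab, DET_TOKEN)

ids = encode("the dog is at <DET>", vocab, vocab["<unk>"])
positions = det_positions(ids, det_id)

# Stand-in hidden states: one 2-dimensional vector per token.
hidden = [[float(i), float(i) * 0.5] for i in range(len(ids))]
det_states = select_det_states(hidden, positions)
```

Because the new token gets a fresh id and has no prior meaning in the vocabulary, its position in the output sequence is easy to find, and only the vector at that position is forwarded in this sketch.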