---
license: apache-2.0
datasets:
- liuhaotian/LLaVA-Instruct-150K
- jxu124/refcoco
- jxu124/refcocog
- jxu124/refcocoplus
metrics:
- accuracy
language:
- en
---
# Model Summary
We propose Lenna, a Language-enhanced reasoning detection assistant that leverages the robust multimodal feature representations of MLLMs while preserving the location information needed for detection. 
This is achieved by adding an extra `<DET>` token to the MLLM vocabulary. The token carries no explicit semantic content of its own; instead, it serves as a prompt for the detector to identify the corresponding position. 
To evaluate the reasoning capability of Lenna, we construct the ReasonDet dataset, which measures performance on reasoning-based detection.
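The `<DET>` mechanism described above can be illustrated with a toy, dependency-free sketch: a special token is appended to the vocabulary, the hidden states at its positions are gathered, and each one is mapped to a query for the detector. Everything here (the toy vocabulary, the whitespace tokenizer, the fake hidden states, and the linear `project` step) is a hypothetical stand-in, not Lenna's actual implementation; see the repository for the real code. With Hugging Face Transformers, the analogous first step would typically be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(...)`.

```python
from typing import Dict, List

Vector = List[float]


def add_special_token(vocab: Dict[str, int], token: str) -> int:
    """Append `token` to a (hypothetical) vocabulary if absent and return its id."""
    if token not in vocab:
        vocab[token] = len(vocab)
    return vocab[token]


def tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Toy whitespace tokenizer; unknown words map to the <unk> id."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]


def gather_det_states(token_ids: List[int], hidden_states: List[Vector], det_id: int) -> List[Vector]:
    """Collect the hidden-state vector at every position holding the <DET> token."""
    return [h for t, h in zip(token_ids, hidden_states) if t == det_id]


def project(vec: Vector, weight: List[Vector]) -> Vector:
    """Hypothetical linear layer mapping an LLM hidden state to a detector query."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


# Toy example: the vocabulary and hidden states are made up for illustration.
vocab = {"<unk>": 0, "the": 1, "dog": 2, "is": 3, "here": 4}
det_id = add_special_token(vocab, "<DET>")
ids = tokenize("the dog is here <DET>", vocab)

# Fake 2-dimensional hidden states, one per token (a real MLLM produces these).
hidden = [[float(i), float(i) + 0.5] for i in range(len(ids))]

weight = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
queries = [project(h, weight) for h in gather_det_states(ids, hidden, det_id)]
print(queries)  # [[4.0, 9.0, 8.5]]
```

Because the `<DET>` embedding starts without any pretrained meaning, whatever the model writes into its hidden state is shaped entirely by training, which is what lets it act as a location prompt rather than a word.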

# Model Sources
- Repository: https://github.com/Meituan-AutoML/Lenna
- Paper: https://arxiv.org/abs/2312.02433

# How to Get Started with the Model
Model weights can be loaded with Hugging Face Transformers. Usage examples are available in the [GitHub repository](https://github.com/Meituan-AutoML/Lenna).