---
license: apache-2.0
datasets:
- liuhaotian/LLaVA-Instruct-150K
- jxu124/refcoco
- jxu124/refcocog
- jxu124/refcocoplus
metrics:
- accuracy
language:
- en
---

# Model Summary

We propose Lenna, a Language-enhanced reasoning detection assistant that uses the robust multimodal feature representation of multimodal large language models (MLLMs) while preserving location information for detection.

This is achieved by adding a `<DET>` token to the MLLM vocabulary. The token carries no explicit semantic content; instead, it serves as a prompt for the detector to identify the corresponding position.
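
The idea of a dedicated detection token can be illustrated with a toy sketch. This is not Lenna's actual code: the whitespace tokenizer, the tiny vocabulary, the fake hidden states, and every function name below are assumptions made up for illustration. It only shows the general pattern of extending a vocabulary with `<DET>`, finding where that token appears in a sequence, and picking out the representation at that position so it could be handed to a detector.

```python
# Toy sketch only: NOT Lenna's implementation. All names are hypothetical.
DET_TOKEN = "<DET>"


def extend_vocab(vocab, token):
    """Return a copy of `vocab` with `token` added under the next free id."""
    extended = dict(vocab)
    if token not in extended:
        extended[token] = max(extended.values(), default=-1) + 1
    return extended


def encode(text, vocab):
    """Whitespace tokenizer stand-in; unknown words map to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]


def det_positions(token_ids, det_id):
    """Indices in the sequence where the <DET> token occurs."""
    return [i for i, tid in enumerate(token_ids) if tid == det_id]


# Assumed miniature vocabulary standing in for the MLLM vocabulary.
base_vocab = {"<unk>": 0, "the": 1, "object": 2, "is": 3, "here": 4}
vocab = extend_vocab(base_vocab, DET_TOKEN)
det_id = vocab[DET_TOKEN]

ids = encode("the object is here <DET>", vocab)
positions = det_positions(ids, det_id)

# Stand-in hidden states: one small vector per token position.
hidden_states = [[float(i), float(i) + 0.5] for i in range(len(ids))]

# The vectors at the <DET> positions are what a detector would receive
# as its prompt in this toy setup.
det_prompts = [hidden_states[p] for p in positions]
```

With a real Hugging Face tokenizer, the analogous step would typically be `tokenizer.add_tokens` followed by `model.resize_token_embeddings`; whether Lenna's code does exactly this is not stated here, so consult the repository for the actual procedure.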

To evaluate Lenna's reasoning capability, we construct the ReasonDet dataset, which measures performance on reasoning-based detection.

# Model Sources

- Repository: https://github.com/Meituan-AutoML/Lenna
- Paper: https://arxiv.org/abs/2312.02433

# How to Get Started with the Model

Model weights can be loaded with Hugging Face Transformers. Examples can be found on [GitHub](https://github.com/Meituan-AutoML/Lenna).