mtgv
/

Lenna-7B

 ---
 license: apache-2.0
+datasets:
+- liuhaotian/LLaVA-Instruct-150K
+- jxu124/refcoco
+- jxu124/refcocog
+- jxu124/refcocoplus
+metrics:
+- accuracy
+language:
+- en
 ---
+# Model Summary
+We propose Lenna, a language-enhanced reasoning detection assistant that uses the robust multimodal feature representations of MLLMs while preserving location information for detection.
+This is achieved by adding a `<DET>` token to the MLLM vocabulary. The token carries no explicit semantic content; instead, it serves as a prompt for the detector to identify the corresponding position.
+To evaluate Lenna's reasoning capability, we construct the ReasonDet dataset, which measures performance on reasoning-based detection.
+# Model Sources
+- Repository: https://github.com/Meituan-AutoML/Lenna
+- Paper: https://arxiv.org/abs/2312.02433
+# How to Get Started with the Model
+Model weights can be loaded with Hugging Face Transformers. Examples can be found on [GitHub](https://github.com/Meituan-AutoML/Lenna).