mtgv/Lenna-7B


	---
	license: apache-2.0
	datasets:
	- liuhaotian/LLaVA-Instruct-150K
	- jxu124/refcoco
	- jxu124/refcocog
	- jxu124/refcocoplus
	metrics:
	- accuracy
	language:
	- en
	---
	# Model Summary
	We propose Lenna, a language-enhanced reasoning detection assistant that uses the robust multimodal feature representations of multimodal large language models (MLLMs) while preserving location information for detection.
	This is achieved by adding a `<DET>` token to the MLLM vocabulary. The token carries no explicit semantic context; instead, it serves as a prompt for the detector to identify the corresponding position.
	To evaluate the reasoning capability of Lenna, we construct a ReasonDet dataset to measure its performance on reasoning-based detection.
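
The role of the `<DET>` token can be illustrated with a toy sketch. This is not Lenna's implementation: the whitespace tokenizer, the fake hidden states, and every function name below are hypothetical and exist only to show the idea of adding a token to a vocabulary and then reading out the representation at its position. How Lenna actually hands that representation to the detector is described in the paper and repository, not here.

```python
# Toy illustration only: hypothetical helpers, not Lenna's code.
DET_TOKEN = "<DET>"


def build_vocab(tokens):
    """Map each distinct token to an integer id, in first-seen order."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def add_special_token(vocab, token):
    """Add a token to the vocabulary if it is missing and return its id."""
    if token not in vocab:
        vocab[token] = len(vocab)
    return vocab[token]


def det_positions(token_ids, det_id):
    """Return every sequence index at which the <DET> id occurs."""
    return [i for i, tok_id in enumerate(token_ids) if tok_id == det_id]


def select_det_states(hidden_states, positions):
    """Pick the per-token feature vectors at the given positions."""
    return [hidden_states[i] for i in positions]


# Build a tiny vocabulary, then extend it with the extra token.
vocab = build_vocab("the cup on the left is".split())
det_id = add_special_token(vocab, DET_TOKEN)

# A made-up model response that ends with the <DET> token.
response = "the cup on the left is <DET>"
token_ids = [vocab[tok] for tok in response.split()]

# Stand-in for the language model's last-layer features (one vector per token).
hidden_states = [[float(i), float(i) * 0.5] for i in range(len(token_ids))]

positions = det_positions(token_ids, det_id)
det_states = select_det_states(hidden_states, positions)
```

In this sketch, `det_states` holds the feature vector at the `<DET>` position, which is the kind of signal a downstream detector could be conditioned on.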

	# Model Sources
	- Repository: https://github.com/Meituan-AutoML/Lenna
	- Paper: https://arxiv.org/abs/2312.02433

	# How to Get Started with the Model
	Model weights can be loaded with Hugging Face Transformers. Usage examples are available in the [GitHub repository](https://github.com/Meituan-AutoML/Lenna).
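
As a minimal sketch, the weight files can be fetched from the Hub with `huggingface_hub` before following the repository's examples. The repo id `mtgv/Lenna-7B` is taken from this model page; the helper name `download_lenna` is hypothetical, and this snippet only downloads files. It does not instantiate the model, which is assumed to require the code from the GitHub repository.

```python
REPO_ID = "mtgv/Lenna-7B"  # repo id as shown on this model page


def download_lenna(local_dir=None):
    """Download all files of the model repo and return the local path.

    Hypothetical helper: it wraps huggingface_hub.snapshot_download and
    does not load the weights into a model.
    """
    # Imported lazily so that defining the helper does not require the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_lenna()` stores the files in the default Hugging Face cache; passing `local_dir="./Lenna-7B"` places them in that directory instead.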