---
license: mit
datasets:
- liuhaotian/LLaVA-Instruct-150K
- liuhaotian/LLaVA-Pretrain
language:
- en
pipeline_tag: visual-question-answering
---
# Model Card for llava-phi-2-3b
This is a multimodal implementation of the [Phi2](https://huggingface.co/microsoft/phi-2) model, inspired by [LLaVA-Phi](https://github.com/zhuyiche/llava-phi).
## Model Details
1. LLM Backbone: [Phi2](https://huggingface.co/microsoft/phi-2)
2. Vision Tower: [clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)
3. Pretraining Dataset: [LAION-CC-SBU dataset with BLIP captions (200K samples)](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
4. Finetuning Dataset: [Instruct 150K dataset based on COCO](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K)
5. Finetuned Model: [marianna13/llava-phi-2-3b](https://huggingface.co/marianna13/llava-phi-2-3b)
### Model Sources
- **Original Repository:** [LLaVA-Phi](https://github.com/zhuyiche/llava-phi)
- **Paper:** [LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model](https://arxiv.org/pdf/2401.02330)
- **Demo:** [MultiModal-Phi2 on Hugging Face Spaces](https://huggingface.co/spaces/RaviNaik/MultiModal-Phi2)
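## Usage
A minimal usage sketch is below. The prompt template and the `transformers` loading path (`AutoProcessor` / `LlavaForConditionalGeneration`) are assumptions not confirmed by this card; the original LLaVA-Phi repository's loading code may be required for this checkpoint.

```python
# Hypothetical usage sketch for marianna13/llava-phi-2-3b.
# The LLaVA-style prompt template and the transformers classes used
# here are assumptions; see the original llava-phi repository for
# the canonical loading and inference code.

def build_prompt(question: str) -> str:
    """Compose a single-turn LLaVA-style prompt with an image placeholder."""
    system = (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed answers to the user's questions."
    )
    return f"{system} USER: <image>\n{question} ASSISTANT:"


def answer(image_path: str, question: str,
           model_id: str = "marianna13/llava-phi-2-3b") -> str:
    """Run one visual-question-answering turn.

    Requires torch, transformers, and Pillow; assumes the checkpoint is
    compatible with LlavaForConditionalGeneration.
    """
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    image = Image.open(image_path)
    inputs = processor(
        text=build_prompt(question), images=image, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

For example, `answer("photo.jpg", "What objects are on the table?")` would return the decoded prompt plus the model's continuation; the response text follows the final `ASSISTANT:` marker.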