metadata
title: MultiModal Phi2
emoji: 🚀
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.35.2
app_file: app.py
pinned: false
license: mit

Phi2: Multimodal Finetuning

Details

  1. LLM Backbone: Phi2
  2. Vision Tower: clip-vit-large-patch14-336
  3. Audio Model: Whisper
  4. Pretraining Dataset: LAION-CC-SBU dataset with BLIP captions (200k samples)
  5. Finetuning Dataset: Instruct 150k dataset based on COCO
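To show how these components fit together, the sketch below works out the tensor shapes for one image. It assumes the published sizes of the two backbones: CLIP ViT-L/14 at 336 px has a 1024-dim hidden state, and Phi-2 has a 2560-dim hidden state with a 2048-token context. It also assumes a LLaVA-style projector that maps each CLIP patch feature into Phi-2's embedding space. The projector, `BackboneSizes`, `num_image_tokens` and `projector_shapes` are illustrative only and do not come from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneSizes:
    # Published sizes of the pretrained backbones (assumed; verify against the model configs).
    image_size: int = 336      # CLIP input resolution in pixels
    patch_size: int = 14       # CLIP ViT-L/14 patch size
    clip_hidden: int = 1024    # CLIP ViT-L hidden width
    phi2_hidden: int = 2560    # Phi-2 hidden width
    phi2_context: int = 2048   # Phi-2 maximum context length in tokens


def num_image_tokens(cfg: BackboneSizes) -> int:
    """Patch tokens produced by the vision tower (CLS token excluded)."""
    per_side = cfg.image_size // cfg.patch_size  # 336 // 14 = 24 patches per side
    return per_side * per_side


def projector_shapes(cfg: BackboneSizes) -> dict:
    """Shapes before and after a hypothetical CLIP -> Phi-2 projector, for one image."""
    n = num_image_tokens(cfg)
    return {
        "clip_features": (n, cfg.clip_hidden),  # output of the vision tower
        "projected": (n, cfg.phi2_hidden),      # after projecting into Phi-2's embedding space
        "text_budget": cfg.phi2_context - n,    # context tokens left for the text prompt
    }


if __name__ == "__main__":
    for name, value in projector_shapes(BackboneSizes()).items():
        print(name, value)
```

Under these assumptions, each image occupies 576 of Phi-2's 2048 context positions. That leaves 1472 tokens for the prompt and answer, which is why the number of visual tokens matters when choosing the vision tower.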

Design

[Figure: design diagram]
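The repository does not spell out here how the three inputs are merged, so the following describes one common approach as an assumption. Audio is transcribed to text by Whisper and then tokenized like any other text. The projected CLIP patch embeddings are spliced into the text embedding sequence wherever an image placeholder token appears, as in LLaVA. The sketch below shows only the splicing step, on plain Python lists. `IMAGE_TOKEN_ID`, `build_input_embeddings` and the toy embedding table are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]

IMAGE_TOKEN_ID = -200  # hypothetical placeholder id marking where the image goes


def build_input_embeddings(
    token_ids: Sequence[int],
    embed_table: Dict[int, Vector],
    image_embeddings: Sequence[Vector],
) -> List[Vector]:
    """Replace the single image placeholder with the projected image patch embeddings.

    Text token ids may include a Whisper transcript of audio input (assumption).
    """
    if list(token_ids).count(IMAGE_TOKEN_ID) != 1:
        raise ValueError("expected exactly one image placeholder token")
    sequence: List[Vector] = []
    for tok in token_ids:
        if tok == IMAGE_TOKEN_ID:
            # One embedding per image patch (576 for CLIP ViT-L/14 at 336 px).
            sequence.extend(list(v) for v in image_embeddings)
        else:
            # Ordinary text token: look up its embedding.
            sequence.append(list(embed_table[tok]))
    return sequence


if __name__ == "__main__":
    dim = 4
    embed_table = {1: [0.1] * dim, 2: [0.2] * dim, 3: [0.3] * dim}
    image = [[1.0] * dim for _ in range(3)]  # 3 toy patches instead of 576
    seq = build_input_embeddings([1, IMAGE_TOKEN_ID, 2, 3], embed_table, image)
    print(len(seq))  # 6: one text token, three image patches, two text tokens
```

In a real model the embedding lookup and the projector would be tensors inside Phi-2's forward pass. The list version just makes the ordering of text and image positions easy to check.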

Pretraining

Training Loss Curve

[Figure: pretraining loss curve]

Learning Rate

[Figure: pretraining learning-rate curve]

Training Logs

[Figure: pretraining logs]

Finetuning

Training Loss Curve

[Figure: finetuning loss curve]

Learning Rate

[Figure: finetuning learning-rate curve]

Training Logs

[Figure: finetuning logs]

Results

[Figure: results]