This model was fine-tuned on the combination of the 100k Adjusted Caption & Paragraph dataset and the Adjusted Total QA dataset.
During fine-tuning, the updated components were the Vision Encoder, the Projection Layer, and an LM LoRA adapter of rank 128.
Vision Encoder:
Directly updated (all parameters trainable)
Projection Layer:
Directly updated (all parameters trainable)
LM LoRA (rank 128):
lora_target_modules: 'attention.wqkv', 'attention.wo', 'feed_forward.w1', 'feed_forward.w2', 'feed_forward.w3'
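The target list above selects which language-model submodules receive LoRA adapters. A minimal sketch of how such name-based matching typically works is below; the fully qualified module names (`model.layers.0.…`) are hypothetical illustrations, not taken from the actual training code.

```python
# LoRA target modules as listed above.
LORA_TARGETS = [
    "attention.wqkv", "attention.wo",
    "feed_forward.w1", "feed_forward.w2", "feed_forward.w3",
]

def is_lora_target(name: str) -> bool:
    """A module gets a LoRA adapter if its qualified name ends with a target."""
    return any(name == t or name.endswith("." + t) for t in LORA_TARGETS)

# Hypothetical module names from one transformer block:
modules = [
    "model.layers.0.attention.wqkv",   # adapted
    "model.layers.0.attention.wo",     # adapted
    "model.layers.0.feed_forward.w1",  # adapted
    "model.layers.0.attention_norm",   # NOT adapted (norms stay frozen)
]
adapted = [m for m in modules if is_lora_target(m)]
```

Modules matched this way are wrapped with rank-128 low-rank adapters, while the rest of the language model stays frozen; the vision encoder and projection layer are instead updated directly.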