
This model was fine-tuned on a combination of the 100k Adjusted Caption & Paragraph dataset and the Adjusted Total QA dataset.

Fine-tuning updated three modules: the Vision Encoder, the Project Layer, and LM LoRA 128.

Vision Encoder:

Directly updated

Project Layer:

Directly updated

LM LoRA 128:

lora_target_modules: 'attention.wqkv' 'attention.wo' 'feed_forward.w1' 'feed_forward.w2' 'feed_forward.w3'
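
As a rough illustration of what the lora_target_modules list selects, the sketch below checks module names against the five entries by suffix. It assumes suffix matching (a name is selected if it equals an entry or ends with a dot followed by the entry), which is how list-style target modules are commonly interpreted by LoRA tooling; the training code used for this model may differ. The module names in example_modules and the helper is_lora_target are hypothetical and exist only for this example.

```python
# Target entries copied from the lora_target_modules line above.
LORA_TARGET_MODULES = [
    "attention.wqkv",
    "attention.wo",
    "feed_forward.w1",
    "feed_forward.w2",
    "feed_forward.w3",
]


def is_lora_target(module_name, targets=LORA_TARGET_MODULES):
    """Return True if module_name equals a target or ends with '.<target>'.

    Assumption: suffix matching. This is a sketch, not the actual
    selection logic of the training code.
    """
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in targets
    )


# Hypothetical module names, for illustration only.
example_modules = [
    "model.layers.0.attention.wqkv",
    "model.layers.0.attention.wo",
    "model.layers.0.feed_forward.w1",
    "model.layers.0.attention_norm",
    "vit.encoder.layers.0.mlp.fc1",
]

# Names that would receive LoRA adapters under the suffix-matching assumption.
selected = [name for name in example_modules if is_lora_target(name)]
```

Under this assumption, only the attention and feed-forward projections of the language model receive adapters. Other language model modules, such as normalization layers, are not matched by the list.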