This model is fine-tuned on the combination of the 100k Adjusted Caption & Paragraph dataset and the Adjusted Total QA dataset.
During fine-tuning, the updated modules are the Vision Encoder, the Projection Layer, and the LM LoRA (rank 128):
- **Vision Encoder:** directly updated
- **Projection Layer:** directly updated
- **LM LoRA 128:**
  - `lora_target_modules`:
    - `attention.wqkv`
    - `attention.wo`
    - `feed_forward.w1`
    - `feed_forward.w2`
    - `feed_forward.w3`
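For reference, the LoRA settings above can be sketched as a plain configuration object. This is a minimal sketch: only the rank (128) and the target-module names come from this card; any other hyperparameters (e.g. alpha, dropout) are deliberately omitted because they are not specified here.

```python
# Sketch of the LoRA configuration described above.
# Only r=128 and the target_modules list come from this model card.
lora_config = {
    "r": 128,  # LoRA rank ("LM LoRA 128")
    "target_modules": [
        "attention.wqkv",   # fused query/key/value projection
        "attention.wo",     # attention output projection
        "feed_forward.w1",  # feed-forward projection w1
        "feed_forward.w2",  # feed-forward projection w2
        "feed_forward.w3",  # feed-forward projection w3
    ],
}

assert len(lora_config["target_modules"]) == 5
```

In a real training script this dictionary would typically be passed to a LoRA wrapper (for example, a `LoraConfig` in the Hugging Face `peft` library), while the Vision Encoder and Projection Layer are left out of the LoRA wrapping since they are updated directly.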