Experience with fine-tuning for a specific language
Hi, I've tested the model and it performs well for almost all of my cases. However, there is still a small issue with tone marks. My language is Vietnamese, which is based on the Latin alphabet. The model sometimes confuses words, for example "Phường" -> "Phòng" or "Tỉnh" -> "Tinh", so I want to improve performance on Vietnamese while also preventing catastrophic forgetting. I've checked the fine-tuning notebook, and I think I should freeze the vision weights and unfreeze the language weights for my case. Maybe I should use LoRA instead of full fine-tuning to learn the correct tone marks? Also, how many samples would be a good starting point? Should I collect PDF files with diverse structures to maintain the generality of the model?
Hi,
Yes, I think fine-tuning makes a lot of sense in this case!
Yes! Freeze the vision weights and train only the language weights with LoRA; it works like magic! Tested.
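In case it's useful, here is a minimal sketch of that setup using Hugging Face `transformers` and `peft`. The model id, the `vision_tower` attribute, and the `target_modules` names are assumptions for illustration, not taken from this thread; run `print(model)` on your checkpoint to find the actual attribute and projection names for your architecture.

```python
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

# Placeholder model id -- substitute your actual checkpoint.
model = AutoModelForVision2Seq.from_pretrained("your-org/your-ocr-model")

# Explicitly freeze the vision tower (peft also freezes all base weights
# when wrapping, but this makes the intent clear). "vision_tower" is an
# assumed attribute name; it varies between architectures.
for param in model.vision_tower.parameters():
    param.requires_grad = False

# Attach LoRA adapters only to the language model's attention projections.
# These module names are common Llama-style defaults; if your vision tower
# reuses the same names, restrict target_modules with full module paths or
# a regex so the adapters land on the language side only.
lora_config = LoraConfig(
    r=16,                # low rank keeps the weight update small,
    lora_alpha=32,       # which helps limit catastrophic forgetting
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only a small fraction trainable
```

Since the base weights stay frozen and only the small adapter is trained, the original multilingual behavior is preserved in the base model, and you can drop or merge the adapter later.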